Implementing a Deep Neural Network in 70 Lines of Java

Deep learning is popular, and it rewards sustained study. Programmers, and architects in particular, should stay attentive to core technologies and key algorithms, and should be able to write them when needed without worrying about when they will be used. Whether to use them is an organizational decision; whether you can write them is a technical one. Like soldiers, we should worry not about whether there will be a fight, but about how to win it.

How programmers learn machine learning

For programmers, machine learning has a real barrier to entry (that barrier is also its core competitive value). Many people get a headache from English papers full of mathematical formulas and give up before they start. In fact, writing a working implementation of a machine learning algorithm is not hard. Below is a 70-line implementation of the backpropagation (BP) neural network algorithm as a multilayer network, the basis of deep learning. And it is not only neural networks: most machine learning algorithms, such as logistic regression, the C4.5/ID3 decision trees, random forests, naive Bayes, collaborative filtering, graph computation, k-means, and PageRank, can each be implemented in a standalone program of under 100 lines (to be shared later).

The real difficulty of machine learning lies in why an algorithm computes what it does: the mathematical principles behind it and how its formulas are derived. Most material online covers this theory, but rarely walks through the algorithm's actual computation or how a program implements it. For programmers, the job is engineering application; there is no need to invent new mathematics. In practice, most machine learning engineers use open-source packages or tools written by others, feed in data, tune the coefficients, and train a result, rarely implementing the algorithms themselves. Still, mastering each algorithm's computation matters: it lets you understand what the algorithm does to the data and what effect it is trying to achieve.

This article focuses on a single-machine implementation of the backpropagation neural network. As for parallelizing a neural network across machines, the Fourinone framework provides a flexible and complete parallel computing foundation; once we understand the single-machine implementation, we can conceive and design a distributed parallelization scheme ourselves. Without understanding the algorithm's computation, no such design can get off the ground. Convolutional neural networks, which are mainly a dimensionality-reduction idea for image processing, are not discussed here.

Neural Network Process Description:

First of all, be clear that a neural network performs prediction tasks. Recall the least squares method from high school; we can draw a loose but intuitive analogy:

First, we take a data set and its labels (in least squares, a set of x and y values). The algorithm fits function parameters that express the data set, based on the data and its labels (in least squares, the formulas for computing a and b; in a neural network these parameters cannot be obtained directly). This gives us a fitted function (in least squares, the fitted line ŷ = ax + b). Then, when new data arrives, we plug it into the function to produce a prediction ŷ (a neural network works the same way, but the function it obtains is far more complex than the least squares line).
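To make the analogy concrete, here is a minimal closed-form least-squares fit. The class name, method name, and sample points are illustrative, not from the article:

```java
public class LeastSquares {
    // Fit y = a*x + b by the closed-form least-squares solution.
    public static double[] fit(double[] x, double[] y) {
        int n = x.length;
        double sx = 0, sy = 0, sxx = 0, sxy = 0;
        for (int i = 0; i < n; i++) {
            sx += x[i]; sy += y[i];
            sxx += x[i] * x[i]; sxy += x[i] * y[i];
        }
        double a = (n * sxy - sx * sy) / (n * sxx - sx * sx);
        double b = (sy - a * sx) / n;
        return new double[]{a, b};
    }

    public static void main(String[] args) {
        // Points that lie exactly on y = 2x + 1, so the fit recovers a = 2, b = 1.
        double[] x = {0, 1, 2, 3};
        double[] y = {1, 3, 5, 7};
        double[] ab = fit(x, y);
        System.out.println("a=" + ab[0] + " b=" + ab[1]);
    }
}
```

Unlike least squares, the neural network below has no such closed form, which is why it must iterate.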

Calculation process of neural network

The structure of the neural network is shown in the figure below. The leftmost column is the input layer, the rightmost is the output layer, and in between are one or more hidden layers. Each node of a hidden layer or the output layer is obtained by multiplying the previous layer's nodes by weights and summing; the circle marked "+1" is the intercept term b. For every node outside the input layer: y = w0*x0 + w1*x1 + ... + wn*xn + b. From this we can see that a neural network is equivalent to a multilayer logistic regression structure.
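A single node's computation can be sketched in a few lines; the names and numbers here are illustrative. The sigmoid squashing at the end is introduced properly further below:

```java
public class NodeDemo {
    // One node's value: z = w0*x0 + ... + wn*xn + b,
    // then the sigmoid squashing used throughout the article.
    public static double nodeOutput(double[] w, double[] x, double b) {
        double z = b;
        for (int i = 0; i < w.length; i++) z += w[i] * x[i];
        return 1.0 / (1.0 + Math.exp(-z));
    }

    public static void main(String[] args) {
        // Weights 0.5 and -0.5 on equal inputs cancel, so z = 0 and the output is 0.5.
        double y = nodeOutput(new double[]{0.5, -0.5}, new double[]{1.0, 1.0}, 0.0);
        System.out.println(y);
    }
}
```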

The algorithm proceeds as follows: start from the input layer and compute forward, layer by layer, left to right, until the output layer produces a result. If there is a gap between the result and the target value, compute from right to left, propagating each node's error layer by layer and adjusting all of its weights. When the backward pass reaches the input layer, compute forward again, and repeat until all weight parameters converge to reasonable values. Unlike a mathematical solution of the equations, a computer program typically starts from randomly chosen parameters and then adjusts them repeatedly to shrink the error until they approach the correct values; this is why most machine learning is continuous iterative training. Let's look at the process in detail in the program below.

Algorithm and program implementation of neural network

The neural network program divides into three parts: initialization, forward computation of the output, and backward adjustment of the weights.

1. Initialization process

Because this is an n-layer neural network, we use a two-dimensional array layer to record node values: the first dimension is the layer index, the second is the node's position within the layer, and the array value is the node's value. The node errors are recorded the same way in layerErr. A three-dimensional array layer_weight records each node's weights: the first dimension is the layer index, the second is the node's position in that layer, the third is the node's position in the next layer, and the value is the weight from the one node to the other, initialized to a random number between 0 and 1. To speed up convergence, the momentum method is used to adjust the weights, which requires recording the previous adjustment; the three-dimensional array layer_weight_delta serves this purpose. For the intercept term, the program fixes the intercept's input value at 1, so only its weight needs to be computed.
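The resulting array shapes are easy to check with a small sketch. The topology {2, 4, 1} (two inputs, four hidden nodes, one output) is illustrative; the allocation mirrors the constructor of BpDeep below:

```java
public class ShapeDemo {
    // Allocate the weight arrays the same way the BpDeep constructor does.
    public static double[][][] buildWeights(int[] layernum) {
        double[][][] layer_weight = new double[layernum.length][][];
        for (int l = 0; l + 1 < layernum.length; l++)
            // one extra row for the intercept term, whose input is fixed at 1
            layer_weight[l] = new double[layernum[l] + 1][layernum[l + 1]];
        return layer_weight;
    }

    public static void main(String[] args) {
        double[][][] w = buildWeights(new int[]{2, 4, 1});
        System.out.println(w[0].length);    // 3 rows: two inputs plus the intercept
        System.out.println(w[0][0].length); // 4 columns: one per hidden node
    }
}
```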

2. Forward calculation results

The sigmoid function 1/(1 + e^(-z)) squashes each node's value into the range 0 to 1; with it we compute forward layer by layer until the output layer. Strictly speaking, the output layer does not need the sigmoid, but since we treat the output as a probability between 0 and 1, the sigmoid is applied there too, which also keeps the implementation uniform.
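A useful property of the sigmoid, which the backward pass below relies on, is that its derivative can be computed from its own output: s'(z) = s(z) * (1 - s(z)). This is why expressions like layer[l][j]*(1-layer[l][j]) appear in the weight-update code. A quick numerical check (names illustrative):

```java
public class SigmoidDemo {
    public static double sigmoid(double z) { return 1.0 / (1.0 + Math.exp(-z)); }

    public static void main(String[] args) {
        double z = 0.0;
        double s = sigmoid(z); // sigmoid(0) = 0.5
        // Derivative via the identity s'(z) = s(z) * (1 - s(z)).
        double dsIdentity = s * (1 - s);
        // Numerical check with a central difference.
        double h = 1e-6;
        double dsNumeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h);
        System.out.println(dsIdentity + " vs " + dsNumeric); // both close to 0.25
    }
}
```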

3. Reverse modify weight

Neural networks generally compute the error with the squared-error function E, that is, E = (1/2) Σⱼ (tⱼ − yⱼ)², where yⱼ is an output value and tⱼ its target.

In other words, the squared errors between the output values and their corresponding targets are summed and divided by 2. The error function of logistic regression is in fact the same. As for why this function is used, what its mathematical justification is, and how it is obtained, I suggest programmers who do not want to become mathematicians skip the deep dive. What we need to do is minimize the error E, which requires taking derivatives; if you have some calculus background, you can try working out how the derivative of E with respect to each weight yields the update used in the program.
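The derivation is the standard textbook one for a sigmoid output node; a sketch, with yⱼ = σ(zⱼ) and xᵢ the input feeding weight wᵢⱼ:

```latex
E = \frac{1}{2}\sum_j (t_j - y_j)^2, \qquad
y_j = \sigma(z_j) = \frac{1}{1+e^{-z_j}}, \qquad
z_j = \sum_i w_{ij} x_i + b_j

\frac{\partial E}{\partial w_{ij}}
  = \frac{\partial E}{\partial y_j}\,\frac{\partial y_j}{\partial z_j}\,\frac{\partial z_j}{\partial w_{ij}}
  = -(t_j - y_j)\; y_j(1-y_j)\; x_i
```

Gradient descent therefore adds rate · δⱼ · xᵢ to each weight, with the error term δⱼ = yⱼ(1−yⱼ)(tⱼ−yⱼ) — exactly the expression layer[l][j]*(1-layer[l][j])*(tar[j]-layer[l][j]) stored in layerErr by the program below.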

It does not matter if you cannot derive it; we only need the resulting formula. In the program, layerErr records the per-node error term obtained from differentiating E with respect to the weights, and the weights are then adjusted according to it.

Note that the momentum method is used for the adjustment, folding in the previous adjustment to avoid getting stuck in a local minimum. Below, k is the iteration number, mobp is the momentum coefficient, and rate is the learning rate:

Δw(k+1) = mobp * Δw(k) + rate * Err * Layer

Many implementations use the following formula instead; the difference in effect is not large:

Δw(k+1) = mobp * Δw(k) + (1 - mobp) * rate * Err * Layer
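A single weight's momentum update, with the first formula above, can be sketched in isolation. The coefficients and values here are illustrative numbers, not from the article:

```java
public class MomentumDemo {
    // One momentum update: returns {newDelta, newWeight}.
    // err is the node's error term, input the node output feeding this weight.
    public static double[] step(double mobp, double rate, double err, double input,
                                double delta, double w) {
        // Δw(k+1) = mobp * Δw(k) + rate * Err * Layer
        double d = mobp * delta + rate * err * input;
        return new double[]{d, w + d};
    }

    public static void main(String[] args) {
        double[] s = step(0.8, 0.5, 0.1, 1.0, 0.0, 0.4);
        System.out.println("delta=" + s[0] + " w=" + s[1]); // first step: rate*err*input ≈ 0.05
        s = step(0.8, 0.5, 0.1, 1.0, s[0], s[1]);
        System.out.println("delta=" + s[0] + " w=" + s[1]); // momentum carries 0.8 of the last step
    }
}
```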

For performance, the implementation computes the error and adjusts the weights in the same while loop. It first positions itself at the second-to-last layer (the last hidden layer), then adjusts backward layer by layer: using the already-computed error of layer l+1, it adjusts the weights of layer l and computes layer l's own error, which will in turn be used when the loop moves on to layer l-1, and so on until the input layer is reached.
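The per-node error computation inside that loop can be isolated as a small sketch: a hidden node's error is the weighted sum of the next layer's errors, scaled by the sigmoid-derivative factor. Names and numbers are illustrative:

```java
public class BackpropDemo {
    // Error of one hidden node, computed from the errors of the layer after it:
    // err = output * (1 - output) * sum_i( nextErr[i] * weightToNext[i] )
    public static double hiddenError(double output, double[] nextErr, double[] weightsToNext) {
        double z = 0.0;
        for (int i = 0; i < nextErr.length; i++) z += nextErr[i] * weightsToNext[i];
        return z * output * (1 - output); // sigmoid derivative factor
    }

    public static void main(String[] args) {
        // (0.2 - 0.1) * 0.5 * (1 - 0.5) ≈ 0.025
        double e = hiddenError(0.5, new double[]{0.2, -0.1}, new double[]{1.0, 1.0});
        System.out.println(e);
    }
}
```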

Summary

Throughout the computation, the node values change on every pass and do not need to be saved, while the weight and error parameters must be kept to support the next iteration. So if you set out to design a distributed, multi-machine parallel scheme, you can see why other frameworks have the concept of a parameter server.

Complete program implementation of multilayer neural network

The following implementation, BpDeep.java, can be used directly, and it is easy to port to any other language such as C, C#, or Python, because it uses only basic statements and no Java libraries beyond the random number generator.

import java.util.Random;
public class BpDeep{
  public double[][] layer;//node values of each layer
  public double[][] layerErr;//error of each node
  public double[][][] layer_weight;//weights of each node
  public double[][][] layer_weight_delta;//momentum (previous adjustment) of each weight
  public double mobp;//momentum coefficient
  public double rate;//learning rate

  public BpDeep(int[] layernum,double rate,double mobp){
    this.mobp = mobp;
    this.rate = rate;
    layer = new double[layernum.length][];
    layerErr = new double[layernum.length][];
    layer_weight = new double[layernum.length][][];
    layer_weight_delta = new double[layernum.length][][];
    Random random = new Random();
    for(int l=0;l<layernum.length;l++){
      layer[l]=new double[layernum[l]];
      layerErr[l]=new double[layernum[l]];
      if(l+1<layernum.length){
        layer_weight[l]=new double[layernum[l]+1][layernum[l+1]];
        layer_weight_delta[l]=new double[layernum[l]+1][layernum[l+1]];
        for(int j=0;j<layernum[l]+1;j++)
          for(int i=0;i<layernum[l+1];i++)
            layer_weight[l][j][i]=random.nextDouble();//randomly initialize weights
      }  
    }
  }
  //compute the output forward, layer by layer
  public double[] computeOut(double[] in){
    for(int l=1;l<layer.length;l++){
      for(int j=0;j<layer[l].length;j++){
        double z=layer_weight[l-1][layer[l-1].length][j];//start from the intercept weight (its input is fixed at 1)
        for(int i=0;i<layer[l-1].length;i++){
          layer[l-1][i]=l==1?in[i]:layer[l-1][i];
          z+=layer_weight[l-1][i][j]*layer[l-1][i];
        }
        layer[l][j]=1/(1+Math.exp(-z));
      }
    }
    return layer[layer.length-1];
  }
  //compute errors backward layer by layer and adjust the weights
  public void updateWeight(double[] tar){
    int l=layer.length-1;
    for(int j=0;j<layerErr[l].length;j++)
      layerErr[l][j]=layer[l][j]*(1-layer[l][j])*(tar[j]-layer[l][j]);

    while(l-->0){
      for(int j=0;j<layerErr[l].length;j++){
        double z = 0.0;
        for(int i=0;i<layerErr[l+1].length;i++){
          z=z+(l>0?layerErr[l+1][i]*layer_weight[l][j][i]:0);//accumulate error from the next layer (parentheses required: ?: binds looser than +)
          layer_weight_delta[l][j][i]= mobp*layer_weight_delta[l][j][i]+rate*layerErr[l+1][i]*layer[l][j];//momentum adjustment for hidden-layer weights
          layer_weight[l][j][i]+=layer_weight_delta[l][j][i];//adjust hidden-layer weights
          if(j==layerErr[l].length-1){
            layer_weight_delta[l][j+1][i]= mobp*layer_weight_delta[l][j+1][i]+rate*layerErr[l+1][i];//momentum adjustment for the intercept
            layer_weight[l][j+1][i]+=layer_weight_delta[l][j+1][i];//adjust the intercept weight
          }
        }
        layerErr[l][j]=z*layer[l][j]*(1-layer[l][j]);//record this layer's error
      }
    }
  }

  public void train(double[] in,double[] tar){
    computeOut(in);
    updateWeight(tar);
  }
}
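To see the idea end to end, here is a compact, self-contained variant trained on XOR. It is not the article's class: it hardcodes one hidden layer, fixes the random seed so runs are repeatable, and uses plain gradient steps without momentum for brevity; all names and hyperparameters are illustrative.

```java
import java.util.Random;

public class XorSketch {
    static double sig(double z){ return 1/(1+Math.exp(-z)); }

    // Train a 2-hidden-1 sigmoid network on XOR and return the four predictions.
    public static double[] predictions(long seed, int iters){
        Random rnd = new Random(seed);
        int nh = 5;
        double[][] w1 = new double[3][nh]; // input (+intercept) -> hidden
        double[] w2 = new double[nh+1];    // hidden (+intercept) -> output
        for(double[] row : w1) for(int j=0;j<nh;j++) row[j]=rnd.nextDouble();
        for(int j=0;j<=nh;j++) w2[j]=rnd.nextDouble();

        double[][] in = {{0,0},{0,1},{1,0},{1,1}};
        double[] tar = {0,1,1,0};
        double rate = 0.8;
        double[] h = new double[nh];
        for(int iter=0;iter<iters;iter++){
            int s = iter % 4; // cycle through the four samples
            // forward pass
            for(int j=0;j<nh;j++)
                h[j]=sig(w1[0][j]*in[s][0]+w1[1][j]*in[s][1]+w1[2][j]);
            double z=w2[nh];
            for(int j=0;j<nh;j++) z+=w2[j]*h[j];
            double out=sig(z);
            // backward pass: delta rule err = y*(1-y)*(t-y)
            double dOut=out*(1-out)*(tar[s]-out);
            for(int j=0;j<nh;j++){
                double dH=h[j]*(1-h[j])*dOut*w2[j];
                w2[j]+=rate*dOut*h[j];
                w1[0][j]+=rate*dH*in[s][0];
                w1[1][j]+=rate*dH*in[s][1];
                w1[2][j]+=rate*dH; // intercept input is fixed at 1
            }
            w2[nh]+=rate*dOut;
        }
        double[] p = new double[4];
        for(int s=0;s<4;s++){
            for(int j=0;j<nh;j++)
                h[j]=sig(w1[0][j]*in[s][0]+w1[1][j]*in[s][1]+w1[2][j]);
            double z=w2[nh];
            for(int j=0;j<nh;j++) z+=w2[j]*h[j];
            p[s]=sig(z);
        }
        return p;
    }

    public static void main(String[] args){
        double[] p = predictions(42L, 20000);
        double[][] in = {{0,0},{0,1},{1,0},{1,1}};
        for(int s=0;s<4;s++)
            System.out.println((int)in[s][0]+","+(int)in[s][1]+" -> "+p[s]);
    }
}
```

With enough iterations the outputs should approach the XOR targets 0, 1, 1, 0; since training starts from random weights, exact values vary with the seed and iteration count.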
This article was collected from the web and is shared for learning and reference; copyright belongs to the original author.