Java: Micro-optimizing array operations

I am trying to make a Java port of a simple feed-forward neural network.

My current code is as follows (with error handling and initialization removed):

/**
 * Simple implementation of a Feedforward neural network. The network supports
 * including a bias neuron with a constant output of 1.0 and weighted synapses
 * to hidden and output layers.
 * 
 * @author Martin Wiboe
 */
public class FeedForwardNetwork {
    private final int outputNeurons;    // No of neurons in output layer
    private final int inputNeurons;     // No of neurons in input layer
    private int largestLayerNeurons;    // No of neurons in largest layer
    private final int numberLayers;     // No of layers
    private final int[] neuronCounts;   // Neuron count in each layer, 0 is the
                                        // input layer.
    private final float[][][] fWeights; // Weights between neurons.
                                        // fWeights[fromLayer][fromNeuron][toNeuron]
                                        // is the weight from fromNeuron in
                                        // fromLayer to toNeuron in layer
                                        // fromLayer+1.
    private float[][] neuronOutput;     // Temporary storage of output from the
                                        // previous layer.

    public float[] compute(float[] input) {
        // Copy input values to input layer output
        for (int i = 0; i < inputNeurons; i++) {
            neuronOutput[0][i] = input[i];
        }

        // Loop through layers
        for (int layer = 1; layer < numberLayers; layer++) {

            // Loop over neurons in the layer and determine the weighted input sum
            for (int neuron = 0; neuron < neuronCounts[layer]; neuron++) {
                // Bias neuron is the last neuron in the previous layer
                int biasNeuron = neuronCounts[layer - 1];

                // Get weighted input from bias neuron - output is always 1.0
                float activation = 1.0F * fWeights[layer - 1][biasNeuron][neuron];

                // Get weighted inputs from the rest of the neurons in the previous layer
                for (int inputNeuron = 0; inputNeuron < biasNeuron; inputNeuron++) {
                    activation += neuronOutput[layer - 1][inputNeuron] * fWeights[layer - 1][inputNeuron][neuron];
                }

                // Store neuron output for the next round of computation
                neuronOutput[layer][neuron] = sigmoid(activation);
            }
        }

        // Return output from network = output from last layer
        float[] result = new float[outputNeurons];
        for (int i = 0; i < outputNeurons; i++)
            result[i] = neuronOutput[numberLayers - 1][i];

        return result;
    }

    private static final float sigmoid(final float input) {
        return (float) (1.0F / (1.0F + Math.exp(-1.0F * input)));
    }
}

I am running the JVM with the -server option. So far my code is between 25% and 50% slower than similar C code. What can I do to improve this situation?

Thank you,

Martin Wiboe

Edit #1: After seeing a lot of responses, I should clarify the figures in our scenario. During a typical run, the method will be called about 50,000 times with different inputs. A typical network would have numberLayers = 3 layers with 190, 2 and 1 neurons, respectively. Each of the 2 hidden neurons therefore reads 190 inputs plus a bias, and the single output neuron reads 2 hidden outputs plus a bias, so the innermost loop has about 2 * 191 + 1 * 3 = 385 iterations (counting the bias neurons added in layers 0 and 1).
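A minimal timing sketch for this scenario might look as follows; note that the buildNetwork factory is hypothetical, since the actual constructor and initialization code are omitted from the listing above:

import java.util.Random;

public class ComputeBenchmark {
    public static void main(String[] args) {
        // Hypothetical factory - the real constructor/initialization is omitted above.
        FeedForwardNetwork net = buildNetwork(new int[] {190, 2, 1});

        Random rnd = new Random(42);
        float[] input = new float[190];
        for (int i = 0; i < input.length; i++)
            input[i] = rnd.nextFloat();

        // Warm up so the JIT has compiled compute() before we measure.
        for (int i = 0; i < 10000; i++)
            net.compute(input);

        long start = System.nanoTime();
        for (int i = 0; i < 50000; i++)
            net.compute(input);
        long elapsed = System.nanoTime() - start;
        System.out.printf("50,000 calls took %.1f ms%n", elapsed / 1e6);
    }

    private static FeedForwardNetwork buildNetwork(int[] neuronCounts) {
        // Placeholder: network construction is elided in the post above.
        throw new UnsupportedOperationException("network construction not shown");
    }
}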

Edit #2: After implementing the various suggestions given in this thread, our implementation is practically as fast as the C version (within ~2%). Thanks for all the help! All of the suggestions were helpful, but since I can only mark one answer as the correct one, I will give it to @Durandal for both suggesting the array optimizations and being the only one to precalculate the for loop header.

Solution

Disregarding the actual math, array indexing in Java can be a performance hog in itself. Consider that Java has no real multidimensional arrays, but implements them as arrays of arrays. In your innermost loop you access multiple indices, some of which are in fact constant in that loop. Part of the array accesses can be moved outside of the loop:

final float[] neuronOutputSlice = neuronOutput[layer - 1];
final float[][] fWeightsSlice = fWeights[layer - 1];
for (int inputNeuron = 0; inputNeuron < biasNeuron; inputNeuron++) {
    activation += neuronOutputSlice[inputNeuron] * fWeightsSlice[inputNeuron][neuron];
}

The server JIT may already perform a similar code-invariant motion; the only way to find out is to change the code and profile it. On the client JIT this should improve performance no matter what. Another thing you can try is to precalculate the loop exit condition, like this:

for (int neuron = 0; neuron < neuronCounts[layer]; neuron++) { ... }
// transformed to a precalculated exit condition (the invariant array access moves out of the loop):
for (int neuron = 0, neuronCount = neuronCounts[layer]; neuron < neuronCount; neuron++) { ... }

Again, the JIT may already do this for you, so profile whether it helps.
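Combining both transformations, the layer loop in compute() could be rewritten along these lines (just a sketch; prevOutput, currOutput and weights are hoisted locals, and behavior is otherwise unchanged):

for (int layer = 1; layer < numberLayers; layer++) {
    final float[] prevOutput = neuronOutput[layer - 1];
    final float[] currOutput = neuronOutput[layer];
    final float[][] weights = fWeights[layer - 1];
    final int biasNeuron = neuronCounts[layer - 1];

    for (int neuron = 0, neuronCount = neuronCounts[layer]; neuron < neuronCount; neuron++) {
        float activation = 1.0F * weights[biasNeuron][neuron]; // bias output is always 1.0
        for (int inputNeuron = 0; inputNeuron < biasNeuron; inputNeuron++) {
            activation += prevOutput[inputNeuron] * weights[inputNeuron][neuron];
        }
        currOutput[neuron] = sigmoid(activation);
    }
}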

Is there a point to multiplying by 1.0F that eludes me here?

float activation = 1.0F * fWeights[layer - 1][biasNeuron][neuron];
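If not, the multiplication by 1.0F is a no-op for floats and the bias term reduces to a plain array load:

// Same value: multiplying a float by 1.0F does not change it.
float activation = fWeights[layer - 1][biasNeuron][neuron];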

Other things that could potentially improve speed at the cost of readability: inline the sigmoid() function manually (the JIT has a very tight size limit for inlining, and the function might be larger than that limit). It can also be slightly faster to run a loop backwards (where it doesn't change the outcome, of course), since testing the loop index against zero is a little cheaper than comparing it against a local variable. The innermost loop is a potential candidate again, but don't expect the output to be 100% identical in all cases, since adding floats as a + b + c is potentially not the same as a + c + b.
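As a sketch of both ideas (building on the hoisted locals from the earlier snippet), the innermost loop could be reversed and sigmoid() inlined by hand; the reversed summation order may perturb the last bits of the result:

float activation = weights[biasNeuron][neuron]; // bias term; the 1.0F multiply is dropped
// Counting down lets the loop test compare against zero instead of a local.
for (int inputNeuron = biasNeuron - 1; inputNeuron >= 0; inputNeuron--) {
    activation += prevOutput[inputNeuron] * weights[inputNeuron][neuron];
}
// sigmoid(activation) inlined by hand:
currOutput[neuron] = (float) (1.0F / (1.0F + Math.exp(-1.0F * activation)));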
