Memory leak using tensorflow for Java
The following test code leaks memory:
    private static final float[] X = new float[]{1,2,3,4,5,6,7,8,9,1,0};

    public void testTensorFlowMemory() {
      // create a graph and a session on it
      try (Graph g = new Graph(); Session s = new Session(g)) {
        // create a placeholder x and a const for the dimension to do a cumulative sum along
        Output x = g.opBuilder("Placeholder", "x").setAttr("dtype", DataType.FLOAT).build().output(0);
        Output dims = g.opBuilder("Const", "dims").setAttr("dtype", DataType.INT32).setAttr("value", Tensor.create(0)).build().output(0);
        Output y = g.opBuilder("Cumsum", "y").addInput(x).addInput(dims).build().output(0);
        // loop a bunch to test memory usage
        for (int i = 0; i < 10000000; i++) {
          // create a tensor from X
          Tensor tx = Tensor.create(X);
          // run the graph and fetch the resulting y tensor
          Tensor ty = s.runner().feed("x", tx).fetch("y").run().get(0);
          // close the tensors to release their native resources
          tx.close();
          ty.close();
        }
        System.out.println("non-threaded test finished");
      }
    }
Is there anything obvious I'm doing wrong? The basic flow is: create a graph and a session on that graph, then create a placeholder and a constant to perform a cumulative sum on tensors fed as x. After running the resulting y operation, I close both the x and y tensors to free their native memory resources.
Some things I believe I've eliminated so far:

> This is not a Java object memory problem. According to jvisualvm, the heap does not grow and other memory in the JVM does not grow. It also does not appear to be a JVM native memory leak, according to Java's Native Memory Tracking.
> The close operations are helping; without them, memory usage soars. With them in place it still grows fast, though not nearly as much as without them.
> The cumsum operator is not important; it also happens with sum and other operators.
> It occurs on macOS with TF 1.1, and on CentOS 7 with TF 1.1 and 1.2-rc0.
> Commenting out the Tensor ty lines eliminates the leak, so it appears to be there.
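The first elimination step above (confirming the Java heap stays flat while overall process memory grows) can be sketched as follows. This is a minimal illustration using only `Runtime` heap statistics; the class name and the placeholder comment are mine, not from the original post, and in practice you would pair this with an external view of process RSS (top, jvisualvm).

```java
// Sketch: confirm the leak is not in the Java heap by measuring used heap
// before and after the workload. If the heap delta stays small while the
// process RSS (observed externally) keeps growing, the leak is in native memory.
public class HeapCheck {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long before = rt.totalMemory() - rt.freeMemory();

        // ... run the TensorFlow feed/fetch loop here ...

        System.gc(); // best-effort, to reduce noise from garbage
        long after = rt.totalMemory() - rt.freeMemory();
        System.out.println("heap delta (bytes): " + (after - before));
    }
}
```

A flat heap delta here, combined with Native Memory Tracking showing no JVM-side growth, is what points the finger at the native allocations behind the Tensor objects.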
Any ideas? Thank you! In addition, here is a GitHub project that demonstrates this issue. It has both threaded tests (to grow memory faster) and non-threaded tests (to show it is not due to threading). It uses Maven and can simply be run with:
mvn test
Solution
I believe there is indeed a leak (in particular, a missing `TF_DeleteStatus` corresponding to an allocation in the JNI code). Thank you for the detailed description.
I'd encourage you to raise the issue at http://github.com/tensorflow/tensorflow/issues, and it should hopefully be fixed before the final 1.2 release.
(Relatedly, there is also a leak outside the loop, since the Tensor object created by `Tensor.create(0)` is never closed.)
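That secondary leak can be avoided by holding the Const value tensor in a try-with-resources block, since Tensor implements AutoCloseable. The sketch below uses a hypothetical stand-in `Resource` class instead of the real `org.tensorflow.Tensor` so it is self-contained and runnable; with the actual API the shape would be `try (Tensor t = Tensor.create(0)) { ...build the Const op... }`.

```java
// Sketch of the try-with-resources pattern for closing the Const value tensor.
// Resource is a hypothetical stand-in for org.tensorflow.Tensor (AutoCloseable).
public class CloseConstTensor {
    static class Resource implements AutoCloseable {
        boolean closed = false;
        @Override
        public void close() { closed = true; }
    }

    public static void main(String[] args) {
        Resource t = new Resource();
        // try-with-resources guarantees close() runs even if the body throws,
        // releasing the native memory backing the tensor.
        try (Resource r = t) {
            // use r here, e.g. as the "value" attr when building the Const op
        }
        System.out.println("closed: " + t.closed);
    }
}
```

The same pattern applies to the tx and ty tensors inside the loop, replacing the explicit `close()` calls with a block that is exception-safe.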
Update: this has been fixed; 1.2.0-rc1 should no longer have this problem.