Java – Apache POI is much faster with HSSF than with XSSF – what next?
I have been having problems parsing .xlsx files with Apache POI: I get java.lang.OutOfMemoryError: Java heap space in my deployed application. I only process files under 5MB with about 70,000 rows, so after reading other questions I suspect that something is amiss.
As suggested in this comment, I decided to run SSPerformanceTest with the suggested parameters to see whether there is a problem with my code or setup. The results show a significant difference between HSSF (.xls) and XSSF (.xlsx):
1) HSSF 50000 50 1: after 1 second
2) SXSSF 50000 50 1: after 5 seconds
3) XSSF 50000 50 1: after 15 seconds
The FAQ specifically states:
Next, it says to run XLS2CSV.java, which I have done. Feeding in the XSSF file generated above (50,000 rows and 50 columns) takes about 15 seconds, the same time it took to write the file.
Is there something wrong with my environment? If so, how can I further investigate?
Statistics from VisualVM show heap usage climbing to 1.2GB during processing. Surely this is far too high compared with the heap in use before processing started?
Note: the heap space exception mentioned above only occurs in production (on Google App Engine), and only for .xlsx files, whereas the tests mentioned in this question were all run on my development machine with -Xmx2g. I am hoping that if I can solve the problem on my development setup, it will also use less memory when deployed.
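As a rough cross-check of the VisualVM figures, heap usage can also be logged from inside the JVM before and after the workbook is loaded. The sketch below uses only the standard java.lang.Runtime API; the class name HeapProbe and its methods are hypothetical helpers, not part of POI, and the numbers are approximate because garbage collection can run at any time.

```java
public class HeapProbe {

    /** Approximate bytes of heap in use: allocated heap minus free space inside it. */
    public static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Maximum heap the JVM will attempt to use; this reflects the -Xmx setting. */
    public static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    /**
     * Runs the task and returns the change in used heap, in bytes.
     * The result is noisy and can even be negative if a collection
     * happens while the task is running.
     */
    public static long heapDelta(Runnable task) {
        long before = usedHeapBytes();
        task.run();
        return usedHeapBytes() - before;
    }
}
```

Wrapping the workbook-loading call in HeapProbe.heapDelta(...) and printing the result next to HeapProbe.maxHeapBytes() gives a quick indication of how close the parse comes to the configured limit.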
Stack trace from App Engine:
Solution
I faced the same problem when reading a huge .xlsx file with Apache POI, and I came across
excel-streaming-reader (on GitHub)
The library serves as a wrapper around POI's streaming API while preserving the syntax of the standard POI API.
It can help you read large files.
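To illustrate why streaming keeps memory low, here is a minimal sketch that uses only the JDK, not excel-streaming-reader or POI. An .xlsx file is a zip archive of XML parts, so a sheet can be read as a stream of XML events instead of being loaded into an in-memory model. The class name XlsxRowCounter is hypothetical, and the entry name xl/worksheets/sheet1.xml is an assumption about the usual archive layout rather than something the format guarantees.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class XlsxRowCounter {

    // Assumption: the first sheet lives under this entry name, which is the
    // common layout but is not guaranteed for every workbook.
    public static final String FIRST_SHEET = "xl/worksheets/sheet1.xml";

    /**
     * Counts the row elements of one sheet by pulling XML events one at a
     * time, so memory use does not grow with the number of rows.
     * Returns -1 if the archive has no entry with the given name.
     */
    public static int countRows(InputStream xlsx, String sheetEntry)
            throws IOException, XMLStreamException {
        try (ZipInputStream zip = new ZipInputStream(xlsx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!entry.getName().equals(sheetEntry)) {
                    continue; // skip styles, shared strings, other sheets
                }
                XMLInputFactory factory = XMLInputFactory.newInstance();
                factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
                // The zip stream reports end-of-input at the end of this entry.
                XMLStreamReader xml = factory.createXMLStreamReader(zip);
                int rows = 0;
                while (xml.hasNext()) {
                    if (xml.next() == XMLStreamConstants.START_ELEMENT
                            && "row".equals(xml.getLocalName())) {
                        rows++;
                    }
                }
                xml.close();
                return rows;
            }
        }
        return -1;
    }
}
```

A real reader also has to resolve shared strings, cell types and styles, which is the work a streaming wrapper library takes care of; this sketch only counts rows.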