Java – Apache POI: HSSF is much faster than XSSF – what next?

I use Apache POI to parse .xlsx files and have run into some problems: I get java.lang.OutOfMemoryError: Java heap space in my deployed application. I am only dealing with files of about 5 MB and roughly 70,000 rows, so from reading other questions I suspect something is amiss.

As advised in this comment, I decided to run SSPerformanceTest.java with the suggested variables to see whether there is a problem with my code or setup. The results show a significant difference between HSSF (.xls) and XSSF (.xlsx):

1) HSSF 50000 50 1: elapsed 1 second

2) SXSSF 50000 50 1: elapsed 5 seconds

3) XSSF 50000 50 1: elapsed 15 seconds

The FAQ specifically states:

Next, it says to run XLS2CSV.java, which I have done. Feeding in the XSSF file generated above (50000 rows and 50 columns) takes about 15 seconds, the same time it took to write the file.

Is there something wrong with my environment? If so, how can I further investigate?

Statistics from VisualVM show heap usage climbing to as much as 1.2 GB during processing. Surely this is far too high compared with the heap in use before processing began?
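To cross-check the VisualVM figures from inside the JVM, heap usage can be logged around the parsing call. The following is only a rough sketch using the JDK's Runtime API; the class name HeapProbe, the measure helper, and the stand-in workload are hypothetical, and the numbers are approximate because garbage collection timing is not under the program's control.

```java
import java.util.function.Supplier;

public class HeapProbe {
    private static final long MB = 1024L * 1024L;

    /** Heap currently in use by the JVM, in bytes (allocated heap minus free space in it). */
    public static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Runs the task, prints elapsed time and approximate heap usage before and after, and returns the task's result. */
    public static <T> T measure(String label, Supplier<T> task) {
        System.gc(); // only a hint; the JVM may ignore it
        long before = usedHeapBytes();
        long start = System.nanoTime();
        T result = task.get();
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        long after = usedHeapBytes();
        System.out.printf("%s: %d ms, used heap %d MB -> %d MB (max %d MB)%n",
                label, elapsedMs, before / MB, after / MB,
                Runtime.getRuntime().maxMemory() / MB);
        return result;
    }

    public static void main(String[] args) {
        // Stand-in workload (assumption): replace with the code that opens the workbook.
        int[] data = measure("allocate", () -> new int[5_000_000]);
        System.out.println("length = " + data.length);
    }
}
```

Wrapping the workbook-loading call in measure(...) on the development machine gives one number to compare between the HSSF and XSSF runs, alongside what VisualVM reports.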

Note: the heap space exception mentioned above only occurs in production (on Google App Engine), and only for .xlsx files. The tests described in this question, however, were all run on my development machine with -Xmx2g. I am hoping that if I can fix the problem on my development setup, it will use less memory when deployed.

Stack trace from App Engine:

Solution

I faced the same problem when reading huge .xlsx files with Apache POI, and I came across

excel-streaming-reader (GitHub)

The library is a wrapper around the streaming API that preserves the syntax of the standard POI API, which helps when reading large files.
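To illustrate why streaming keeps memory low, here is a minimal sketch that uses only the JDK's StAX pull parser on worksheet-style XML. It is not the excel-streaming-reader API and not POI; the class name StreamingRowCount and the tiny sample document are assumptions made for illustration. A real .xlsx file is a zip archive whose sheet XML would have to be extracted first.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StreamingRowCount {

    /** Counts <row> elements by pulling one XML event at a time, so no full document tree is built. */
    public static int countRows(InputStream sheetXml) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(sheetXml);
            int rows = 0;
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && "row".equals(reader.getLocalName())) {
                        rows++;
                    }
                }
            } finally {
                reader.close();
            }
            return rows;
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Malformed sheet XML", e);
        }
    }

    /** Convenience overload for small in-memory samples. */
    public static int countRows(String xml) {
        return countRows(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) {
        // Hypothetical two-row sample shaped like SpreadsheetML sheet data.
        String sample = "<worksheet xmlns=\"http://schemas.openxmlformats.org/spreadsheetml/2006/main\">"
                + "<sheetData>"
                + "<row r=\"1\"><c r=\"A1\"><v>1</v></c></row>"
                + "<row r=\"2\"><c r=\"A2\"><v>2</v></c></row>"
                + "</sheetData></worksheet>";
        System.out.println("rows = " + countRows(sample));
    }
}
```

Because only the current event is held in memory, heap use stays roughly constant regardless of row count, whereas loading a complete workbook object model grows with the size of the file. For the library's actual entry points, see its README.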
