Java, Apache POI, memory leak and Excel files

I need to read 15000 Excel files for my thesis. I'm using Apache POI to open them and analyze them later, but after about 5000 files I get the following exception and stack trace:

Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
at org.apache.xmlbeans.impl.store.Cur$CurLoadContext.attr(Cur.java:3044)
at org.apache.xmlbeans.impl.store.Cur$CurLoadContext.attr(Cur.java:3065)
at org.apache.xmlbeans.impl.store.Locale$SaxHandler.startElement(Locale.java:3263)
at org.apache.xmlbeans.impl.piccolo.xml.Piccolo.reportStartTag(Piccolo.java:1082)
at org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.parseAttributesNS(PiccoloLexer.java:1822)
at org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.parseOpenTagNS(PiccoloLexer.java:1521)
at org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.parseTagNS(PiccoloLexer.java:1362)
at org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.yylex(PiccoloLexer.java:4682)
at org.apache.xmlbeans.impl.piccolo.xml.Piccolo.yylex(Piccolo.java:1290)
at org.apache.xmlbeans.impl.piccolo.xml.Piccolo.yyparse(Piccolo.java:1400)
at org.apache.xmlbeans.impl.piccolo.xml.Piccolo.parse(Piccolo.java:714)
at org.apache.xmlbeans.impl.store.Locale$SaxLoader.load(Locale.java:3479)
at org.apache.xmlbeans.impl.store.Locale.parseToXmlObject(Locale.java:1277)
at org.apache.xmlbeans.impl.store.Locale.parseToXmlObject(Locale.java:1264)
at org.apache.xmlbeans.impl.schema.SchemaTypeLoaderBase.parse(SchemaTypeLoaderBase.java:345)
at org.apache.poi.POIXMLTypeLoader.parse(POIXMLTypeLoader.java:92)
at org.openxmlformats.schemas.spreadsheetml.x2006.main.WorksheetDocument$Factory.parse(Unknown Source)
at org.apache.poi.xssf.usermodel.XSSFSheet.read(XSSFSheet.java:173)
at org.apache.poi.xssf.usermodel.XSSFSheet.onDocumentRead(XSSFSheet.java:165)
at org.apache.poi.xssf.usermodel.XSSFWorkbook.parseSheet(XSSFWorkbook.java:417)
at org.apache.poi.xssf.usermodel.XSSFWorkbook.onDocumentRead(XSSFWorkbook.java:382)
at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:178)
at org.apache.poi.xssf.usermodel.XSSFWorkbook.<init>(XSSFWorkbook.java:249)
at org.apache.poi.xssf.usermodel.XSSFWorkbook.<init>(XSSFWorkbook.java:302)
at de.spreadsheet_realtions.analysis.WorkbookAnalysis.analyze(WorkbookAnalysis.java:18)

Code (just open and close the file):

public static void main(String[] args) {
    // start() is an instance method, so it has to be called on an instance
    new WorkbookAnalysis().start();
}

public void start() {
    File[] files = getAllFiles(Config.folder);
    // A ratio of 0 disables POI's zip-bomb check
    ZipSecureFile.setMinInflateRatio(0.00);
    for (File f : files) {
        analyze(f);
    }
}

public void analyze(File file) {
    Workbook workbook = null;
    try {
        workbook = new XSSFWorkbook(file); // line 18
    } catch (Exception e1) {
        e1.printStackTrace();
        return;
    }
    // the code to analyze the workbook would go here later
    try {
        workbook.close();
    } catch (Exception e) {
        e.printStackTrace();
    }
}

I also tried OPCPackage.open(file) and got the same result.

What have I done wrong or what can I do to solve this problem? Thanks for your help.

Edit: the following code gives the same result:

try (XSSFWorkbook workbook = new XSSFWorkbook(file)){
} catch (Exception e1) {e1.printStackTrace(); return;}

Solution

Typically, POI holds the entire workbook in memory. Large workbooks therefore need a different approach.

When writing, you can use SXSSF. Most of the calls are the same, except that only a certain number of rows are kept in memory at a time.

In your case, you are reading. For that you can use POI's event-driven API. The basic idea is that you don't get the workbook as one huge object. Instead, you receive it piece by piece as it is being read, and you can keep as much of it as you want in your own data structures. Or you can simply process it while it is being read and keep very little.

Because this is a lower-level API (driven by the structure of the data being read), there is one approach for xls and a different one for xlsx. Check the POI "How To" page and find the section entitled "XSSF and SAX (Event API)".

The example in that section demonstrates how to detect the value of each cell as it is read in. (xercesImpl.jar is required on your library path.)
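
To make the event-driven idea concrete, here is a minimal sketch that uses only JDK classes (java.util.zip and SAX), not POI's own event API. It relies on the fact that an xlsx file is a zip archive of XML parts, streams one worksheet part through a SAX handler, and keeps nothing but a running cell count. The class name StreamingCellCounter, its method names, and the part name "xl/worksheets/sheet1.xml" are assumptions made for illustration. The real part name is determined by the workbook's relationships, which POI's XSSFReader resolves for you, so for real analysis the POI how-to example is the better starting point.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of POI: counts cells without building a workbook object.
class StreamingCellCounter {

    // SAX handler that only counts <c> (cell) elements and stores nothing else.
    static final class CellCountHandler extends DefaultHandler {
        long cells;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            // With a namespace-aware parser, localName is "c" for cell elements.
            if ("c".equals(localName)) {
                cells++;
            }
        }
    }

    // Streams one worksheet XML part and returns the number of cell elements seen.
    static long countCells(InputStream sheetXml)
            throws IOException, SAXException, ParserConfigurationException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        CellCountHandler handler = new CellCountHandler();
        factory.newSAXParser().parse(sheetXml, handler);
        return handler.cells;
    }

    // Assumption: the first sheet is stored as xl/worksheets/sheet1.xml (common, not guaranteed).
    static long countCellsInFirstSheet(File xlsx)
            throws IOException, SAXException, ParserConfigurationException {
        try (ZipFile zip = new ZipFile(xlsx)) {
            ZipEntry entry = zip.getEntry("xl/worksheets/sheet1.xml");
            if (entry == null) {
                return 0; // no part with the assumed name
            }
            try (InputStream in = zip.getInputStream(entry)) {
                return countCells(in);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<worksheet xmlns=\"http://schemas.openxmlformats.org/spreadsheetml/2006/main\">"
                + "<sheetData><row r=\"1\"><c r=\"A1\"><v>1</v></c><c r=\"B1\"><v>2</v></c></row></sheetData>"
                + "</worksheet>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(countCells(in)); // prints 2
    }
}
```

In the loop from the question, analyze(f) could call something like countCellsInFirstSheet(f) instead of constructing an XSSFWorkbook. Because the handler keeps only a counter, memory use per file stays small no matter how large the sheet is.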
