Java, Apache POI, memory leak and Excel files
I need to read 15,000 Excel files for my thesis. I'm using Apache POI to open them and analyze them later, but after about 5,000 files I get the following exception and stack trace:
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
    at org.apache.xmlbeans.impl.store.Cur$CurLoadContext.attr(Cur.java:3044)
    at org.apache.xmlbeans.impl.store.Cur$CurLoadContext.attr(Cur.java:3065)
    at org.apache.xmlbeans.impl.store.Locale$SaxHandler.startElement(Locale.java:3263)
    at org.apache.xmlbeans.impl.piccolo.xml.Piccolo.reportStartTag(Piccolo.java:1082)
    at org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.parseAttributesNS(PiccoloLexer.java:1822)
    at org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.parseOpenTagNS(PiccoloLexer.java:1521)
    at org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.parseTagNS(PiccoloLexer.java:1362)
    at org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.yylex(PiccoloLexer.java:4682)
    at org.apache.xmlbeans.impl.piccolo.xml.Piccolo.yylex(Piccolo.java:1290)
    at org.apache.xmlbeans.impl.piccolo.xml.Piccolo.yyparse(Piccolo.java:1400)
    at org.apache.xmlbeans.impl.piccolo.xml.Piccolo.parse(Piccolo.java:714)
    at org.apache.xmlbeans.impl.store.Locale$SaxLoader.load(Locale.java:3479)
    at org.apache.xmlbeans.impl.store.Locale.parseToXmlObject(Locale.java:1277)
    at org.apache.xmlbeans.impl.store.Locale.parseToXmlObject(Locale.java:1264)
    at org.apache.xmlbeans.impl.schema.SchemaTypeLoaderBase.parse(SchemaTypeLoaderBase.java:345)
    at org.apache.poi.POIXMLTypeLoader.parse(POIXMLTypeLoader.java:92)
    at org.openxmlformats.schemas.spreadsheetml.x2006.main.WorksheetDocument$Factory.parse(Unknown Source)
    at org.apache.poi.xssf.usermodel.XSSFSheet.read(XSSFSheet.java:173)
    at org.apache.poi.xssf.usermodel.XSSFSheet.onDocumentRead(XSSFSheet.java:165)
    at org.apache.poi.xssf.usermodel.XSSFWorkbook.parseSheet(XSSFWorkbook.java:417)
    at org.apache.poi.xssf.usermodel.XSSFWorkbook.onDocumentRead(XSSFWorkbook.java:382)
    at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:178)
    at org.apache.poi.xssf.usermodel.XSSFWorkbook.<init>(XSSFWorkbook.java:249)
    at org.apache.poi.xssf.usermodel.XSSFWorkbook.<init>(XSSFWorkbook.java:302)
    at de.spreadsheet_realtions.analysis.WorkbookAnalysis.analyze(WorkbookAnalysis.java:18)
Code (just open and close the file):
    public static void main(String[] args) {
        // start() is an instance method, so it needs an instance
        new WorkbookAnalysis().start();
    }

    public void start() {
        File[] files = getAllFiles(Config.folder);
        ZipSecureFile.setMinInflateRatio(0.00);
        for (File f : files) {
            analyze(f);
        }
    }

    public void analyze(File file) {
        Workbook workbook = null;
        try {
            workbook = new XSSFWorkbook(file); // line 18
        } catch (Exception e1) {
            e1.printStackTrace();
            return;
        }
        // the code to analyze the workbook will go here later
        try {
            workbook.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
I also tried OPCPackage.open(file) and got the same result.
What have I done wrong or what can I do to solve this problem? Thanks for your help.
Edit: the following code gives the same result:
    try (XSSFWorkbook workbook = new XSSFWorkbook(file)) {
    } catch (Exception e1) {
        e1.printStackTrace();
        return;
    }
Solution
Typically, POI holds the entire workbook in memory, so large workbooks need a different approach.
When writing, you can use SXSSF. Most calls are the same, except that only a limited number of rows are kept in memory at a time.
In your case, you are reading. For that, you can use POI's event-driven API. The basic idea is that you don't treat the workbook as one huge object. Instead, you read it piece by piece as it is parsed, and you can store whatever you need in your own data structures. Or you can simply process the data as you read it rather than storing it, which is very economical.
Because this is a lower-level API (driven by the structure of the data being read), there is one approach for xls and a different one for xlsx. Check the POI "how-to" page and look for the section entitled "XSSF and SAX (Event API)".
The example in that section demonstrates how to inspect the value of each cell as it is read in (xercesImpl.jar is required on your classpath).
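To illustrate the streaming idea without depending on a particular POI version, here is a minimal JDK-only sketch. It is not POI's event API. It only assumes that an .xlsx file is a ZIP archive whose worksheets are XML parts under xl/worksheets/, with one <c> element per cell. The class name StreamingCellCounter and the method countCells are hypothetical names chosen for illustration.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of POI: counts cells by streaming the sheet XML
// through a SAX handler instead of building a workbook object in memory.
final class StreamingCellCounter {

    /** Counts the <c> (cell) elements in every worksheet part of an .xlsx stream. */
    static long countCells(InputStream xlsx)
            throws IOException, SAXException, ParserConfigurationException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);

        long[] cells = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                // Called once per element as it streams past; nothing is retained.
                if ("c".equals(localName)) {
                    cells[0]++;
                }
            }
        };

        try (ZipInputStream zip = new ZipInputStream(xlsx)) {
            for (ZipEntry entry; (entry = zip.getNextEntry()) != null; ) {
                String name = entry.getName();
                if (name.startsWith("xl/worksheets/") && name.endsWith(".xml")) {
                    // The parser may close its input; shield the ZIP stream so
                    // the remaining entries can still be read.
                    InputStream part = new FilterInputStream(zip) {
                        @Override
                        public void close() {
                            // intentionally left open
                        }
                    };
                    factory.newSAXParser().parse(part, handler);
                }
            }
        }
        return cells[0];
    }

    public static void main(String[] args) throws Exception {
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            System.out.println(countCells(in) + " cells");
        }
    }
}
```

Because the handler only sees one element at a time, memory use should not grow with the size of the sheet. A real analysis would also need to resolve shared strings and cell styles, which is the kind of bookkeeping POI's event API helps with.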