Java – Stanford parser out of memory
I am trying to run the Stanford parser on Ubuntu from Python code. My text file is 500 MB, and I am trying to parse all of it. The machine has 32 GB of RAM. I have increased the JVM heap size, but I cannot tell whether the setting is actually taking effect, because I get this error every time. Please help.
WARNING!! OUT OF MEMORY! THERE WAS NOT ENOUGH ***
*** MEMORY TO RUN ALL PARSERS. EITHER GIVE THE ***
*** JVM MORE MEMORY,SET THE MAXIMUM SENTENCE ***
*** LENGTH WITH -maxLength,OR PERHAPS YOU ARE ***
*** HAPPY TO HAVE THE PARSER FALL BACK TO USING ***
*** A SIMPLER PARSER FOR VERY LONG SENTENCES. ***
Sentence has no parse using PCFG grammar (or no PCFG fallback). Skipping...
Exception in thread "main" edu.stanford.nlp.parser.common.NoSuchParseException
    at edu.stanford.nlp.parser.lexparser.LexicalizedParserQuery.getBestParse(LexicalizedParserQuery.java:398)
    at edu.stanford.nlp.parser.lexparser.LexicalizedParserQuery.getBestParse(LexicalizedParserQuery.java:370)
    at edu.stanford.nlp.parser.lexparser.ParseFiles.processResults(ParseFiles.java:271)
    at edu.stanford.nlp.parser.lexparser.ParseFiles.parseFiles(ParseFiles.java:215)
    at edu.stanford.nlp.parser.lexparser.ParseFiles.parseFiles(ParseFiles.java:74)
    at edu.stanford.nlp.parser.lexparser.LexicalizedParser.main(LexicalizedParser.java:1513)
Solution
You should divide the text file into small pieces and give them to the parser one at a time. The parser builds an in-memory representation of the entire "document" it is given, and that representation is several orders of magnitude larger than the document on disk, so handing it a 500 MB document in one go is a very bad idea.
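As a rough illustration of the splitting step, here is a minimal Python sketch. The function name `split_by_lines`, the chunk file naming, and the 1 MB default are assumptions made for this example, not anything the parser requires. It breaks only at line ends, so it assumes your sentences do not span lines.

```python
from pathlib import Path


def split_by_lines(src, out_dir, max_bytes=1_000_000):
    """Split src into numbered chunk files of at most roughly max_bytes each.

    Chunks are cut only at line ends. A single line longer than max_bytes
    is written to a chunk of its own rather than being cut in the middle.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunks = []          # paths of the chunk files written so far
    buf, size = [], 0    # lines collected for the current chunk

    def flush():
        nonlocal buf, size
        if buf:
            path = out_dir / f"chunk_{len(chunks):05d}.txt"
            path.write_bytes(b"".join(buf))
            chunks.append(path)
            buf, size = [], 0

    # Read in binary mode so sizes are exact and no decoding is needed.
    with open(src, "rb") as f:
        for line in f:
            if size + len(line) > max_bytes:
                flush()
            buf.append(line)
            size += len(line)
    flush()  # write whatever is left over
    return chunks
```

Each returned path can then be parsed separately, so the JVM never has to hold more than one small chunk in memory.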
You should also avoid feeding it overly long "sentences", which can easily happen if informal or web page text lacks sentence delimiters, or if you give it large tables or gibberish. The safest way to avoid this problem is to set the parameter that limits the maximum sentence length, for example -maxLength 100.
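Since you are calling the parser from Python, one way to be sure the heap setting is really applied is to build the java command yourself and pass the heap size explicitly. The sketch below is only that: the helper names, the 4g heap, the "*" classpath (which assumes you run from the directory containing the parser jars), and the model path are assumptions to adjust for your installation. Only -maxLength comes from the error message above, and -Xmx is the standard JVM option for the maximum heap size.

```python
import subprocess


def build_parser_command(text_file, heap="4g", max_length=100,
                         classpath="*",
                         model="edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz"):
    """Return the argument list for one parser run (nothing is executed here)."""
    return [
        "java", f"-Xmx{heap}",              # maximum JVM heap, set explicitly
        "-cp", classpath,                   # assumed: parser jars in the working directory
        "edu.stanford.nlp.parser.lexparser.LexicalizedParser",
        "-maxLength", str(max_length),      # limit on sentence length
        model,                              # assumed model path inside the models jar
        str(text_file),
    ]


def parse_chunk(text_file, out_file, **kwargs):
    """Run the parser on one small file and write its standard output to out_file."""
    with open(out_file, "w") as out:
        subprocess.run(build_parser_command(text_file, **kwargs),
                       stdout=out, check=True)
```

Printing the list returned by `build_parser_command` before running it shows exactly which heap size the JVM is being started with.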
You may also want to try the neural network dependency parser, which scales better to large tasks: http://nlp.stanford.edu/software/nndep.shtml.