Java – Stanford parser out of memory

I tried to run the Stanford parser on Ubuntu from Python code. My text file is 500 MB, and I tried to parse it. I have 32 GB of RAM. I keep increasing the JVM heap size, but I don't know whether it is actually taking effect, because every time I receive this error. Please help me.

***  WARNING!! OUT OF MEMORY! THERE WAS NOT ENOUGH  ***
***  MEMORY TO RUN ALL PARSERS.  EITHER GIVE THE    ***
***  JVM MORE MEMORY, SET THE MAXIMUM SENTENCE      ***
***  LENGTH WITH -maxLength, OR PERHAPS YOU ARE     ***
***  HAPPY TO HAVE THE PARSER FALL BACK TO USING    ***
***  A SIMPLER PARSER FOR VERY LONG SENTENCES.      ***
Sentence has no parse using PCFG grammar (or no PCFG fallback).  Skipping...
Exception in thread "main" edu.stanford.nlp.parser.common.NoSuchParseException
    at edu.stanford.nlp.parser.lexparser.LexicalizedParserQuery.getBestParse(LexicalizedParserQuery.java:398)
    at edu.stanford.nlp.parser.lexparser.LexicalizedParserQuery.getBestParse(LexicalizedParserQuery.java:370)
    at edu.stanford.nlp.parser.lexparser.ParseFiles.processResults(ParseFiles.java:271)
    at edu.stanford.nlp.parser.lexparser.ParseFiles.parseFiles(ParseFiles.java:215)
    at edu.stanford.nlp.parser.lexparser.ParseFiles.parseFiles(ParseFiles.java:74)
    at edu.stanford.nlp.parser.lexparser.LexicalizedParser.main(LexicalizedParser.java:1513)

Solution

You should divide the text file into small pieces and feed them to the parser one at a time. Since the parser builds an in-memory representation of the entire "document" it is given at once (several orders of magnitude larger than the document on disk), handing it a 500 MB document in one breath is a very bad idea.
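A minimal sketch of that splitting step in Python, assuming a UTF-8 text file; the chunk size, output directory, and file names are hypothetical choices, and splitting happens on line boundaries so no sentence is cut mid-line:

```python
import os

def split_file(path, chunk_bytes=5 * 1024 * 1024, out_dir="chunks"):
    """Split a large UTF-8 text file into ~chunk_bytes pieces on
    line boundaries, writing part_0000.txt, part_0001.txt, ...
    Returns the number of chunk files written."""
    os.makedirs(out_dir, exist_ok=True)
    part, size, lines = 0, 0, []
    with open(path, encoding="utf-8") as f:
        for line in f:
            lines.append(line)
            size += len(line.encode("utf-8"))
            if size >= chunk_bytes:          # flush the current chunk
                _write_chunk(out_dir, part, lines)
                part, size, lines = part + 1, 0, []
    if lines:                                # flush the final partial chunk
        _write_chunk(out_dir, part, lines)
    return part + 1

def _write_chunk(out_dir, part, lines):
    name = os.path.join(out_dir, f"part_{part:04d}.txt")
    with open(name, "w", encoding="utf-8") as f:
        f.writelines(lines)
```

Each resulting `part_NNNN.txt` can then be passed to the parser as a separate input file.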

You should also avoid overly long "sentences", which can easily occur if you have casual or web-page text lacking sentence delimiters, or if you feed the parser a large table or garbled text. The safest way to avoid this problem is to set a parameter limiting the maximum sentence length, such as -maxLength 100.
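Since the asker drives the parser from Python, here is a hedged sketch of building the command line with a larger JVM heap (`-mx`, the short form of `-Xmx`) and the `-maxLength` cap. The jar path, model path, and heap size are assumptions to adjust for your installation:

```python
import subprocess

def build_parser_cmd(path, heap="4g", max_length=100):
    """Build the java command line for the Stanford LexicalizedParser.
    Classpath and model path below are hypothetical placeholders."""
    return [
        "java", f"-mx{heap}",             # JVM maximum heap size
        "-cp", "stanford-parser.jar",     # adjust to your jar location
        "edu.stanford.nlp.parser.lexparser.LexicalizedParser",
        "-maxLength", str(max_length),    # skip overlong "sentences"
        "edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz",
        path,                             # one small chunk file at a time
    ]

def parse_chunk(path, heap="4g", max_length=100):
    """Run the parser on one chunk and capture its output."""
    cmd = build_parser_cmd(path, heap=heap, max_length=max_length)
    return subprocess.run(cmd, capture_output=True, text=True)
```

Running `parse_chunk` over each chunk in turn keeps every JVM invocation small, instead of one giant process holding the whole 500 MB document.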

You may also want to try the neural-network dependency parser, which scales better to large tasks: http://nlp.stanford.edu/software/nndep.shtml.
