Using Stanford's POS tagger in Java

Mar 9, 2011 1:22:06 PM edu.stanford.nlp.process.PTBLexer next
WARNING: Untokenizable: � (U+FFFD, decimal: 65533)

These are the errors I get when I try to assign POS tags to sentences that I read from a file. At first (for the first few sentences) I didn't get the error (i.e. "Untokenizable"), but after some sentences had been read, the error appeared. I use v2.0 (i.e. the 2009 release) of the tagger, and the model is left3words.

Solution

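I agree with Yuval that this is a character encoding problem, but the most common case is that the tagger is trying to read the file as UTF-8 while the file actually uses a single-byte encoding (such as ISO-8859-1). Bytes that are not valid UTF-8 are replaced with U+FFFD during decoding, and the tokenizer then reports that character as untokenizable. See Wikipedia's discussion of U+FFFD.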
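To check whether this is what is happening, it can help to reproduce the mismatch outside the tagger. The following is only a minimal sketch using the standard Java library: the class `EncodingDemo` and its methods are hypothetical helpers made up for illustration, not part of the Stanford tagger, and ISO-8859-1 is assumed to be the file's real encoding. It shows that ISO-8859-1 bytes decoded as UTF-8 produce U+FFFD, and that reading the same bytes with the correct charset does not.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper class for illustration; not part of the Stanford tagger.
class EncodingDemo {

    // Decode raw bytes with the given charset. new String(byte[], Charset)
    // replaces malformed input with the charset's replacement string,
    // which is U+FFFD for UTF-8.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    // True if the text contains the Unicode replacement character U+FFFD.
    static boolean hasReplacementChar(String text) {
        return text.indexOf('\uFFFD') >= 0;
    }

    // Read all lines of a file, decoding with an explicitly chosen charset
    // rather than relying on a default.
    static List<String> readLines(Path file, Charset charset) throws IOException {
        return Files.readAllLines(file, charset);
    }

    public static void main(String[] args) throws IOException {
        // "café" encoded as ISO-8859-1: the é is the single byte 0xE9.
        byte[] latin1Bytes = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);

        String wrong = decode(latin1Bytes, StandardCharsets.UTF_8);
        String right = decode(latin1Bytes, StandardCharsets.ISO_8859_1);
        System.out.println("Decoded as UTF-8 contains U+FFFD: " + hasReplacementChar(wrong));
        System.out.println("Decoded as ISO-8859-1 contains U+FFFD: " + hasReplacementChar(right));

        // Same idea with a file: write the bytes, then read with the right charset.
        Path tmp = Files.createTempFile("sentences", ".txt");
        Files.write(tmp, latin1Bytes);
        List<String> lines = readLines(tmp, StandardCharsets.ISO_8859_1);
        System.out.println("File read as ISO-8859-1 contains U+FFFD: "
                + hasReplacementChar(String.join("\n", lines)));
        Files.delete(tmp);
    }
}
```

If the sentences are read this way, with the charset the file was actually written in, the strings handed to the tagger should no longer contain U+FFFD. Alternatively, the file can be converted to UTF-8 once, before tagging.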
