Using Stanford’s POS tagger in Java

2019-12-22 • Java

Mar 9, 2011 1:22:06 PM edu.stanford.nlp.process.PTBLexer next
WARNING: Untokenizable: � (U+FFFD, decimal: 65533)

These are the errors I get when I try to assign POS tags to sentences that I read from a file. At first (for the first few words) I did not get the error (i.e. "Untokenizable"), but after some sentences had been read, the error appeared. I am using v2.0 (i.e. 2009), and the model is left3words.

Solution

I agree with Yuval that this is a character encoding problem. The most common case is that the tagger tries to read the file as UTF-8 while the file actually uses a single-byte encoding (such as ISO-8859-1). See Wikipedia's discussion of U+FFFD, the Unicode replacement character.
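
As a rough illustration of that mismatch, here is a minimal sketch that uses only the Java standard library. The class name EncodingCheck and its two helper methods are hypothetical names made up for this example, not part of the Stanford tagger, and ISO-8859-1 is only an assumption about the file's real encoding. It decodes the same bytes twice and checks for U+FFFD, then offers a way to read the sentence file with an explicit charset before handing the text to the tagger.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class for illustration; not part of the Stanford tagger.
class EncodingCheck {

    // Reads every line of the file, decoding its bytes with the given charset.
    static List<String> readLines(Path file, Charset charset) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(file, charset)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    // True if the text contains U+FFFD, the character named in the warning.
    static boolean hasReplacementChar(String text) {
        return text.indexOf('\uFFFD') >= 0;
    }

    public static void main(String[] args) {
        // The word "cafe" with an accented e, encoded as ISO-8859-1:
        // the accented letter is the single byte 0xE9.
        byte[] latin1Bytes = "caf\u00E9".getBytes(StandardCharsets.ISO_8859_1);

        // Decoding with the wrong charset: 0xE9 alone is not valid UTF-8,
        // so the String constructor substitutes the replacement character.
        String wrong = new String(latin1Bytes, StandardCharsets.UTF_8);

        // Decoding with the charset the bytes were written in.
        String right = new String(latin1Bytes, StandardCharsets.ISO_8859_1);

        System.out.println("decoded as UTF-8 has U+FFFD: " + hasReplacementChar(wrong));
        System.out.println("decoded as ISO-8859-1 has U+FFFD: " + hasReplacementChar(right));
    }
}
```

If the file really is in a single-byte encoding, something like readLines(path, StandardCharsets.ISO_8859_1) should give the tagger clean text. If the encoding is unknown, checking a few decoded lines with hasReplacementChar is a cheap way to test a guess.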
