How do I index words with hyphens in Lucene?

I'm using StandardAnalyzer together with a TermVectorMapper that populates a HashMap, to retrieve the words and their frequencies from a single document.

But if I add the following text as a field in the document:

addDoc(w,"lucene Lawton-Browne Lucene");

The word frequencies returned in the HashMap are:

browne 1 lucene 2 lawton 1

The problem is the words "lawton" and "browne". If this is an actual double-barrelled name, can I get Lucene to recognize it as "Lawton-Browne", so that the name is treated as a single word?

I've tried combinations such as:

addDoc(w, "lucene \"Lawton-Browne\" Lucene");

and single quotes, but without success.

Thank you,

Mr. Morgan

Solution

If you still want to be able to use a stop word list, I suggest trying PatternAnalyzer. It accepts such lists and comes with a predefined whitespace pattern.
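To see why a whitespace pattern helps, here is a minimal plain-Java sketch. It does not call Lucene at all; it only splits the example text with two `java.util.regex` patterns. One breaks on whitespace only, the other breaks on any run of non-letter, non-digit characters, which is a rough stand-in for the hyphen split seen in the question and not StandardAnalyzer's real grammar. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Illustration only: plain regex splitting, no Lucene involved.
class PatternSplitDemo {
    // Token boundary = one or more whitespace characters.
    static final Pattern WHITESPACE = Pattern.compile("\\s+");
    // Token boundary = anything that is not a letter or digit (includes '-').
    static final Pattern NON_ALNUM = Pattern.compile("[^\\p{L}\\p{N}]+");

    // Splits text at the given boundary pattern and drops empty pieces.
    static List<String> split(Pattern boundary, String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : boundary.split(text)) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "lucene Lawton-Browne Lucene";
        System.out.println(split(WHITESPACE, text)); // [lucene, Lawton-Browne, Lucene]
        System.out.println(split(NON_ALNUM, text));  // [lucene, Lawton, Browne, Lucene]
    }
}
```

With the whitespace boundary the hyphenated name survives as one token; with the broader boundary it falls apart into two.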

Alternatively, you can wrap WhitespaceAnalyzer in your own analyzer and do something like this in tokenStream(String fieldName, Reader reader):

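// Goes inside your own Analyzer subclass; myWhitespaceAnalyzer and stopWords are fields you define.
public TokenStream tokenStream(String fieldName, Reader reader) {
  // Tokenize on whitespace only, so "Lawton-Browne" stays a single token.
  TokenStream stream = myWhitespaceAnalyzer.tokenStream(fieldName, reader);
  // Then drop stop words from the resulting stream.
  stream = new StopFilter(stream, stopWords);
  return stream;
}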
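
The effect of the wrapper on the term frequencies can be sketched in plain Java, without Lucene. This is only an approximation of the pipeline under stated assumptions: the stop list is hypothetical, the class and method names are invented, and the lowercasing step is an addition that the wrapper above does not perform (a whitespace-only tokenizer leaves case untouched, so "lucene" and "Lucene" would otherwise be counted separately).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Illustration only: mimics "whitespace tokenizer + stop filter" and counts terms.
class WhitespaceTermCounter {

    // Splits on whitespace (hyphens stay inside tokens), lowercases each token,
    // skips stop words, and counts the remaining terms in first-seen order.
    static Map<String, Integer> termFrequencies(String text, Set<String> stopWords) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String token : text.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // happens only for blank input
            }
            String term = token.toLowerCase(Locale.ROOT); // assumption: case-insensitive counts wanted
            if (stopWords.contains(term)) {
                continue;
            }
            counts.merge(term, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical stop list, not taken from Lucene.
        Set<String> stopWords = new HashSet<>(Arrays.asList("the", "a", "of"));
        System.out.println(termFrequencies("lucene Lawton-Browne Lucene", stopWords));
        // prints {lucene=2, lawton-browne=1}
    }
}
```

If you want the same case folding in the Lucene wrapper itself, adding a LowerCaseFilter to the stream before the StopFilter is probably the way to go.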