How do I index words with hyphens in Lucene?

2020-08-03 • Java

I have a StandardAnalyzer working, and I retrieve the words and their frequencies from a single document using a TermVectorMapper that populates a HashMap.

But if I use the following text as a field in the document:

addDoc(w,"lucene Lawton-Browne Lucene");

The word frequencies returned in the HashMap are:

browne 1
lucene 2
lawton 1

The problem is the words "lawton" and "browne". If this is a real double-barrelled name, can Lucene recognize "Lawton-Browne" as a single word?

I've tried combinations:

addDoc(w, "lucene \"Lawton-Browne\" Lucene");

I also tried single quotes, but without success.

Thanks

Mr. Morgan

Solution

If you still want to be able to use a stop word list, I suggest you try PatternAnalyzer. It accepts such a list and has a precompiled whitespace pattern (WHITESPACE_PATTERN), so tokens are split only on whitespace and "Lawton-Browne" stays in one piece.

Or you can wrap WhitespaceAnalyzer and do something like this in tokenStream(String fieldName, Reader reader):

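// Inside a custom Analyzer that wraps a WhitespaceAnalyzer (myWhitespaceAnalyzer)
// and holds a stop-word set (stopWords).
public TokenStream tokenStream(String fieldName, Reader reader) {
  // Split on whitespace only, so "Lawton-Browne" stays a single token.
  TokenStream stream = myWhitespaceAnalyzer.tokenStream(fieldName, reader);
  // Remove stop words from the whitespace-delimited tokens.
  stream = new StopFilter(stream, stopWords);
  return stream;
}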
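
To show why splitting on whitespace alone keeps the hyphenated name together, here is a minimal plain-Java sketch of the same idea. It does not use any Lucene classes: the class name HyphenTokenSketch, the method countTerms, and the stop-word set are all hypothetical names chosen for illustration. The lower-casing step is also an assumption added here, not part of the answer's code.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Plain-Java illustration only; it does not use Lucene classes.
class HyphenTokenSketch {

    // Splits on whitespace only, lower-cases each token (an added assumption),
    // drops stop words, and counts how often each remaining term occurs.
    static Map<String, Integer> countTerms(String text, Set<String> stopWords) {
        Map<String, Integer> freq = new LinkedHashMap<>();
        for (String raw : text.trim().split("\\s+")) {
            if (raw.isEmpty()) {
                continue; // empty input produces a single empty token
            }
            String term = raw.toLowerCase(Locale.ROOT);
            if (stopWords.contains(term)) {
                continue; // skip stop words
            }
            freq.merge(term, 1, Integer::sum);
        }
        return freq;
    }

    public static void main(String[] args) {
        Set<String> stopWords = Set.of("the", "a", "and"); // hypothetical stop list
        // prints {lucene=2, lawton-browne=1}
        System.out.println(countTerms("lucene Lawton-Browne Lucene", stopWords));
    }
}
```

Note that WhitespaceAnalyzer itself does not lower-case tokens, so with the wrapper above "lucene" and "Lucene" would be counted as two separate terms. If case-insensitive counts are wanted, adding a LowerCaseFilter to the chain before the StopFilter is one option.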
