How do I index words with hyphens in Lucene?
I have a StandardAnalyzer working, and I use a TermVectorMapper that populates a HashMap to retrieve the words and their frequencies from a single document.
But if I use the following text as a field in the document:
addDoc(w,"lucene Lawton-Browne Lucene");
The word frequencies returned in the HashMap are:
browne 1, lucene 2, lawton 1
The problem is the words "lawton" and "browne". If this is an actual double-barreled name, can Lucene recognize it as "Lawton-Browne", so that the name is treated as a single word?
I've tried combinations such as:
addDoc(w,"lucene \"Lawton-Browne\" Lucene");
and single quotes, but without success.
Thank you,
Mr. Morgan
Solution
If you still want to be able to use a stop word list, I suggest you try PatternAnalyzer. It allows such lists and has a predefined whitespace pattern.
Or you can wrap a WhitespaceAnalyzer and, in tokenStream(String fieldName, Reader reader), do something like this:
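public TokenStream tokenStream(String fieldName, Reader reader) {
    // Split on whitespace only, so "Lawton-Browne" stays a single token.
    TokenStream stream = myWhitespaceAnalyzer.tokenStream(fieldName, reader);
    // Then remove the stop words.
    stream = new StopFilter(stream, stopWords);
    return stream;
}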
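
Both suggestions rely on the same idea: split on whitespace only, then filter stop words, so the hyphen never acts as a token boundary. Here is a rough plain-Java sketch of that idea, with no Lucene classes involved. The class name HyphenTokenSketch, the helper countTerms and the stop word list are made up for illustration. The sketch also lowercases each token so the counts match the ones in the question; as far as I know WhitespaceAnalyzer does not lowercase, so in Lucene you would probably need something like a LowerCaseFilter in the chain as well.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class HyphenTokenSketch {

    // Hypothetical helper, not a Lucene API: split on whitespace only,
    // lowercase each token, skip stop words, and count what is left.
    static Map<String, Integer> countTerms(String text, Set<String> stopWords) {
        Map<String, Integer> freq = new LinkedHashMap<>();
        for (String token : text.trim().split("\\s+")) {
            // Lowercasing is an assumption made here to mirror the
            // question's counts, where "lucene" and "Lucene" are merged.
            String term = token.toLowerCase(Locale.ROOT);
            if (term.isEmpty() || stopWords.contains(term)) {
                continue;
            }
            freq.merge(term, 1, Integer::sum);
        }
        return freq;
    }

    public static void main(String[] args) {
        // Illustrative stop word list only.
        Set<String> stopWords = new HashSet<>(Arrays.asList("the", "a", "of"));
        // Prints {lucene=2, lawton-browne=1}
        System.out.println(countTerms("lucene Lawton-Browne Lucene", stopWords));
    }
}
```

Because the hyphen is never treated as a separator, "lawton-browne" comes back as one term with a frequency of 1, instead of separate "lawton" and "browne" entries.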