Java – Lucene highlighter

Lucene 4.3. How does a highlighter work? I want to print the search results from each document as the matched search word plus the 8 words that follow it. How do I use a highlighter to do this? I have indexed complete TXT, HTML and XML documents. Here is my search method, to which I would like to add highlighting:

String index = "index";
String field = "contents";
String queries = null;
int repeat = 1;
boolean raw = true; //not sure what raw really does???
String queryString = null; //keep null,prompt user later for it
int hitsPerPage = 10; //leave it at 10,go from there later

//need to add all files to same directory
index = "C:\\Users\\plib\\Documents\\index";
repeat = 4;


IndexReader reader = DirectoryReader.open(FSDirectory.open(new File(index)));
IndexSearcher searcher = new IndexSearcher(reader);
Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_43);

BufferedReader in = null;
if (queries != null) {
  in = new BufferedReader(new InputStreamReader(new FileInputStream(queries),"UTF-8"));
} else {
  in = new BufferedReader(new InputStreamReader(System.in,"UTF-8"));
}
QueryParser parser = new QueryParser(Version.LUCENE_43,field,analyzer);
while (true) {
  if (queries == null && queryString == null) {                        // prompt the user
    System.out.println("Enter query. 'quit' = quit: ");
  }

  String line = queryString != null ? queryString : in.readLine();

  if (line == null) {                         // end of input; length() can never be -1
    break;
  }

  line = line.trim();
  if (line.length() == 0 || line.equalsIgnoreCase("quit")) {
    break;
  }

  Query query = parser.parse(line);
  System.out.println("Searching for: " + query.toString(field));

  if (repeat > 0) {                           // repeat & time as benchmark
    Date start = new Date();
    for (int i = 0; i < repeat; i++) {
      searcher.search(query,null,100);
    }
    Date end = new Date();
    System.out.println("Time: "+(end.getTime()-start.getTime())+"ms");
  }

  doPagingSearch(in,searcher,query,hitsPerPage,raw,queries == null && queryString == null);

  if (queryString != null) {
    break;
  }
}
reader.close();

}

Solution

I had the same problem and finally came across this article:

http://vnarcher.blogspot.ca/2012/04/highlighting-text-with-lucene.html

The key part is that, while iterating over the results, you call getHighlightedField on each result's field value to highlight it:

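// Formatter, SimpleHTMLFormatter, QueryScorer, Highlighter, SimpleSpanFragmenter and
// InvalidTokenOffsetsException are in org.apache.lucene.search.highlight (lucene-highlighter jar).
private String getHighlightedField(Query query,Analyzer analyzer,String fieldName,String fieldValue) throws IOException,InvalidTokenOffsetsException {
    // Wrap every matched term in <span class="MatchedText">...</span>
    Formatter formatter = new SimpleHTMLFormatter("<span class=\"MatchedText\">","</span>");
    QueryScorer queryScorer = new QueryScorer(query);
    Highlighter highlighter = new Highlighter(formatter,queryScorer);
    // Integer.MAX_VALUE keeps the whole field value as a single fragment
    highlighter.setTextFragmenter(new SimpleSpanFragmenter(queryScorer,Integer.MAX_VALUE));
    highlighter.setMaxDocCharsToAnalyze(Integer.MAX_VALUE);
    // Returns null when the text contains no matching terms
    return highlighter.getBestFragment(analyzer,fieldName,fieldValue);
}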

This assumes the output will be HTML: it simply wraps each highlighted term in a <span> with the CSS class MatchedText. You can then define custom CSS rules to style the highlight however you want.
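
The method above returns the whole field value with the matches wrapped, because the fragment size is Integer.MAX_VALUE. Lucene's fragmenters cut by size rather than by word count, so "the search word plus the 8 words after it" is not something they give you directly. The following is only a plain-Java sketch of that output shape, not Lucene's implementation: the class SnippetSketch and the method snippets are hypothetical names, and the matching is a naive case-insensitive word comparison rather than real analysis. It could be applied to the stored field value of each hit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Lucene: illustrates "match + N following words".
class SnippetSketch {

    /**
     * Returns one snippet per occurrence of term in text: the matched word
     * wrapped in a span, followed by at most wordsAfter of the next words.
     */
    static List<String> snippets(String text, String term, int wordsAfter) {
        String[] words = text.trim().split("\\s+");
        String needle = term.toLowerCase(Locale.ROOT);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < words.length; i++) {
            // Naive normalization (an assumption): lowercase and strip
            // leading/trailing characters that are not letters or digits.
            String normalized = words[i].toLowerCase(Locale.ROOT)
                    .replaceAll("^[^\\p{L}\\p{N}]+|[^\\p{L}\\p{N}]+$", "");
            if (!normalized.equals(needle)) {
                continue;
            }
            StringBuilder sb = new StringBuilder();
            // Same markup as the SimpleHTMLFormatter in the answer above.
            sb.append("<span class=\"MatchedText\">").append(words[i]).append("</span>");
            int end = Math.min(words.length, i + 1 + wordsAfter);
            for (int j = i + 1; j < end; j++) {
                sb.append(' ').append(words[j]);
            }
            result.add(sb.toString());
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "The quick brown fox jumps over the lazy dog and runs far away today";
        for (String s : snippets(text, "fox", 8)) {
            System.out.println(s);
        }
    }
}
```

Note that the span wraps the word exactly as it appears in the text, including any attached punctuation, and that a multi-word or wildcard query would need the real Highlighter rather than this word comparison.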
