Java – get term frequency in Lucene

2019-05-07 • Java

Is there a fast and easy way to get term frequencies from a Lucene index without using the TermVectorFrequencies class, since that takes a long time on large collections?

In other words, is there something like TermEnum that gives not only the document frequency but also the term frequency?

Update: using TermDocs is too slow.
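To make the distinction in the question concrete: document frequency is just the number of postings a term has, while term frequency has to be read out of each posting and summed. The following is a minimal sketch using a hypothetical in-memory index (TermStatsSketch is not a Lucene class), assuming each term's postings are stored as (docId, freq) pairs added in ascending document order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical in-memory model of an inverted index (not Lucene's API):
 * each term maps to postings of {docId, freqInThatDoc}, in ascending docId order.
 */
class TermStatsSketch {
    private final Map<String, List<int[]>> postings = new TreeMap<>();

    /** Records that `term` occurs `freq` times in `docId`; callers add docIds in ascending order. */
    void add(String term, int docId, int freq) {
        postings.computeIfAbsent(term, t -> new ArrayList<>()).add(new int[] {docId, freq});
    }

    /** Document frequency: how many documents contain the term (the length of its postings). */
    int docFreq(String term) {
        return postings.getOrDefault(term, List.of()).size();
    }

    /** Total term frequency: only available by walking every posting and summing the freqs. */
    long totalTermFreq(String term) {
        long total = 0;
        for (int[] posting : postings.getOrDefault(term, List.of())) {
            total += posting[1];
        }
        return total;
    }
}
```

The point of the model is that the cheap number (docFreq) is a property of the term's entry, whereas the number the question asks for requires touching every posting, which is exactly the work the answer below discusses.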

Solution

Use TermDocs to get the term frequency for a given document. As with document frequency, you obtain the TermDocs from the IndexReader, using the term of interest.
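In Lucene 3.x the real cursor comes from IndexReader.termDocs(Term) and is walked with next(), doc(), freq() and skipTo(int). Since that needs a live index, here is a self-contained sketch instead: PostingsCursor is a hypothetical stand-in that models one term's postings as parallel arrays sorted by document number, with skipTo following the "do { if (!next()) return false; } while (target > doc())" behaviour that, as I understand it, the 3.x TermDocs javadoc describes.

```java
/**
 * Hypothetical stand-in for a TermDocs-style cursor (not Lucene's class):
 * a forward-only cursor over one term's postings, stored as parallel arrays sorted by docId.
 */
class PostingsCursor {
    private final int[] docs;
    private final int[] freqs;
    private int pos = -1; // positioned before the first posting

    PostingsCursor(int[] docs, int[] freqs) {
        this.docs = docs;
        this.freqs = freqs;
    }

    /** Advances to the next posting; returns false once the postings are exhausted. */
    boolean next() {
        if (pos < docs.length) pos++;
        return pos < docs.length;
    }

    int doc() { return docs[pos]; }

    int freq() { return freqs[pos]; }

    /** Moves forward to the first posting with docId >= target; never moves backward. */
    boolean skipTo(int target) {
        do {
            if (!next()) return false;
        } while (docs[pos] < target);
        return true;
    }

    /** Term frequency in docId, or 0 if absent. Assumes the cursor is positioned before docId. */
    static int termFreqInDoc(PostingsCursor cursor, int docId) {
        if (cursor.skipTo(docId) && cursor.doc() == docId) {
            return cursor.freq();
        }
        return 0;
    }
}
```

For example, with postings docs {1, 4, 7, 9} and freqs {2, 5, 1, 3}, termFreqInDoc on a fresh cursor returns 1 for document 7 and 0 for document 5.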

You won't find a faster method than TermDocs without giving up some generality. TermDocs reads directly from the ".frq" file in an index segment, where each term's frequencies are listed in document order.
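To show why the frequencies come out in document order, here is a minimal sketch of the idea behind the .frq postings, based on my reading of the pre-4.0 Lucene file-format documentation: each posting is a VInt DocDelta whose value divided by two is the gap from the previous document number, an odd DocDelta means the frequency is one, and an even DocDelta is followed by a VInt frequency. FrqSketch is hypothetical and ignores skip data, positions, and payloads. Because each entry is stored as a gap from the previous one, the stream can only be decoded front to back.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified model of delta-encoded doc/freq postings (no skip data). */
class FrqSketch {
    /** Variable-length int: 7 bits per byte, high bit set when more bytes follow. */
    static void writeVInt(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    static int readVInt(byte[] data, int[] pos) {
        int b = data[pos[0]++];
        int value = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = data[pos[0]++];
            value |= (b & 0x7F) << shift;
        }
        return value;
    }

    /** Encodes postings (docs ascending, freqs parallel) as DocDelta[, Freq] entries. */
    static byte[] encode(int[] docs, int[] freqs) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int lastDoc = 0;
        for (int i = 0; i < docs.length; i++) {
            int delta = docs[i] - lastDoc;
            lastDoc = docs[i];
            if (freqs[i] == 1) {
                writeVInt(out, (delta << 1) | 1); // odd: frequency is implicitly 1
            } else {
                writeVInt(out, delta << 1);       // even: explicit frequency follows
                writeVInt(out, freqs[i]);
            }
        }
        return out.toByteArray();
    }

    /** Decodes {docId, freq} pairs; each docId depends on the previous one, so order is fixed. */
    static List<int[]> decode(byte[] data) {
        List<int[]> result = new ArrayList<>();
        int[] pos = {0};
        int doc = 0;
        while (pos[0] < data.length) {
            int docDelta = readVInt(data, pos);
            doc += docDelta >>> 1;
            int freq = ((docDelta & 1) != 0) ? 1 : readVInt(data, pos);
            result.add(new int[] {doc, freq});
        }
        return result;
    }
}
```

For docs {3, 4, 10} with freqs {1, 5, 2}, encode produces the five bytes 7, 2, 5, 12, 2: a single byte for the first posting thanks to the odd-DocDelta shortcut, and two bytes each for the others.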

If that is "too slow", make sure you have optimized the index so that multiple segments are merged into one. Also iterate over the documents in order (skipping forward is fine, but you cannot efficiently jump back and forth in the document list).
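The advice about access order can be illustrated with a hypothetical helper (InOrderLookup is not part of Lucene): when you need frequencies for many documents, sort the requested document numbers first and sweep a single forward-only cursor over the postings, instead of seeking backward or restarting for each lookup. This sketch assumes a single segment whose postings for one term are held as parallel docs/freqs arrays sorted by document number.

```java
import java.util.Arrays;

/** Hypothetical helper: batch frequency lookups with one forward pass over a term's postings. */
class InOrderLookup {
    /**
     * Returns the frequency for each requested docId (0 if the term is absent there).
     * Requests are visited in ascending docId order so the cursor only ever moves forward.
     */
    static int[] freqsFor(int[] docs, int[] freqs, int[] wanted) {
        // Sort request indexes by the docId they ask for, keeping the caller's order for results.
        Integer[] order = new Integer[wanted.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        Arrays.sort(order, (a, b) -> Integer.compare(wanted[a], wanted[b]));

        int[] result = new int[wanted.length];
        int pos = 0; // forward-only position in the postings
        for (int idx : order) {
            int target = wanted[idx];
            while (pos < docs.length && docs[pos] < target) pos++;
            if (pos < docs.length && docs[pos] == target) result[idx] = freqs[pos];
        }
        return result;
    }
}
```

The whole batch costs one pass over the postings plus a sort of the requests, whereas answering requests in arbitrary order with a forward-only cursor would force a fresh scan every time a request goes backward.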

Your next step might be additional processing to build an even more specialized file structure, for example one that omits the SkipData. Personally, I would look for a better algorithm to achieve my goal, or provide better hardware: a large amount of memory, either to hold a RAMDirectory or to give to the operating system for its own file-caching system.
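As one possible example of such a specialized structure (my own assumption, not something the answer spells out), a term's postings could be expanded once into a dense in-memory table indexed by document number. DenseTermFreqs is hypothetical; it trades maxDoc integers of memory per term for constant-time lookups in any order, which also makes skip data unnecessary.

```java
/**
 * Hypothetical specialized layout: one term's frequencies expanded into a dense array
 * indexed by docId. Any document is a single array read away, so no skip data is needed.
 */
class DenseTermFreqs {
    private final int[] freqByDoc;

    /** Builds the table from postings (docs ascending, freqs parallel); maxDoc bounds docIds. */
    DenseTermFreqs(int maxDoc, int[] docs, int[] freqs) {
        freqByDoc = new int[maxDoc]; // 0 means the term does not occur in that document
        for (int i = 0; i < docs.length; i++) {
            freqByDoc[docs[i]] = freqs[i];
        }
    }

    /** O(1) random access; documents can be visited in any order. */
    int freq(int docId) {
        return freqByDoc[docId];
    }
}
```

This only pays off for a small set of frequently queried terms; for a whole vocabulary the memory cost grows with terms times documents, which is where the answer's point about simply adding memory comes in.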

This content was collected from the web and is provided as a learning reference. The copyright belongs to the original author.
