Java – get term frequency in Lucene
Is there a quick and easy way to get term frequencies from a Lucene index without going through the TermVectorFrequencies class? That approach takes a very long time on large collections.
What I mean is: is there something like TermEnum that exposes not only the document frequency but also the term frequency?
Update: using TermDocs is too slow.
Solution
Use TermDocs to get the term frequency for a given document. As with the document frequency, you get the term documents from an IndexReader, using the term of interest.
You will not find a faster method than TermDocs without giving up some generality. TermDocs reads directly from the ".frq" file in an index segment, where each term frequency is listed in document order.
If this is "too slow", make sure you have optimized the index so that multiple segments are merged into a single segment. Iterate over the documents in order (skipping ahead is fine, but you cannot efficiently jump back and forth in the document list).
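To make the access pattern concrete, here is a minimal sketch in plain Java, with no Lucene dependency. It models one term's postings as two parallel arrays sorted by document id, which is only a stand-in for the document-ordered ".frq" data described above. Everything in it is hypothetical and for illustration only: PostingsDemo, PostingsCursor, advanceTo, totalFreq and freqsFor are made-up names, and the binary search stands in for whatever skipping mechanism the real index uses. It shows why a single forward pass, or forward skips over ascending targets, is cheap, while jumping backwards is not supported at all.

```java
import java.util.Arrays;

// Toy model of one term's postings list. NOT Lucene code; all names are hypothetical.
class PostingsDemo {

    // Forward-only cursor over (docId, termFreq) pairs stored in ascending docId order.
    static final class PostingsCursor {
        private final int[] docs;   // ascending, unique document ids
        private final int[] freqs;  // freqs[i] = occurrences of the term in docs[i]
        private int pos = -1;       // -1 = before the first entry

        PostingsCursor(int[] docs, int[] freqs) {
            if (docs.length != freqs.length) {
                throw new IllegalArgumentException("docs and freqs must have equal length");
            }
            this.docs = docs;
            this.freqs = freqs;
        }

        // Move to the next entry; returns false once the list is exhausted.
        boolean next() {
            if (pos < docs.length) pos++;
            return pos < docs.length;
        }

        int doc()  { return docs[pos]; }
        int freq() { return freqs[pos]; }

        // Move forward (never backward) to the first entry with docId >= target.
        boolean advanceTo(int target) {
            int from = Math.max(pos, 0);
            int i = Arrays.binarySearch(docs, from, docs.length, target);
            pos = (i >= 0) ? i : -i - 1;
            return pos < docs.length;
        }
    }

    // Total occurrences of the term across all documents: one sequential pass.
    static long totalFreq(PostingsCursor c) {
        long total = 0;
        while (c.next()) total += c.freq();
        return total;
    }

    // Term frequency in each requested document; targets must be in ascending order.
    static int[] freqsFor(PostingsCursor c, int[] sortedTargets) {
        int[] out = new int[sortedTargets.length]; // 0 = term absent from that document
        for (int i = 0; i < sortedTargets.length; i++) {
            if (!c.advanceTo(sortedTargets[i])) break; // no more postings
            if (c.doc() == sortedTargets[i]) out[i] = c.freq();
        }
        return out;
    }

    public static void main(String[] args) {
        int[] docs = {1, 4, 7, 9};
        int[] freqs = {2, 1, 3, 5};
        System.out.println(totalFreq(new PostingsCursor(docs, freqs)));
        System.out.println(Arrays.toString(
                freqsFor(new PostingsCursor(docs, freqs), new int[] {4, 5, 9})));
    }
}
```

If you need frequencies for a set of documents, sorting the document ids first lets you answer all of them in one forward pass per term, which is the access pattern the on-disk layout favors.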
Your next step might be additional processing to build an even more specialized file structure that leaves out the skip data. Personally, I would look for a better algorithm to achieve my goal, or provide better hardware: lots of memory, either to hold a RAMDirectory or to hand to the operating system for its own file cache.