Java – What changes in the Lucene index files when documents are added / updated / deleted?

I'm working with the latest version of Lucene, 4.10.2, with Java as the front end and Oracle 12c as the database.

I have indexed a user table with 1 million rows (think of something like LinkedIn's user table).

Can anyone explain exactly what changes in my index folder (where the index files are stored) when we add, update, or delete documents?

Sample image of the index folder (not reproduced here):

I'm trying to understand the file structure of the Lucene folder where all the index files are placed.

For now this is just a one-to-many relationship structure (we search without logging in). Later I plan to move to a many-to-many relationship (connections, connections of connections, one index folder per user).

Please let me know whether my approach is right or wrong.

Solution

A Lucene index consists of multiple "segments". Each segment is written only once, either when you call commit() or when the IndexWriter flushes automatically (for example, when its RAM buffer reaches the configured threshold). Typically, when you search an index, each segment is searched sequentially and the results are combined. Lucene works this way because modifying a segment in place would be a very slow process. Segments can be merged to improve search performance [1].

In your example, the files starting with _0 belong to the first segment, and the files starting with _1 belong to the second segment. The .cfe and .cfs files are "compound files", which contain all the index files of a segment (a bit like a zip file). For more information, see the file extensions and formats of the default codec.
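To make the naming convention concrete, here is a minimal sketch in plain Java (no Lucene dependency) that groups the file names in an index folder by the segment prefix described above. The class `SegmentFileGrouper`, its method names, and the sample file listing are hypothetical and only for illustration. The listing is an assumption about what a small 4.x index might contain, not the contents of your folder.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Lucene.
class SegmentFileGrouper {

    // Returns the segment prefix ("_0", "_1", ...) of an index file name,
    // or null for files that are not tied to one segment
    // (for example segments_N or write.lock).
    static String segmentNameOf(String fileName) {
        if (!fileName.startsWith("_")) {
            return null;
        }
        int end = 1;
        while (end < fileName.length()
                && fileName.charAt(end) != '.'
                && fileName.charAt(end) != '_') {
            end++;
        }
        return end > 1 ? fileName.substring(0, end) : null;
    }

    // Groups file names by segment prefix; other files go under "(index-level)".
    static Map<String, List<String>> groupBySegment(List<String> fileNames) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String name : fileNames) {
            String segment = segmentNameOf(name);
            String key = (segment == null) ? "(index-level)" : segment;
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(name);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Assumed sample listing; replace with new java.io.File(path).list()
        // to inspect a real index folder.
        List<String> files = Arrays.asList(
                "_0.cfe", "_0.cfs", "_0.si",
                "_1.cfe", "_1.cfs", "_1.si",
                "segments_2", "write.lock");
        for (Map.Entry<String, List<String>> e : groupBySegment(files).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

Running this against the folder before and after a commit should show a new prefix appearing for each newly written segment, and prefixes disappearing once segments have been merged.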

So your three operations are as follows:

Add: a new document is always written to a new segment.

Delete: deleted documents are not actually removed from the index. Instead, a flag is set to mark the document as deleted. A document that has not been deleted is called a "live document". Deleted documents still affect scoring through the document frequency statistics, which are not updated until the segments are merged.

Update: an update is just an atomic delete followed by an add.
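The three operations above can be sketched with a small toy model in plain Java. This is not Lucene's API: `ToyIndex`, `Segment`, and all method names are hypothetical, documents are reduced to bare id strings, and the model is single-threaded, so it only mimics the bookkeeping described here (write-once segments, delete flags, update as delete plus add, merges dropping deleted documents). The names `maxDoc` and `numDocs` are borrowed by analogy for "all documents including flagged ones" and "live documents only".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: hypothetical classes, NOT Lucene's API.
class ToyIndex {

    static final class Segment {
        final List<String> docs;  // written once, never modified afterwards
        final boolean[] live;     // false = flagged as deleted

        Segment(List<String> docs) {
            this.docs = new ArrayList<>(docs);
            this.live = new boolean[docs.size()];
            Arrays.fill(this.live, true);
        }
    }

    final List<Segment> segments = new ArrayList<>();
    private final List<String> buffer = new ArrayList<>(); // not yet written

    // Add: buffered documents always end up in a new segment on commit.
    void add(String id) { buffer.add(id); }

    // Delete: existing segments are not rewritten, only a flag is cleared.
    void delete(String id) {
        buffer.removeIf(id::equals);
        for (Segment s : segments) {
            for (int i = 0; i < s.docs.size(); i++) {
                if (s.docs.get(i).equals(id)) s.live[i] = false;
            }
        }
    }

    // Update: delete the old version, then add the new one.
    void update(String id) { delete(id); add(id); }

    // Commit: write the buffered documents as one new segment.
    void commit() {
        if (!buffer.isEmpty()) {
            segments.add(new Segment(buffer));
            buffer.clear();
        }
    }

    // Merge: rewrite all segments into one, dropping flagged documents.
    void mergeAll() {
        List<String> survivors = new ArrayList<>();
        for (Segment s : segments) {
            for (int i = 0; i < s.docs.size(); i++) {
                if (s.live[i]) survivors.add(s.docs.get(i));
            }
        }
        segments.clear();
        if (!survivors.isEmpty()) segments.add(new Segment(survivors));
    }

    // Counts every stored document, including those flagged as deleted.
    int maxDoc() {
        int n = 0;
        for (Segment s : segments) n += s.docs.size();
        return n;
    }

    // Counts live documents only.
    int numDocs() {
        int n = 0;
        for (Segment s : segments) for (boolean b : s.live) if (b) n++;
        return n;
    }

    String describe() {
        return "segments=" + segments.size() + " maxDoc=" + maxDoc() + " numDocs=" + numDocs();
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.add("u1"); index.add("u2"); index.commit();
        index.add("u3"); index.commit();
        index.delete("u1");
        System.out.println(index.describe()); // segments=2 maxDoc=3 numDocs=2
        index.update("u2"); index.commit();
        System.out.println(index.describe()); // segments=3 maxDoc=4 numDocs=2
        index.mergeAll();
        System.out.println(index.describe()); // segments=1 maxDoc=2 numDocs=2
    }
}
```

The gap between the two counters before the merge is the point of the sketch: flagged documents keep occupying space and keep contributing to statistics until a merge rewrites the segments. In real Lucene, deletes and updates are addressed by a term or query rather than by a bare id string.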

[1] http://blog.trifork.com/2011/11/21/simon-says-optimize-is-bad-for-you/
