Java – What changes in the Lucene index files when documents are added / updated / deleted?
I'm working with the latest version of Lucene, 4.10.2, with Java as the front end and Oracle 12c as the database.
I have indexed a user table with 1 million rows (think of something like the LinkedIn user table).
When we add, update, or delete documents, can anyone explain exactly what changes in my index folder (the folder holding the index files)?
Additional sample images:
I'm trying to understand the file structure of the Lucene folder, where all the index files are placed.
This is just a one-to-many relationship structure (we search without login). Later, I will move to many-to-many relationships (connections, connections of connections, a 1:1 index folder per user).
Please let me know whether my approach is right or wrong.
Solution
A Lucene index consists of multiple "segments". Each segment is written only once, either when commit() is called or when the IndexWriter flushes automatically (for example, when it is configured to flush once RAM usage reaches a given threshold). Typically, when you search an index, each segment is searched sequentially and the results are combined. Lucene works this way because modifying an existing segment can be a very slow process. Segments can be merged to improve search performance [1].
In your example, the files starting with _0 belong to the first segment, and the files starting with _1 belong to the second segment. The .cfe and .cfs files are "compound files", which contain all the index files of a segment (a bit like a zip file). For more information, see the file extensions and formats documented for the default codec.
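To make the naming convention concrete, here is a small sketch in plain Java (no Lucene dependency) that groups file names by their segment prefix. The class name SegmentFileGrouper and the file listing in main are hypothetical, since the screenshot from the question is not available. The parsing rule (the prefix ends at the next "_" or "." after the leading underscore) is an assumption about how per-segment files are named.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: groups index file names by segment prefix (_0, _1, ...).
class SegmentFileGrouper {

    // Returns the segment prefix (e.g. "_0") for a per-segment file name,
    // or null for index-wide files such as "segments_2" or "write.lock".
    static String segmentPrefix(String fileName) {
        if (!fileName.startsWith("_")) {
            return null;
        }
        // The prefix ends at the next '_' (e.g. "_0_1.del") or, failing that, at the '.'.
        int end = fileName.indexOf('_', 1);
        if (end == -1) {
            end = fileName.indexOf('.');
        }
        return end == -1 ? fileName : fileName.substring(0, end);
    }

    // Maps each segment prefix to the files that belong to it.
    static Map<String, List<String>> groupBySegment(List<String> fileNames) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String name : fileNames) {
            String prefix = segmentPrefix(name);
            String key = (prefix == null) ? "(index-wide)" : prefix;
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(name);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Hypothetical directory listing of an index with two segments.
        List<String> files = Arrays.asList(
                "_0.cfe", "_0.cfs", "_0.si",
                "_1.cfe", "_1.cfs", "_1.si",
                "segments_2", "segments.gen", "write.lock");
        groupBySegment(files).forEach(
                (segment, names) -> System.out.println(segment + " -> " + names));
    }
}
```

To try this against a real index folder, the list could be built from new java.io.File(path).list() instead of the hard-coded names.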
So your three operations are as follows:
Add: new documents are always written to a new segment.
Delete: deleted documents are not actually removed from the index. Instead, a flag is set to mark the document as deleted. A document that has not been deleted is called a "live document". Deleted documents still affect scoring through the document frequency statistics, which are not updated until the segments are merged.
Update: an update is just an atomic delete followed by an add.
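The three operations above can be sketched with a toy model in plain Java. This is not Lucene's implementation or API: ToySegmentedIndex and all of its methods are hypothetical names, documents are reduced to bare id strings, and the model is single-threaded, so the "atomic" part of an update is not represented. It only illustrates that segments are written once, that deletes just clear a "live" flag, and that a merge is what finally drops deleted documents.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Toy model (not Lucene code): write-once segments plus per-document "live" flags.
class ToySegmentedIndex {

    // A segment's document list never changes after creation; only live flags are cleared.
    static final class Segment {
        final List<String> ids;
        final BitSet live;

        Segment(List<String> ids) {
            this.ids = new ArrayList<>(ids);
            this.live = new BitSet(ids.size());
            this.live.set(0, ids.size()); // every document starts out live
        }
    }

    private final List<Segment> segments = new ArrayList<>();
    private final List<String> buffer = new ArrayList<>(); // documents not yet committed

    // Add: the document goes to the in-memory buffer, which becomes a new segment on commit.
    void add(String id) {
        buffer.add(id);
    }

    // Delete: existing segments are not rewritten; the document is only flagged as not live.
    void delete(String id) {
        buffer.removeIf(id::equals);
        for (Segment s : segments) {
            for (int i = 0; i < s.ids.size(); i++) {
                if (s.ids.get(i).equals(id)) {
                    s.live.clear(i);
                }
            }
        }
    }

    // Update: a delete followed by an add.
    void update(String id) {
        delete(id);
        add(id);
    }

    // Commit: write the buffered documents out as one new segment.
    void commit() {
        if (!buffer.isEmpty()) {
            segments.add(new Segment(buffer));
            buffer.clear();
        }
    }

    // Merge: copy only live documents into a single new segment and drop the old ones.
    void merge() {
        List<String> merged = new ArrayList<>();
        for (Segment s : segments) {
            for (int i = 0; i < s.ids.size(); i++) {
                if (s.live.get(i)) {
                    merged.add(s.ids.get(i));
                }
            }
        }
        segments.clear();
        if (!merged.isEmpty()) {
            segments.add(new Segment(merged));
        }
    }

    int segmentCount() { return segments.size(); }

    // Counts every stored document, including those flagged as deleted.
    int maxDoc() { return segments.stream().mapToInt(s -> s.ids.size()).sum(); }

    // Counts live documents only.
    int numDocs() { return segments.stream().mapToInt(s -> s.live.cardinality()).sum(); }

    public static void main(String[] args) {
        ToySegmentedIndex index = new ToySegmentedIndex();
        index.add("u1"); index.add("u2"); index.add("u3");
        index.commit();              // first segment
        index.add("u4");
        index.commit();              // second segment
        index.delete("u1");          // flagged only, still stored
        index.update("u2");          // old copy flagged, new copy buffered
        index.commit();              // third segment holds the new u2
        System.out.println(index.segmentCount() + " segments, maxDoc=" + index.maxDoc()
                + ", numDocs=" + index.numDocs());
        index.merge();
        System.out.println(index.segmentCount() + " segments, maxDoc=" + index.maxDoc()
                + ", numDocs=" + index.numDocs());
    }
}
```

Before the merge, the stored document count is higher than the live count because the deleted u1 and the old copy of u2 are still physically present. After the merge the two counts agree.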
[1] http://blog.trifork.com/2011/11/21/simon-says-optimize-is-bad-for-you/