Java EE – reasons for different results for the same Elasticsearch query on two nodes
I have a two-node Elasticsearch setup. The same search query returns different results on one node than on the other, and I want to find out why. Details:
- The same document (same content and same ID) gets a different score on each node, which results in a different sort order.
- It is reproducible: I can delete the entire index and rebuild it from the database, and the results still differ.
- Both ES nodes are embedded in a Java EE WAR; on every deployment the index is rebuilt from the database.
- When the problem was first noticed, hits.total also differed between the two nodes for the same query; after deleting and rebuilding the index, the totals matched again.
- My current workaround is to use preference=_local, as suggested here (see the sketch after this list).
- So far I can't find any interesting errors in my logs.
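A minimal sketch of that workaround, assuming the index is called abc and the node's HTTP port is the default 9200 (the query body here is only an illustration):

curl -XGET 'http://localhost:9200/abc/_search?preference=_local' -d '
{ "query": { "match": { "address.city": "Bremen" } } }'

With preference=_local each node answers from its own shard copies, so repeated requests against one node are consistent, but it does not remove the underlying difference between the nodes.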
/_cluster/state:
{ "cluster_name": "elasticsearch.abc","version": 330,"master_node": "HexGKOoHSxqRaMmwduCVIA","blocks": {},"nodes": { "rUZDrUfMR1-RWcy4t0YQNw": { "name": "Owl","transport_address": "inet[/10.123.123.123:9303]","attributes": {} },"HexGKOoHSxqRaMmwduCVIA": { "name": "Bloodlust II","transport_address": "inet[/10.123.123.124:9303]","attributes": {} } },"Metadata": { "templates": {},"indices": { "abc": { "state": "open","settings": { "index": { "creation_date": "1432297566361","uuid": "LKx6Ro9CRXq6JZ9a29jWeA","analysis": { "filter": { "substring": { "type": "nGram","min_gram": "1","max_gram": "50" } },"analyzer": { "str_index_analyzer": { "filter": [ "lowercase","substring" ],"tokenizer": "keyword" },"str_search_analyzer": { "filter": [ "lowercase" ],"tokenizer": "keyword" } } },"number_of_replicas": "1","number_of_shards": "5","version": { "created": "1050099" } } },"mappings": { "some_mapping": { ... } ... },"aliases": [] } } },"routing_table": { "indices": { "abc": { "shards": { "0": [ { "state": "STARTED","primary": true,"node": "HexGKOoHSxqRaMmwduCVIA","relocating_node": null,"shard": 0,"index": "abc" },{ "state": "STARTED","primary": false,"node": "rUZDrUfMR1-RWcy4t0YQNw","index": "abc" } ],"1": [ { "state": "STARTED","shard": 1,"2": [ { "state": "STARTED","shard": 2,"3": [ { "state": "STARTED","shard": 3,"4": [ { "state": "STARTED","shard": 4,"index": "abc" } ] } } } },"routing_nodes": { "unassigned": [],"nodes": { "HexGKOoHSxqRaMmwduCVIA": [ { "state": "STARTED","index": "abc" },{ "state": "STARTED","index": "abc" } ],"rUZDrUfMR1-RWcy4t0YQNw": [ { "state": "STARTED","index": "abc" } ] } },"allocations": [] }
/_cluster/health:
{ "cluster_name": "elasticsearch.abc","status": "green","timed_out": false,"number_of_nodes": 2,"number_of_data_nodes": 2,"active_primary_shards": 5,"active_shards": 10,"relocating_shards": 0,"initializing_shards": 0,"unassigned_shards": 0,"number_of_pending_tasks": 0 }
/_cluster/stats:
{ "timestamp": 1432312770877,"cluster_name": "elasticsearch.abc","indices": { "count": 1,"shards": { "total": 10,"primaries": 5,"replication": 1,"index": { "shards": { "min": 10,"max": 10,"avg": 10 },"primaries": { "min": 5,"max": 5,"avg": 5 },"replication": { "min": 1,"max": 1,"avg": 1 } } },"docs": { "count": 19965,"deleted": 4 },"store": { "size_in_bytes": 399318082,"throttle_time_in_millis": 0 },"fielddata": { "memory_size_in_bytes": 60772,"evictions": 0 },"filter_cache": { "memory_size_in_bytes": 15284,"id_cache": { "memory_size_in_bytes": 0 },"completion": { "size_in_bytes": 0 },"segments": { "count": 68,"memory_in_bytes": 10079288,"index_writer_memory_in_bytes": 0,"index_writer_max_memory_in_bytes": 5120000,"version_map_memory_in_bytes": 0,"fixed_bit_set_memory_in_bytes": 0 },"percolate": { "total": 0,"time_in_millis": 0,"current": 0,"memory_size_in_bytes": -1,"memory_size": "-1b","queries": 0 } },"nodes": { "count": { "total": 2,"master_only": 0,"data_only": 0,"master_data": 2,"client": 0 },"versions": [ "1.5.0" ],"os": { "available_processors": 8,"mem": { "total_in_bytes": 0 },"cpu": [] },"process": { "cpu": { "percent": 0 },"open_file_descriptors": { "min": 649,"max": 654,"avg": 651 } },"jvm": { "max_uptime_in_millis": 2718272183,"versions": [ { "version": "1.7.0_40","vm_name": "Java HotSpot(TM) 64-Bit Server VM","vm_version": "24.0-b56","vm_vendor": "Oracle Corporation","count": 2 } ],"mem": { "heap_used_in_bytes": 2665186528,"heap_max_in_bytes": 4060086272 },"threads": 670 },"fs": { "total_in_bytes": 631353901056,"free_in_bytes": 209591468032,"available_in_bytes": 209591468032 },"plugins": [] } }
Sample query:
/_search?from=22&size=1 { "query": { "bool": { "should": [{ "match": { "address.city": { "query": "Bremen","boost": 2 } } }],"must": [{ "match": { "type": "L" } }] } } }
Response to the first request
{ "took": 30,"_shards": { "total": 5,"successful": 5,"Failed": 0 },"hits": { "total": 19543,"max_score": 6.407021,"hits": [{ "_index": "abc","_type": "xyz","_id": "ABC123","_score": 5.8341036,"_source": { ... } }] } }
Response to the second request
{ "took": 27,"Failed": 0 },"hits": [ { "_index": "abc","_id": "FGH12343","_source": { ... } } ] }
}
What are the possible reasons for this, and how can I ensure that both nodes return the same results?
Explain output for the query, as requested: /abc/mytype/_search?from=0&size=1&search_type=dfs_query_then_fetch&explain=true
{ "query": { "bool": { "should": [{ "match": { "address.city": { "query": "Karlsruhe","boost": 2 } } }] } } }
Response to the first request
{ "took": 5,"_shards": { "total": 5,"Failed": 0 },"hits": { "total": 41,"max_score": 7.211497,"hits": [ { "_shard": 0,"_node": "rUZDrUfMR1-RWcy4t0YQNw","_index": "abc","_type": "mytype","_id": "abc123","_score": 7.211497,"_source": {... },"_explanation": { "value": 7.211497,"description": "weight(address.city:karlsruhe^2.0 in 1598) [PerFieldSimilarity],result of:","details": [ { "value": 7.211497,"description": "fieldWeight in 1598,product of:","details": [ { "value": 1,"description": "tf(freq=1.0),with freq of:","details": [ { "value": 1,"description": "termFreq=1.0" } ] },{ "value": 7.211497,"description": "idf(docFreq=46,maxDocs=23427)" },{ "value": 1,"description": "fieldNorm(doc=1598)" } ] } ] } } ] } }
Response to the second request
{ "took": 6,"max_score": 7.194322,"_score": 7.194322,"_explanation": { "value": 7.194322,"details": [ { "value": 7.194322,{ "value": 7.194322,"description": "idf(docFreq=48,maxDocs=24008)" },"description": "fieldNorm(doc=1598)" } ] } ] } } ] } }
Solution
The mismatch in hits is most likely due to the primary shard and its replica being out of sync. This can happen if a node leaves the cluster (for whatever reason) while changes to the documents (index, delete, update) continue. A quick way to check this is shown below.
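As a quick check (a sketch, assuming the index name abc and the default HTTP port 9200), compare the document count of each primary shard with that of its replica via the cat API; different docs values for the same shard number mean the copies are out of sync. The output below is made up for illustration:

curl -XGET 'http://localhost:9200/_cat/shards/abc?v'
index shard prirep state   docs store ip             node
abc   0     p      STARTED 3993 ...   10.123.123.124 Bloodlust II
abc   0     r      STARTED 3990 ...   10.123.123.123 Owl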
The scoring part is a different story; it can be explained by the "relevance scoring" section of this blog post.
I will try "DFS query then fetch" when searching, which means_ search? search_ type = dfs_ query_ then_ fetch …. This should contribute to the accuracy of the score
In addition, the differing document counts caused by documents changing while a node was disconnected affect the score calculation, even after deleting and re-indexing. This can happen because the document changes on the replica and on the primary shard differ, and more specifically because documents have been deleted: deleted documents are only removed from the index permanently when their segments are merged, and segment merging only happens once certain conditions in the underlying Lucene instance are met (see the worked numbers below).
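To illustrate with the two explain outputs above: assuming the default Lucene TF/IDF similarity, idf = 1 + ln(maxDocs / (docFreq + 1)), and maxDocs still counts deleted documents until their segments are merged away, so

node 1: 1 + ln(23427 / (46 + 1)) ≈ 7.2115
node 2: 1 + ln(24008 / (48 + 1)) ≈ 7.1943

which matches the differing scores you see (tf and fieldNorm are both 1 in these explanations).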
A merge can be forced via POST /_optimize?max_num_segments=1. Warning: this can take a long time (depending on the size of the index), consumes a lot of I/O and CPU, and should not be run while the index is still being changed. Documentation: optimize, segment merging.
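For example (index name, host and port assumed):

curl -XPOST 'http://localhost:9200/abc/_optimize?max_num_segments=1'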