Java: poor performance with Solr spatial filtering
I'm using Solr 3.4 and doing spatial filtering with a schema that uses LatLonType (subType=tdouble). I have an index of about 20M documents.

My basic problem is this: if I run a bbox filter with cache=true, performance is reasonably good (~40-50 QPS, about 100-150ms latency), but the big downside is crazy fast old-generation heap growth, which eventually leads to a major collection every 30-40 minutes (on a very large heap, 25GB). At that point performance is unacceptable.

On the other hand, I can turn off caching for the bbox filter, but then both latency and QPS get worse (latency goes from 100ms to 500ms).

The NumericRangeQuery Javadoc talks about the excellent performance you can get (under 100ms), but now I wonder whether that was measured with filterCache enabled and nobody bothered to look at the resulting heap growth. This feels like a catch-22, because neither configuration is really acceptable.

I am open to any ideas. My last idea (untried) is to use geohash, and pray that it either performs better with cache=false or has more manageable heap growth with cache=true.
Edit:
Precision step: default (I think it is 8)
System memory: 32GB (EC2 M2 2XL)
JVM: 24GB
Index size: 11 GB
EDIT2:
Question: what does this actually mean for doubles?
Solution
Getting a profile of Solr while it responds to your spatial queries would help a lot in understanding what is slow; see hprof, for example.
Still, here are some ideas on how you could (perhaps) improve latency.
First, you could test what happens when you decrease precisionStep (try 4, for example). If the latitudes and longitudes are too close to each other and the precisionStep is too high, Lucene cannot take advantage of having several indexed values per field.
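To put rough numbers on that trade-off, here is a small sketch of the term-count arithmetic. It assumes the formula I remember from the Lucene NumericRangeQuery Javadoc, which only holds when the number of bits per value is a multiple of precisionStep; the class and method names are made up for illustration.

```java
// Sketch only: term counts for a trie-encoded numeric field.
// Formula as recalled from the NumericRangeQuery Javadoc (an assumption,
// valid only when bitsPerValue is a multiple of precisionStep).
final class PrecisionStepMath {

    /** Number of terms indexed for each value: one per precision level. */
    static int termsPerValue(int bitsPerValue, int precisionStep) {
        return (bitsPerValue + precisionStep - 1) / precisionStep; // ceiling division
    }

    /** Upper bound on the number of terms a single range query may visit. */
    static long maxQueryTerms(int bitsPerValue, int precisionStep) {
        if (bitsPerValue % precisionStep != 0) {
            throw new IllegalArgumentException("bitsPerValue must be a multiple of precisionStep");
        }
        long levels = bitsPerValue / precisionStep;
        long perLevel = (1L << precisionStep) - 1;
        // Every level except the coarsest can contribute terms on both
        // edges of the range; the coarsest level contributes one run.
        return (levels - 1) * perLevel * 2 + perLevel;
    }

    public static void main(String[] args) {
        int bits = 64; // a double is stored as a 64-bit value
        for (int step : new int[] {8, 4, 2}) {
            System.out.printf("precisionStep=%d: %d terms per value, at most %d terms per range%n",
                    step, termsPerValue(bits, step), maxQueryTerms(bits, step));
        }
    }
}
```

Under that formula, a 64-bit double indexes 8 terms per value and may visit up to 3825 terms per range at precisionStep=8, versus 16 terms per value and up to 465 terms per range at precisionStep=4. A lower step costs index size but gives each range query far fewer terms to enumerate.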
You could also try giving the JVM less memory, so that the operating system's page cache has more room to cache frequently accessed index files. With a 24GB heap on a 32GB machine, at most 8GB is left for the OS, which is less than the 11GB index.
Then, if it's still not fast enough, you could try replacing TrieDoubleField as the sub-field with a field type whose getRangeQuery method uses an frange query. This would reduce the number of disk accesses while computing the range, at the cost of higher memory usage. I have never tested it, and it might perform horribly as well.
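One way to get a feel for the frange idea before writing a custom field type is to send frange filters directly against the two coordinate sub-fields. This is not the field type described above, only a query-side approximation of it. The sketch below is untested against a real index and assumes the sub-fields are named store_0_coordinate (latitude) and store_1_coordinate (longitude), which is what a LatLonType field called store with a *_coordinate dynamic field would typically produce; check your own schema. The class name, method names and box bounds are made up.

```java
// Sketch only: builds two uncached frange filter queries that together
// describe a plain latitude/longitude rectangle. Field names are assumptions.
final class FrangeBoxFilters {

    /** One frange filter over a single numeric field, with caching turned off. */
    static String frange(String field, double lower, double upper) {
        return "{!frange l=" + lower + " u=" + upper + " cache=false}" + field;
    }

    /**
     * Two fq values for a rectangle. Does not handle boxes that cross
     * the date line or touch a pole.
     */
    static String[] boxFilters(String latField, String lonField,
                               double minLat, double maxLat,
                               double minLon, double maxLon) {
        return new String[] {
            frange(latField, minLat, maxLat),
            frange(lonField, minLon, maxLon)
        };
    }

    public static void main(String[] args) {
        String[] fqs = boxFilters("store_0_coordinate", "store_1_coordinate",
                48.8, 48.9, 2.3, 2.4);
        for (String fq : fqs) {
            System.out.println("fq=" + fq); // URL-encode before sending to Solr
        }
    }
}
```

Because frange evaluates the field value per document from the field cache, expect the memory cost to be paid up front when the cache is first populated rather than growing with each distinct filter.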