Optimize memory usage of string collections in Java
I have a large number of name/value pairs (about 100k) that I need to store in some kind of cache (such as a hash map). Each value is a string with an average size of about 30K bytes.
I know that many of the values contain exactly the same string data. To avoid allocating the same string data multiple times, I want to reuse the previously allocated string somehow and so consume less memory. This also needs to be reasonably fast: scanning all previously allocated values one by one is not an option.
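To make the problem concrete, here is a small sketch of how the duplication arises. It assumes the values arrive from I/O, which the question does not say; the helper `readValue` is hypothetical and simply builds a new `String` from a `char[]`, the way a reader or parser typically would. Two values with identical content end up as two separate objects on the heap.

```java
import java.util.HashMap;
import java.util.Map;

public class DuplicateValuesDemo {
    // Hypothetical stand-in for reading a value from a file or socket:
    // every call allocates a brand-new String, even for identical content.
    static String readValue(String content) {
        return new String(content.toCharArray());
    }

    static Map<String, String> buildCache() {
        Map<String, String> cache = new HashMap<>();
        cache.put("name1", readValue("identical payload"));
        cache.put("name2", readValue("identical payload"));
        return cache;
    }

    public static void main(String[] args) {
        Map<String, String> cache = buildCache();
        String a = cache.get("name1");
        String b = cache.get("name2");
        System.out.println(a.equals(b)); // true: same content
        System.out.println(a == b);      // false: two separate copies in memory
    }
}
```

With about 100k values averaging 30K bytes, every such duplicate costs its full size again, which is what the question wants to avoid.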
Any suggestions on how I can solve this problem?
Solution
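Do not use `String.intern()`; it has caused various memory problems over the years (for example, before Java 7 interned strings were stored in the fixed-size permanent generation, so interning many large strings could exhaust it). Instead, create your own cache that works like `String.intern()`. Basically, you want a map in which each key maps to itself. Then, before caching any string, you "intern" it: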
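```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

public class StringInterner {
    // Weak keys: an entry can be dropped once its string is no longer used elsewhere.
    private final Map<String, WeakReference<String>> myInternMap =
            new WeakHashMap<String, WeakReference<String>>();

    public String intern(String value) {
        synchronized (myInternMap) {
            // Hash lookup for an existing string with equal content (no linear scan).
            WeakReference<String> curRef = myInternMap.get(value);
            String curValue = (curRef != null) ? curRef.get() : null;
            if (curValue != null) {
                return curValue; // reuse the previously allocated instance
            }
            // First occurrence: this instance becomes the canonical one.
            myInternMap.put(value, new WeakReference<String>(value));
            return value;
        }
    }
}
```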
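Note that weak references are used for both the keys and the values, so the map does not keep strings alive once nothing else refers to them. The value must itself be a `WeakReference`: a `WeakHashMap` value that strongly referenced its own key would keep that key reachable, and the entry would never be removed.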
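As a sketch of how this plugs into the name/value cache from the question, the hypothetical `DedupCache` class below runs every value through the same weak interning logic before storing it. The class and its `put`/`get`/`remove` methods are illustrative assumptions, not part of the answer; only the `WeakHashMap<String, WeakReference<String>>` lookup is taken from it.

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical cache wrapper: the class name and methods are illustrative only.
public class DedupCache {
    // The name -> value cache from the question; it holds values strongly.
    private final Map<String, String> cache = new HashMap<>();
    // Canonicalizing map as in the answer: weak keys, values wrapped in
    // WeakReference so a value never keeps its own key reachable.
    private final Map<String, WeakReference<String>> canonical = new WeakHashMap<>();

    // Called only from synchronized methods below.
    private String intern(String value) {
        WeakReference<String> ref = canonical.get(value);
        String existing = (ref != null) ? ref.get() : null;
        if (existing != null) {
            return existing;
        }
        canonical.put(value, new WeakReference<>(value));
        return value;
    }

    public synchronized void put(String name, String value) {
        // Store the canonical instance so equal values share one String object.
        cache.put(name, intern(value));
    }

    public synchronized String get(String name) {
        return cache.get(name);
    }

    public synchronized void remove(String name) {
        // If no other cache entry (or other code) still references the value,
        // the garbage collector may later clear it from 'canonical' as well.
        cache.remove(name);
    }
}
```

The design choice here is that the cache itself holds strong references, so a canonical string stays alive exactly as long as at least one cache entry (or other code) still uses it, and the weak interning map never becomes the reason it is retained.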