Why is Java HashMap.get(key) faster when iterating the map's own keySet() than when iterating a copied set of the same keys?

For a HashMap<Integer, Integer>, after inserting 10,000,000 unique random entries, I iterate over the map's keySet() and call get() for each key, as in the following code fragment:

HashMap<Integer,Integer> hashmap = 
                        new HashMap<Integer,Integer>(10000000,0.99f);

// ... Code to put unique 10000000 associations into the hashmap ...

int iteration = 100;
long startTime,totalTime = 0;

while(iteration > 0) {
    for(Integer key: hashmap.keySet()) {
       startTime = System.currentTimeMillis();
       hashmap.get(key);
       totalTime += (System.currentTimeMillis() - startTime);
    }
    iteration--;
}
System.out.println(totalTime/100 + " ms");

Running the above code, I get: 225 milliseconds.

Now, if I copy the keys into a separate HashSet and iterate over that instead, as in the following code fragment:

Set<Integer> set = new HashSet<Integer>(hashmap.keySet());
iteration = 100;   // reset the counters from the previous run
totalTime = 0;
while(iteration > 0) {
    for(Integer key: set) {
       startTime = System.currentTimeMillis();
       hashmap.get(key);
       totalTime += (System.currentTimeMillis() - startTime);
    }
    iteration--;
}
System.out.println(totalTime/100 + " ms");

After running this code, I get: 414 milliseconds.

What explains this performance difference?

P.S.: I used the following JVM parameters:

-Xms2048m -Xmx4096m -XX:MaxPermSize=256m
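As an aside, wrapping every single get() in a pair of System.currentTimeMillis() calls adds overhead comparable to the operation being measured and has only millisecond resolution. A less fragile approach (a sketch; the class and method names here are my own, not from the question) is to time the whole loop once with System.nanoTime():

```java
import java.util.HashMap;
import java.util.Map;

public class GetBenchmark {
    // Times a whole batch of get() calls as one block instead of per call.
    static long timeGets(Map<Integer, Integer> map, Iterable<Integer> keys) {
        long start = System.nanoTime();
        long sink = 0;                          // consume the results so the JIT cannot elide the gets
        for (Integer key : keys) {
            Integer v = map.get(key);
            if (v != null) sink += v;
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        return sink == Long.MIN_VALUE ? -1 : elapsedMs;  // keep sink live; always returns elapsedMs in practice
    }

    public static void main(String[] args) {
        HashMap<Integer, Integer> map = new HashMap<>(1 << 20, 0.99f);
        for (int i = 0; i < 100_000; i++) map.put(i, i);
        System.out.println(timeGets(map, map.keySet()) + " ms");
    }
}
```

For serious measurements a harness such as JMH, which handles warmup and dead-code elimination, would be more reliable still.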

Solution

When you read a large data structure (larger than about 32 KB), how you read it affects performance.

These are typical cache sizes and latencies:

L1: 32 KB, 4 clock cycles.
L2: 256 KB, 11 clock cycles.
L3: 3-30 MB, 40-75 clock cycles.
Main memory: up to 2 TB, 200-500 clock cycles.

This means that cache locality is very important: reading something that is already in L1 can be up to roughly 20 times faster than reading it from L3.

In your case, you are using a hashed data structure. It is designed for random access and a random arrangement of entries, so unfortunately it has very poor cache locality: each lookup may jump to memory that only exists in a slower cache level or in main memory.

However, there is an exception. If you access the same data repeatedly, for example a key you just obtained from the iterator, or scan a collection in order, which is what the iterator does for a HashMap (unlike a TreeMap), then the next piece of data you access is likely to be in the same cache line (each cache line is 64 bytes long) or the next one. These kinds of accesses perform much better, because the CPU is designed to stream through sequential memory very quickly.
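The effect above can be made visible by looking up the same keys in two different orders: the order the map's own iterator produces (which walks the backing table sequentially) versus a shuffled copy of those keys. This is a sketch, not the questioner's benchmark; shuffling stands in for the unrelated ordering of a freshly built HashSet, and actual timings vary by machine:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;

public class LocalityDemo {
    // Looks up every key in the given order and sums the values.
    static long sumInOrder(HashMap<Integer, Integer> map, List<Integer> keys) {
        long sum = 0;
        for (Integer k : keys) sum += map.get(k);
        return sum;
    }

    public static void main(String[] args) {
        HashMap<Integer, Integer> map = new HashMap<>(1 << 21, 0.99f);
        for (int i = 0; i < 1_000_000; i++) map.put(i, i);

        // Keys in the map iterator's order: gets revisit table slots sequentially.
        List<Integer> iterOrder = new ArrayList<>(map.keySet());
        // The same keys shuffled: each get() jumps to an unrelated table slot.
        List<Integer> shuffled = new ArrayList<>(iterOrder);
        Collections.shuffle(shuffled);

        long t0 = System.nanoTime();
        sumInOrder(map, iterOrder);
        long sequentialNs = System.nanoTime() - t0;

        t0 = System.nanoTime();
        sumInOrder(map, shuffled);
        long shuffledNs = System.nanoTime() - t0;

        System.out.println("iterator order: " + sequentialNs / 1_000_000 + " ms, "
                         + "shuffled: " + shuffledNs / 1_000_000 + " ms");
    }
}
```

Once the table is much larger than the caches, the shuffled run would typically be noticeably slower, for the cache-line reasons described above.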

By the way, your working set here is just the set of keys. If your values were distinct objects, I would expect actually reading those objects to be much slower, because it would increase the size of the working set and thus how much memory the caches would need to hold.
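To illustrate that last point, here is a sketch (the Payload class is my own invention) where each value is a separate heap object, so reading a value dereferences a pointer to an extra cache line beyond the table entry itself:

```java
import java.util.HashMap;

public class WorkingSetDemo {
    // A value object: each instance lives elsewhere on the heap, so
    // dereferencing it touches an extra cache line per lookup.
    static final class Payload {
        final long a, b, c, d;              // 32 bytes of fields
        Payload(long v) { a = v; b = v; c = v; d = v; }
    }

    // Reads a field of every value, forcing the value objects into the working set.
    static long sumPayloads(HashMap<Integer, Payload> map) {
        long sum = 0;
        for (Integer k : map.keySet()) sum += map.get(k).a;
        return sum;
    }

    public static void main(String[] args) {
        HashMap<Integer, Payload> map = new HashMap<>(1 << 21, 0.99f);
        for (int i = 0; i < 1_000_000; i++) map.put(i, new Payload(i));
        long t0 = System.nanoTime();
        long sum = sumPayloads(map);
        System.out.println(sum + " in " + (System.nanoTime() - t0) / 1_000_000 + " ms");
    }
}
```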
