Notes on Java collection — turn
Original address: http://calvin1978.blogcn.com/articles/collection.html
In the shortest possible space, the characteristics, implementation and performance of all sets and concurrent sets are reviewed. It is suitable for all people who are "proficient in Java" and are not so confident.
It is expected that it can not only be used in the interview, but also consider its cost and efficiency. Don't use it just because the API is appropriate.
1.List
1.1 ArrayList
Implemented as an array. Save space, but the array has a capacity limit. If the limit is exceeded, the capacity will be increased by 50%. Use system Arraycopy() is copied to a new array. Therefore, it is best to give an estimate of the size of the array. By default, an array with a size of 10 is created when the element is inserted for the first time.
Access elements by array subscript - get (I) and set (I, e) have high performance, which is the basic advantage of array.
If you insert elements and delete elements by subscript - add (I, e), remove (I), remove (E), you need to use system. Arraycopy() to copy the affected elements of the moving part, and the performance will deteriorate.
The more previous elements, the more elements to move when modifying. Add an element directly to the end of the array - the commonly used add (E). Deleting the last element has no effect.
1.2 LinkedList
It is realized by two-way linked list. The linked list has no capacity limit, but the two-way linked list itself uses more space. Each time an element is inserted, an additional node object must be constructed, and additional linked list pointer operations are also required.
Press the subscript to access the element - get (I), set (I, e) to traverse the linked list and move the pointer in place (if I > half the size of the array, it will move from the end).
When inserting or deleting elements, you can modify the pointers of the front and rear nodes without copying or moving. However, it is still necessary to partially traverse the pointer of the linked list to move to the position indicated by the subscript.
Only the operations at both ends of the linked list - add(), addfirst(), removelast() or remove() on iterator() can save the movement of the pointer.
Apache commons has a treenodelist, which is a binary tree that can quickly move the pointer to place.
1.3 CopyOnWriteArrayList
Concurrent optimized ArrayList. Based on the immutable object strategy, when modifying, first copy an array snapshot to modify, and then make the internal pointer point to the new array.
Because modifications to snapshots are not visible to read operations, they are not mutually exclusive between reads, nor are they mutually exclusive between reads and writes. Only writes and writes should be locked and mutually exclusive. However, the cost of copying snapshots is expensive, which is typically suitable for the scenario of reading more and writing less.
Although the addifabsent (E) method is added, it will traverse the array to check whether the elements already exist, and the performance is unimaginable.
1.4 regret
No matter which implementation, the subscripts contain (E), indexof (E) and remove (E) returned by value need to traverse all elements for comparison, and the performance can be imagined that it will not be very good.
There is no SortedList sorted by element value.
Apart from copyonwritearraylist, there are no other thread safe and concurrent optimization implementations, such as concurrentlinkedlist. When making do with the equivalent classes in set and queue, some list specific methods, such as get (I), will be missing. If the update frequency is high or the array is large, you still have to use collections. Synchronizedlist (list) and use the same lock for all operations to ensure thread safety.
2.Map
2.1 HashMap
For the hash bucket array implemented by the entry [] array, the index of the array can be obtained by taking the size of the modular bucket array with the hash value of the key.
When inserting elements, if two keys fall in the same bucket (for example, hash values 1 and 17 belong to the first hash bucket after modulo 16), we call it hash conflict.
JDK adopts the linked list method. The entry implements multiple entries with a next attribute and is stored in a one-way linked list. When searching for a key with a hash value of 17, first locate the hash bucket, then traverse all elements in the bucket through the linked list, and compare their hash values one by one.
In jdk8, the default threshold of 8 is added. When the entry in a bucket exceeds the threshold, it is stored in a red black tree instead of a one-way linked list to speed up the search of keys.
Of course, it's better to have only one element in the bucket without comparison. Therefore, by default, when the number of entries reaches 75% of the number of buckets, the hash conflict is serious. The bucket array will be doubled and all original entries will be reassigned. The expansion cost is not low, so it's best to have an estimate.
The modulo and operation (hash & (arraylength-1)) will be faster, so the size of the array is always the nth power of 2. If you give an initial value, such as 17, it will be converted to 32. By default, the initial value when the element is first put in is 16.
Iterator () traverses along the hash bucket array, which seems to be out of order.
2.2 LinkedHashMap
Expand HashMap and add a two-way linked list to each entry, which is known as the data structure occupying the most memory.
When iterator () is supported, it is sorted according to the insertion order of entries (if the accessorder property is set to true, all read and write accesses are sorted).
When inserting, the entry adds itself to the front of the header entry. If all read and write accesses have to be sorted, the before / after of the front and back entries should be spliced together to delete themselves in the linked list, so the read operation is also thread unsafe at this time.
2.3 TreeMap
It is realized by red black tree, which is also called self balanced binary tree:
对于任一节点而言,其到叶节点的每一条路径都包含相同数目的黑结点。
According to the above regulations, the number of layers of the tree will not be too far, so that the complexity of all operations will not exceed o (LGN), but also make the insertion and modification complex left-hand and right-hand rotation to maintain the balance of the tree.
When iterator () is supported, it can be sorted by key value, which can be sorted in ascending order according to the key implementing the comparable interface, or controlled by the incoming comparator. It is conceivable that the cost of inserting / deleting elements in the tree must be greater than that of HashMap.
SortedMap interface is supported, such as firstkey(), lastkey() to obtain the maximum and minimum key, or sub (fromkey, tokey), tailmap (fromkey) to cut a section of the map.
2.4 EnumMap
The principle of enummap is that if you want to pass an enumeration class into the constructor, it will build an array as large as all the values of the enumeration. Press enum Ordinal() subscript to access the array. Good performance and memory usage.
The drawback is that the map interface needs to be implemented, and the key in V get (object key) is object rather than generic K. therefore, for security reasons, enummap must first judge the type of key every time it accesses, and a high sampling hit frequency is recorded in JMC.
2.5 ConcurrentHashMap
Concurrent optimized HashMap.
In the classic design in jdk5, there are 16 write locks by default (more can be set), which effectively disperses the probability of blocking. The data structure is segment [], one lock for each segment. The hash bucket array is in the segment. The key first calculates which segment it is in, and then calculates which hash bucket it is in.
There is no read lock, because the put / remove action is an atomic action (for example, the whole process of put is an assignment operation to the array element / entry pointer), and the read operation will not see an intermediate state of the update action.
However, in jdk8, the design of segment [] was abandoned and changed to well-designed, locking only when it is needed.
Support the concurrentmap interface, such as putifabsent (key, value) and the opposite replace (key, value) and, and implement the replace (key, oldvalue, newvalue) of CAS.
2.6 ConcurrentSkipListMap
The SortedMap for concurrent optimization added in JDK6 is implemented in skiplist structure. Concurrent package is selected because it supports CAS based lock free algorithm, while red black tree has no good lock free algorithm.
In principle, it can be imagined as an n-storey building composed of multiple linked lists, in which the elements are sparse to dense, and each element has right and down pointers. Start traversing from the first floor. If the value at the right end is greater than expected, go down one floor and continue.
Typical space for time. Every time you insert, you have to decide which floors to insert, and at the same time, you have to decide whether to build another floor.
Its size () can't be adjusted randomly, and will be counted all over the world.
3.Set
Almost all sets are implemented internally with a map, because the keyset in the map is a set, and the value is a false value. All use the same object.
The features of set also inherit those of the internal map implementation.
HashSet: inside is HashMap.
Linkedhashset: the internal is LinkedHashMap.
TreeSet: internally, it is the sortedset of treemap.
Concurrentskiplistset: internally, it is the sortedset of concurrent optimization of concurrentskiplistmap.
Copyonwritearrayset: internally, it is a concurrent optimized set of copyonwritearraylist. Its addifabsent() method is used to realize element de duplication. As mentioned earlier, the performance of this method is very general.
It seems that a concurrenthashset is missing. There should have been a simple implementation of concurrenthashmap internally, but JDK did not provide it. Jetty simply sealed one, while guava directly used java util. Collections. Newsetfrommap (New concurrenthashmap()) implementation.
4.Queue
Queue is a list in and out at both ends, so it can also be implemented by array or linked list.
4.1 General queue
4.1. 1 LinkedList
Yes, the LinkedList implemented in a two-way linked list is both a list and a queue.
4.1. 2 ArrayDeque
A bidirectional queue implemented as a circular array. The size is a multiple of 2. The default is 16.
In order to support FIFO, that is, pressing elements from the end of the array (fast) and taking elements from the head of the array (super slow), the implementation of ordinary ArrayList can no longer be used, but circular array is used instead.
There are two subscripts at the head and tail of the team: when the element pops up, the subscript at the head of the team increases; When an element is added, the subscript at the end of the queue is incremented. If the element is added to the end of the array space, the element is assigned to the array [0], and the tail subscript points to 0. If the next element is inserted, it is assigned to the array [1], and the tail subscript points to 1. If the subscript at the end of the queue catches up with the head of the queue, it indicates that all space in the array has been used up. Double the capacity of the array.
4.1. 3 PriorityQueue
The priority queue implemented with balanced binary minimum heap is no longer FIFO, but out of the queue according to the comparable interface implemented by the element or the comparison results passed into the comparator. The smaller the value, the higher the priority and the first out of the queue. Note, however, that the return of its iterator () will not be sorted.
Balanced minimum binary heap can be expressed with a simple array, which can be quickly addressed without pointers. The smallest children in queue [0], such as the two children in queue [4], will be in queue [2 * 4 + 1] and queue [2 * (4 + 1)], that is, queue [9] and queue [10].
When joining the queue, insert queue [size], and then compare and adjust the heap binary up.
When leaving the queue, pop up the queue [0], and then take out the queue [size] and binary down to compare and adjust the heap.
The initial size is 11, and the capacity will be automatically expanded by 50% when the space is insufficient.
4.2 thread safe queues
4.2. 1 ConcurrentLinkedQueue/Deque
Unbounded concurrent optimized queue, based on linked list, implements a CAS dependent lock free algorithm.
The structure of concurrentlinkedqueue is a one-way linked list and head / tail pointers. When joining the queue, you need to modify the next pointer of the tail element and modify the tail to point to the newly joined element. The two CAS actions cannot be atomic, so you need a special algorithm.
4.3 thread safe blocking queue
BlockingQueue: firstly, if the queue is empty, it will block there without repeatedly checking whether there is new data. Secondly, the length of the queue is limited to ensure that the speed of producers and consumers will not be too far apart. When the queue is full when entering the queue or empty when leaving the queue, the effects of different functions are shown in the following table:
4.3. 1 ArrayBlockingQueue
The fixed length concurrent optimized BlockingQueue is also implemented based on circular array. There is a public lock and two conditions, notfull and notempty, to manage the blocking state when the queue is full or empty.
4.3. 2 LinkedBlockingQueue/Deque
The optional fixed length concurrent optimized BlockingQueue is implemented based on linked list, so the length can be set to integer MAX_ Value becomes unbounded and waiting.
Using the characteristics of linked list, the two locks of takelock and putlock are separated, and notempty and notfull are used to manage the blocking state when the queue is full or empty.
4.3. 3 PriorityBlockingQueue
Unbounded PriorityQueue is also a binary heap based on array storage (see above). A public lock realizes thread safety. Because it is unbounded, the capacity will be automatically expanded when there is not enough space, so it will not be locked when entering the column, and it will only be locked when leaving the column empty.
4.3. 4 DelayQueue
There is a PriorityQueue inside, which is also unbounded and locked only when it is out of line. A common lock for thread safety. The element needs to implement the delayed interface. Each call needs to return the current time before the trigger time. Less than 0 indicates that it is triggered.
When pull(), peek() will be used to check the elements of the queue head to see if the trigger time is reached. The scheduledthreadpoolexecutor uses a similar structure.
4.4 synchronization queue
The synchronousqueue synchronization queue itself has no capacity. When putting elements, for example, wait for the elements to be taken away by the consumer of another thread and then return. Use it in the JDK thread pool.
JDK7 also adds a transfer (E) function on the basis of ordinary thread safe BlockingQueue, which has the same effect as synchronous queue.