[source code analysis] how are ArrayList and LinkedList implemented? I think you still have a chance!

2020-12-30 • Java

preface

Seriously, among the collection classes most used in Java, list definitely occupies a place. Like map, it is applicable to many scenarios and is very convenient for our daily development. After all, the need to store a list can be seen everywhere. Nevertheless, many students still don't understand the difference between ArrayList and LinkedList in list. It's a pity. In fact, both of them are the basic contents of data structure. This article will start with the basic concepts, analyze the specific source code implementation of the two in Java, find the differences between the two, and finally think about the precautions when using them.

This article will include the following.

Linear table

To study ArrayList and LinkedList, we must first understand what is a linear table. Here is a passage from Baidu Encyclopedia.

You must have seen that linear table is the most basic, simplest and most commonly used data structure in data structure. It arranges the data one by one into a line (possibly logically), so each data in the linear table has only two directions. In the data structure, the array, linked list, stack and queue are linear tables. You can imagine the neat queue.

You may have questions here. If there is a linear table, then there must be a nonlinear table? you 're right. Binary trees and graphs are typical nonlinear structures. Don't be frightened by these fancy pictures. In fact, this article is very simple. I hope students can read it patiently and praise it.

array

Now that you know what a linear table is, it's easy to understand an array. First, an array is an implementation of a linear table. Array is a data structure composed of elements of the same type. An array needs to allocate a continuous section of memory for storage. Note keywords, same type, continuous memory, like this.

Sorry to put the wrong picture, like this.

The above figure can intuitively reflect the storage structure of the array, because the memory address of the array is continuous, the element type is fixed, and all have the characteristics of quickly finding elements at a certain location; At the same time, because the array needs a continuous memory, the length has been fixed during initialization and cannot be changed. ArrayList in Java is essentially an array encapsulation.

Linked list

The linked list is also a linear list. Different from the array, the linked list does not need continuous memory for data storage, but stores the pointer of the next node in each node at the same time. Pay attention to keywords. Each node has a pointer to the next node. So what should this linked list look like? Look at the picture.

Oh, no, it's the wrong picture. That's right.

The above figure shows the storage structure of the linked list. Each node in the figure has a pointer to the next node. This is called one-way linked list; There is also a linked list. There is a pointer to the previous node on each node. This linked list is called a two-way linked list. I won't draw the picture, like this.

It can be found that the linked list does not need continuous memory storage, because the linked list carries out the next or previous node through the node pointer. As long as the head node is found, the following series of nodes can be found. However, the linked list needs o (n) time complexity when looking up or accessing nodes at a certain location. However, the complexity of O (1) can be achieved when inserting data, because only the node pointer needs to be modified.

ArratList

The above introduces the concept of linear table, and gives two practical implementation examples of linear table, namely array and linked list. In the collection class ArrayList of Java, array storage structure is actually used. ArrayList encapsulates array and adds convenient operations such as insertion, acquisition and capacity expansion. Because the bottom layer of ArrayList is an array, access is very fast, but when adding and deleting, the efficiency of adding and deleting is relatively low because the position of the following elements needs to be moved. So how is it implemented? You might as well go deep into the source code.

ArrayList storage structure

Looking at the source code of ArrayList, you can see that it is a simple array for data storage.

/**
 * The array buffer into which the elements of the ArrayList are stored.
 * The capacity of the ArrayList is the length of this array buffer. Any
 * empty ArrayList with elementData == DEFAULTCAPACITY_EMPTY_ELEMENTDATA
 * will be expanded to DEFAULT_CAPACITY when the first element is added.
 */
transient Object[] elementData; // non-private to simplify nested class access

/**
 * Shared empty array instance used for default sized empty instances. We
 * distinguish this from EMPTY_ELEMENTDATA to kNow how much to inflate when
 * first element is added.
 */
private static final Object[] DEFAULTCAPACITY_EMPTY_ELEMENTDATA = {};

/**
 * Default initial capacity.
 */
private static final int DEFAULT_CAPACITY = 10;

From the above comments, we can see that when ArrayList is constructed without parameters, it will share an array with a length of 0, defaulttopology_ EMPTY_ ELEMENTDATA. Only when the first element is added will the capacity be expanded for the first time, which also prevents more memory waste when creating objects.

ArrayList capacity expansion mechanism

We all know that the size of an array cannot be changed once it is determined, so ArrayList can obviously continue to add elements, and its bottom layer is an array. How is it implemented? According to the above ArrayList storage structure and comments, ArrayList shares an array with a length of 0 during initialization. When the first element is added, it will be expanded for the first time. We can imagine that ArrayList will be expanded every time there is not enough space. What is the expansion mechanism?

Starting from the source code, trace the internal implementation of the add () method.

/**
 * Appends the specified element to the end of this list.
 *
 * @param e element to be appended to this list
 * @return <tt>true</tt> (as specified by {@link Collection#add})
 */
public boolean add(E e) {
    ensureCapacityInternal(size + 1);  // Increments modCount!!
    elementData[size++] = e;
    return true;
}
// 开始检查当前插入位置时数组容量是否足够
private void ensureCapacityInternal(int minCapacity) {
    // ArrayList 是否未初始化，未初始化是则初始化 ArrayList ，容量给 10.
    if (elementData == DEFAULTCAPACITY_EMPTY_ELEMENTDATA) {
        minCapacity = Math.max(DEFAULT_CAPACITY,minCapacity);
    }
    ensureExplicitCapacity(minCapacity);
}
// 比较插入 index 是否大于当前数组长度，大于就 grow 进行扩容
private void ensureExplicitCapacity(int minCapacity) {
    modCount++;
    // overflow-conscIoUs code
    if (minCapacity - elementData.length > 0)
        grow(minCapacity);
}

/**
 * Increases the capacity to ensure that it can hold at least the
 * number of elements specified by the minimum capacity argument.
 *
 * @param minCapacity the desired minimum capacity
 */
private void grow(int minCapacity) {
    // overflow-conscIoUs code
    int oldCapacity = elementData.length;
    // 扩容规则是当前容量 + 当前容量右移1位。也就是1.5倍。
    int newCapacity = oldCapacity + (oldCapacity >> 1);
    if (newCapacity - minCapacity < 0)
        newCapacity = minCapacity;
    // 是否大于 Int 最大值，也就是容量最大值
    if (newCapacity - MAX_ARRAY_SIZE > 0)
        newCapacity = hugeCapacity(minCapacity);
    // minCapacity is usually close to size,so this is a win:
    // 拷贝元素到扩充后的新的 ArrayList
    elementData = Arrays.copyOf(elementData,newCapacity);
}

Through the source code, it is found that the expansion logic is relatively simple. The specific expansion process is as follows:

ArrayList data addition

In the above analysis, we have seen the specific logic of adding an element during capacity expansion. Because the bottom layer is an array, it is very simple to directly specify the subscript assignment.

public boolean add(E e) {
    ensureCapacityInternal(size + 1);  // Increments modCount!!
    elementData[size++] = e; // 直接赋值
    return true;
}

However, there is another case of adding data, that is, the subscript position to be added is specified when adding data. What's the difference in logic?

/**
 * Inserts the specified element at the specified position in this
 * list. Shifts the element currently at that position (if any) and
 * any subsequent elements to the right (adds one to their indices).
 *
 * @param index index at which the specified element is to be inserted
 * @param element element to be inserted
 * @throws indexoutofboundsexception {@inheritDoc}
 */
public void add(int index,E element) {
    rangeCheckForAdd(index);
    ensureCapacityInternal(size + 1);  // Increments modCount!!
     // 指定下标开始所有元素后移一位
    System.arraycopy(elementData,index,elementData,index + 1,size - index);
    elementData[index] = element;
    size++;
}

It can be found that this new key line is added. Its function is to move the elements starting from the coordinates to be inserted backward by one bit, so as to make room for the specified subscript and put the new elements.

For example, if you want to add data 100 at the position with subscript 3, all elements starting with subscript 3 need to be moved back one bit.

From this, we can also see a disadvantage of ArrayList, which is inefficient when randomly inserting new data.

ArrayList data acquisition

The data subscript obtains the element value in one step without saying more.

public E get(int index) {
    rangeCheck(index);
    return elementData(index);
}
E elementData(int index) {
    return (E) elementData[index];
}

LinkedList

The bottom layer of LinkedList is a linear structure of linked list. In addition to a node object, the linked list also has one or two pointers according to the difference between one-way linked list and two-way linked list. So is LinkedList a single linked list or a two-way linked list?

LinkedList storage structure

We still go deep into the LinkedList source code. We can see that there is no operation in the parameterless structure of LinkedList, but we can find that they are the first and last nodes of the linked list by looking at the variables first and last.

transient int size = 0;
/**
 * Pointer to first node.
 * Invariant: (first == null && last == null) ||
 *            (first.prev == null && first.item != null)
 */
transient Node<E> first;

/**
 * Pointer to last node.
 * Invariant: (first == null && last == null) ||
 *            (last.next == null && last.item != null)
 */
transient Node<E> last;

/**
 * Constructs an empty list.
 */
public LinkedList() {
}

The variables first and last are of node type. Then check the node source code.

private static class Node<E> {
    E item;
    Node<E> next;
    Node<E> prev;

    Node(Node<E> prev,E element,Node<E> next) {
        this.item = element;
        this.next = next;
        this.prev = prev;
    }
}

You can see that this is a typical two-way linked list structure. Item is used to store element values; Next points to the next node and prev points to the previous node.

LinkedList data acquisition

Unlike the array, the linked list is a continuous memory address. The linked list points to the record link path through next and prev, so finding the node at the specified location can only traverse the search, and so can viewing the source code.

public E get(int index) {
    checkElementIndex(index);
    return node(index).item;
}
/**
 * Returns the (non-null) Node at the specified element index.
 */
// 遍历查找 index 位置的节点信息
Node<E> node(int index) {
    // assert isElementIndex(index);
    // 这里判断 index 是在当前链表的前半部分还是后半部分，然后决定是从
    // first 向后查找还是从 last 向前查找。
    if (index < (size >> 1)) {
        Node<E> x = first;
        for (int i = 0; i < index; i++)
            x = x.next;
        return x;
    } else {
        Node<E> x = last;
        for (int i = size - 1; i > index; i--)
            x = x.prev;
        return x;
    }
}

Find the node object at the specified location. In this part, you should note that the search will first determine whether the index is in the first half or the second half of the current linked list, and then decide whether to search backward from first or forward from last. This can increase the search speed. It can also be seen from here that the efficiency of the linked list in finding the elements at the specified position is not high.

LinkedList data addition

Because LinkedList is a linked list, the addition of LinkedList is the addition of data in the linked list. At this time, the operation should be distinguished according to the location to be inserted.

LinkedList data deletion

Still check the source code for analysis. In the source code, if the node is a head node or tail node, it is relatively simple to delete. We mainly look at the operation of deleting the middle node

public E remove(int index) {
    checkElementIndex(index);
    return unlink(node(index));
}
/**
 * Unlinks non-null node x.
 */
E unlink(Node<E> x) {
    // assert x != null;
    final E element = x.item;
    final Node<E> next = x.next;
    final Node<E> prev = x.prev;

    if (prev == null) {
        first = next;
    } else {
        prev.next = next;
        x.prev = null;
    }

    if (next == null) {
        last = prev;
    } else {
        next.prev = prev;
        x.next = null;
    }

    x.item = null;
    size--;
    modCount++;
    return element;
}

The node (index) method is still binary to find the target location, and then delete it. For example, the node to be deleted is called X. the deletion operation is mainly to modify the prev of node x, the next point of node x is the next point of node x, the prev point of node x is the prev point of node x, and finally clear the prev and next points of node X. If it's a little hard to understand, you can see the figure below, which may be more clear.

extend

You think LinkedList is just a list. Other LinkedList not only implements the list interface, but also implements deque, so it appears to be a list, but in fact it is also a queue.

public class LinkedList<E> extends AbstractSequentialList<E>
    implements List<E>,Deque<E>,Cloneable,java.io.Serializable

Experience the first in, first out queue.

Queue<String> queue = new LinkedList<>();
queue.add("a");
queue.add("b");
queue.add("c");
queue.add("d");
System.out.println(queue.poll());
System.out.println(queue.poll());
System.out.println(queue.poll());
System.out.println(queue.poll());
// result：
// a
// b
// c
// d

Students can think about how this queue is implemented. In fact, it's very simple, right? It's first in first out. Deleting the first node when polling is not finished.

The content of this article comes from the network collection of netizens. It is used as a learning reference. The copyright belongs to the original author.

THE END

Java

二维码

Millions of AdGuardHomeRules-ADH ad filtering interception rules”

< <上一篇

Java web learning notes (6) JSP

下一篇>>

搜索内容