How to Build High-Performance MySQL Indexes
This article focuses on how to build high-performance MySQL indexes. It shows how to judge whether an index is a good one and how to build a good one.
A common misconception about indexes is that every column should get its own index.
Suppose a table t has three columns and a separate single-column index on each, created in the hope that any column in the table can then be accessed quickly. Creating a separate index on every column usually does not improve MySQL query performance. MySQL 5.0 and later do have an index merge strategy, which can scan several single-column indexes for one query and combine the results, but it is not always effective. Every additional index also raises the cost of inserting data, because each insert has to update all of the indexes on the table.
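The original listing is not reproduced here. A minimal sketch of what such a table might look like follows; the table name t comes from the text, while the column names (c1, c2, c3) and index names are assumptions made purely for illustration.

```sql
-- Hypothetical table: one single-column index per column,
-- which is the pattern the text warns against.
CREATE TABLE t (
    c1 INT,
    c2 INT,
    c3 INT,
    KEY idx_c1 (c1),
    KEY idx_c2 (c2),
    KEY idx_c3 (c3)
);

-- With only single-column indexes available, the optimizer may fall back
-- to index merge for a query like this one. Whether it does depends on
-- the data and the MySQL version.
EXPLAIN SELECT * FROM t WHERE c1 = 1 OR c2 = 2;
```

If index merge shows up often in execution plans, that is usually a hint that a multi-column index would serve the query better.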
Expressions on indexed columns
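Consider a query on the student table whose WHERE clause applies arithmetic to the id column, for example by comparing id + 1 with a constant.
Even if there is an index on the id column of the student table, such a query cannot use it. For MySQL to use an index, the indexed column must appear on its own in the condition: it cannot be part of an expression or an argument to a function.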
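As a rough illustration, the pair of queries below contrasts the two forms. The student table and its id column come from the text; the constants are arbitrary.

```sql
-- The index on id cannot be used here, because id is part of an expression.
SELECT * FROM student WHERE id + 1 = 5;

-- Equivalent rewrite that leaves id alone on one side of the comparison,
-- so the index on id can be used.
SELECT * FROM student WHERE id = 4;
```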
Index length and selectivity
Avoid indexing very long columns where possible: the index takes up a lot of space, and inserts and updates become slower. For this reason a UUID column is a poor choice for an index. As the previous article explained, an index needs to stay resident in memory to speed up data access. A UUID is 128 bits long (36 characters in its usual text form), so an index on it grows quickly as the table grows. In addition, UUIDs are not sequential, so each insert has to search from the root node for the right position in the index instead of appending at the end. When the target node is already full, the insert also triggers a node split, which slows insertion down further.
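The contrast between a sequential key and a UUID key can be sketched as follows. Both tables and all column names are hypothetical, and the example assumes the UUID is stored in its 36-character text form.

```sql
-- Sequential key: new rows are added at the right-hand edge of the index.
CREATE TABLE orders_seq (
    id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    payload VARCHAR(100),
    PRIMARY KEY (id)
) ENGINE = InnoDB;

-- UUID key: each key is much longer, and new rows land at positions
-- that do not follow insertion order.
CREATE TABLE orders_uuid (
    id CHAR(36) NOT NULL,
    payload VARCHAR(100),
    PRIMARY KEY (id)
) ENGINE = InnoDB;

INSERT INTO orders_seq (payload) VALUES ('a');
INSERT INTO orders_uuid (id, payload) VALUES (UUID(), 'a');
```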
Another factor to consider when creating an index is its selectivity. In general, columns with high selectivity make better index columns, but this is not always the case. The next section describes how to weigh index selectivity.
Creating high-performance indexes
Choosing the right column order
One guideline for choosing the column order of an index is to put the most selective column on the left. The column order should also match the way queries use the index, taking sorting and grouping into account. In a multi-column B-tree index, entries are sorted by the leftmost column first, then by the second column, and so on. The WHERE clause and the ORDER BY clause should therefore follow this order as closely as possible to make the best use of the index.
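The effect of column order can be sketched with a two-column index. The table, column, and index names below are hypothetical, chosen only to illustrate the leftmost-column rule described above.

```sql
CREATE TABLE orders (
    id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    customer_id INT NOT NULL,
    created_at DATETIME NOT NULL,
    amount DECIMAL(10, 2) NOT NULL,
    KEY idx_customer_created (customer_id, created_at)
);

-- Filters on the leftmost column and sorts by the second one,
-- which matches the order of the index.
SELECT id, amount
FROM orders
WHERE customer_id = 42
ORDER BY created_at;

-- Skips the leftmost column, so the composite index generally
-- cannot be used to locate the matching rows.
SELECT id, amount
FROM orders
WHERE created_at >= '2024-01-01';
```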
Index selectivity
High selectivity means that the column filters out more of the unwanted rows. An auto-increment ID, for example, has the highest possible selectivity, because a lookup on it filters out everything except a single row. A simple way to measure the selectivity of a column such as columnA is to divide the number of distinct values in the column by the total number of rows.
The larger this ratio, the higher the selectivity of columnA. It measures average selectivity only, so it is not always a reliable guide and needs to be checked case by case.
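One common way to compute such a ratio is shown below. The column name columnA comes from the text; the table name my_table is an assumption.

```sql
-- Ratio of distinct values to total rows; values closer to 1
-- indicate higher average selectivity.
SELECT COUNT(DISTINCT columnA) / COUNT(*) AS selectivity
FROM my_table;
```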
Prefix index
When a column is particularly long but still has to be indexed, consider a prefix index. A prefix index uses only the first n characters of a column as the index key.
For example, a prefix index on columnA can cover only its first five characters. A prefix index is an effective way to make an index smaller and faster. It has one drawback, however: MySQL cannot use a prefix index for ORDER BY or GROUP BY, nor can it use one as a covering index.
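A minimal sketch of a five-character prefix index follows. The column name columnA comes from the text; the table name my_table and the index name are assumptions. The second statement is one possible way to check how much selectivity the prefix gives up compared with the full column.

```sql
-- Index only the first five characters of columnA.
ALTER TABLE my_table ADD INDEX idx_columna_prefix (columnA(5));

-- Compare the selectivity of the five-character prefix with that of
-- the full column to judge whether the prefix is long enough.
SELECT COUNT(DISTINCT LEFT(columnA, 5)) / COUNT(*) AS prefix_selectivity,
       COUNT(DISTINCT columnA) / COUNT(*) AS full_selectivity
FROM my_table;
```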
Clustered and non-clustered indexes
Clustered index
A clustered index is a way of storing data in which the B-tree index and the data rows are kept in the same structure. In other words, with a clustered index the actual data rows are stored in the leaf nodes of the index. This is also why each table can have only one clustered index.
A clustered index organizes data as shown in the following figure:
Advantages of clustered indexes:
1. Related data is stored together, which reduces the number of I/O operations during retrieval.
2. Data access is faster, because the index and the rows are kept in the same structure.
3. Queries that use a covering index scan can read the primary key values stored in the leaf nodes directly.
Alongside these advantages, clustered indexes also have some disadvantages:
1. Because a clustered index keeps rows together in key order, insertion speed depends heavily on the order in which primary key values are inserted.
2. Updating the clustered index columns is expensive.
3. A lookup through a secondary index requires two index searches.
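As a sketch of how this looks in practice: in InnoDB the primary key serves as the clustered index, so declaring a primary key decides how the rows are physically ordered. The table and column names below are hypothetical.

```sql
-- In InnoDB the primary key is the clustered index, so complete rows
-- are stored in the leaf nodes in id order.
CREATE TABLE student_scores (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT,
    name VARCHAR(50) NOT NULL,
    score INT NOT NULL,
    PRIMARY KEY (id)
) ENGINE = InnoDB;

-- A range on the primary key reads rows that are stored next to each other.
SELECT * FROM student_scores WHERE id BETWEEN 100 AND 200;
```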
Non-clustered index
A non-clustered index is usually called a secondary index. Unlike a clustered index, its leaf nodes store a row pointer or the primary key value of the row rather than the row itself. A lookup therefore first finds the primary key value (or row pointer) in the leaf node of the secondary index and then uses that primary key value to find the row in the clustered index. As a result, a query through a non-clustered index has to search two indexes. The following figure shows a non-clustered index:
The index shown is a secondary index in InnoDB, whose leaf nodes store the indexed column values and the primary key. In MyISAM, leaf nodes store the indexed column values and row pointers.
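The two-step lookup through a secondary index can be sketched as follows. The table, column, and index names are hypothetical.

```sql
CREATE TABLE users (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT,
    name VARCHAR(50) NOT NULL,
    email VARCHAR(100) NOT NULL,
    PRIMARY KEY (id),      -- clustered index in InnoDB
    KEY idx_name (name)    -- secondary index: leaf nodes hold name and id
) ENGINE = InnoDB;

-- Step 1: idx_name is searched for 'alice', which yields the matching id.
-- Step 2: the clustered index is searched by that id to fetch email,
--         because email is not stored in idx_name.
SELECT email FROM users WHERE name = 'alice';
```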
Covering index
If an index contains (covers) the values of all the fields a query needs, it is called a covering index for that query. A covering index can greatly improve query efficiency, because the query is answered from the index alone without going back to the table.
The EXPLAIN keyword shows the execution plan of a query, which reveals the details of how it runs. When a covering index is used, the Extra column of the execution plan shows "Using index". In general, the columns in the WHERE clause should be covered by the index as far as possible, in the same order as the index columns. Note also that a LIKE pattern with a leading wildcard cannot use the index to locate rows, and that a prefix LIKE is treated as a range condition, so the index columns to its right cannot be used to narrow the lookup.
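A minimal sketch of checking for a covering index with EXPLAIN follows. The table, column, and index names are hypothetical, and the actual plan depends on the data and the MySQL version.

```sql
CREATE TABLE members (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT,
    name VARCHAR(50) NOT NULL,
    email VARCHAR(100) NOT NULL,
    bio TEXT,
    PRIMARY KEY (id),
    KEY idx_name_email (name, email)
) ENGINE = InnoDB;

-- Both selected columns are in idx_name_email, so the query can be
-- answered from the index; Extra would be expected to show "Using index".
EXPLAIN SELECT name, email FROM members WHERE name = 'alice';

-- bio is not in the index, so the rows have to be fetched from the table.
EXPLAIN SELECT name, bio FROM members WHERE name = 'alice';

-- A prefix pattern can use the index as a range condition on name.
EXPLAIN SELECT name, email FROM members WHERE name LIKE 'al%';

-- A leading wildcard cannot be used to locate rows in the index.
EXPLAIN SELECT name, email FROM members WHERE name LIKE '%ice';
```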