Performance optimized database 3- primary key generation and paging query
Topic 1: generation of primary key ID
Database based auto_ The increment self increment ID can be used as a distributed ID. You need a separate MySQL instance to generate IDs.
Disadvantages: when there is a large amount of data, the database itself is a bottleneck, and a single point of DB is prone to a single point of failure. 2 Database multi master mode
That is, the database is made into a master-slave mode cluster. In order to prevent a master node from hanging up, a dual master mode cluster is used. However, the self incrementing IDs of the two MySQL instances start from 1. What if they are repeated?
Solution: set the starting value and step size. For example, db1: initial value 1, step 2 DB2: initial value 2, step 2 3. Section mode
The number segment mode can be understood as obtaining self increasing IDs from the database in batches, such as obtaining 1000 IDs for specific services at a time.
CREATE TABLE id_generator (
id int(10) NOT NULL,max_id bigint(20) NOT NULL COMMENT '当前最大id',step int(20) NOT NULL COMMENT '号段的布长',biz_type int(20) NOT NULL COMMENT '业务类型',version int(20) NOT NULL COMMENT '版本号',PRIMARY KEY (`id`)
)
7. Implement the incr command of redis Snow flake algorithm
Snowflake ID composition structure: positive digit (1bit) + timestamp (41bit) + machine ID (5bit) + Data Center (5bit) + self increment (12bit)
public class SNowFlakeShortUrl {
/**
* 起始的时间戳
*/
private final static long START_TIMESTAMP = 1480166465631L;
/**
* 每一部分占用的位数
*/
private final static long SEQUENCE_BIT = 12; //序列号占用的位数
private final static long MACHINE_BIT = 5; //机器标识占用的位数
private final static long DATA_CENTER_BIT = 5; //数据中心占用的位数
/**
* 每一部分的最大值
*/
private final static long MAX_SEQUENCE = -1L ^ (-1L << SEQUENCE_BIT);
private final static long MAX_MACHINE_NUM = -1L ^ (-1L << MACHINE_BIT);
private final static long MAX_DATA_CENTER_NUM = -1L ^ (-1L << DATA_CENTER_BIT);
/**
* 每一部分向左的位移
*/
private final static long MACHINE_LEFT = SEQUENCE_BIT;
private final static long DATA_CENTER_LEFT = SEQUENCE_BIT + MACHINE_BIT;
private final static long TIMESTAMP_LEFT = DATA_CENTER_LEFT + DATA_CENTER_BIT;
private long dataCenterId; //数据中心
private long machineId; //机器标识
private long sequence = 0L; //序列号
private long lastTimeStamp = -1L; //上一次时间戳
private long getNextMill() {
long mill = getNewTimeStamp();
while (mill <= lastTimeStamp) {
mill = getNewTimeStamp();
}
return mill;
}
private long getNewTimeStamp() {
return System.currentTimeMillis();
}
/**
* 根据指定的数据中心ID和机器标志ID生成指定的序列号
*
* @param dataCenterId 数据中心ID
* @param machineId 机器标志ID
*/
public SNowFlakeShortUrl(long dataCenterId,long machineId) {
if (dataCenterId > MAX_DATA_CENTER_NUM || dataCenterId < 0) {
throw new IllegalArgumentException("DtaCenterId can't be greater than MAX_DATA_CENTER_NUM or less than 0!");
}
if (machineId > MAX_MACHINE_NUM || machineId < 0) {
throw new IllegalArgumentException("MachineId can't be greater than MAX_MACHINE_NUM or less than 0!");
}
this.dataCenterId = dataCenterId;
this.machineId = machineId;
}
/**
* 产生下一个ID
*
* @return
*/
public synchronized long nextId() {
long currTimeStamp = getNewTimeStamp();
if (currTimeStamp < lastTimeStamp) {
throw new RuntimeException("Clock moved backwards. Refusing to generate id");
}
if (currTimeStamp == lastTimeStamp) {
//相同毫秒内,序列号自增
sequence = (sequence + 1) & MAX_SEQUENCE;
//同一毫秒的序列数已经达到最大
if (sequence == 0L) {
currTimeStamp = getNextMill();
}
} else {
//不同毫秒内,序列号置为0
sequence = 0L;
}
lastTimeStamp = currTimeStamp;
return (currTimeStamp - START_TIMESTAMP) << TIMESTAMP_LEFT //时间戳部分
| dataCenterId << DATA_CENTER_LEFT //数据中心部分
| machineId << MACHINE_LEFT //机器标识部分
| sequence; //序列号部分
}
public static void main(String[] args) {
SNowFlakeShortUrl sNowFlake = new SNowFlakeShortUrl(2,3);
for (int i = 0; i < (1 << 4); i++) {
//10进制
System.out.println(sNowFlake.nextId());
}
}
}
Uid generator is developed by Baidu technology department. The project GitHub address is https://github.com/baidu/uid-generator 9. Leaf
Leaf is developed by meituan, GitHub address: https://github.com/Meituan-Dianping/Leaf 10. Tinyid
Tinyid was developed by Didi, GitHub address: https://github.com/didi/tinyid 。
Topic 2: paging query
What problems are easy to encounter in paging query?
Question 1: using the paging plug-in
The principle of paging plug-in is to add a bracket to your SQL, and then use count (*) to calculate the number of rows.
Solution: if your SQL is simple, it's OK. If it's complex connection query, it will be very slow. The paging plug-in can be rewritten and discarded according to the business situation.
Problem 2: large number of paging problems, such as limit 100000,20
SELECT * FROM table WHERE id >= (SELECT id FROM table LIMIT 1000000,1) LIMIT 10;
Question 3: can you query with any combination of query criteria?
Solution: in this case, many fields are involved and cannot be indexed. Search engines can be used to solve this problem.
2.1 cross database paging query
2.1. 1 global view method
The original SQL: select * from t_ Change order by ID limit 20,10 to select * from t_ order order by id limit 0,30
Change limit x, y to limit 0, x + y
Disadvantages: the performance of this method is getting lower and lower as the page is turned. In fact, if you do not rewrite the SQL, the query pressure on the database side is almost the same, because both limit 05000 and limit 4900100 need to scan 4900 rows to get data. The biggest impact of limit 05000 is that the memory consumption and network consumption of the client become larger.
Sharding JDBC uses this method for paging query, but it uses streaming processing and priority queue disaggregation to eliminate the pressure of client memory consumption, but the network consumption can not be eliminated.
2.1. 2. Business compromise method - Page skipping query is prohibited
2.1. 3 business compromise - allow inaccurate data
For example, limit 600100 is divided into two libraries. Assuming that the data is uniform, take 50 from each library and take them from 300. Then combine the data to get the 100 data we want to get. Change to limit 300,50.
2.1. 4 secondary query method
Premise: the data of multiple databases are shared equally, and the remainder is taken for segmentation. It cannot be segmented (the data of a certain period of time is in one database), and there cannot be a missing piece of data in a certain database
Sample data:
If we want to query order by ID limit 5,4 (the result should be 6, 7, 8, 9)
1. First SQL rewrite
order by id limit 2,4
The query results are as follows:
3. The second SQL rewrite
Library 1: where id bewteen 5,14
Library 2: where id bewteen 5,11
The query results are as follows:
Then sort in the result set and take out the minimum 4 values. So the result is 5, 6, 7, 8
advantage:
Disadvantages: