Sharding JDBC quick start lesson 1
1. General
Shardingsphere is an ecosystem composed of a set of open source distributed database middleware solutions. It consists of sharding JDBC Sharding proxy and sharding sidecar (planned) are three independent products. They all provide standardized data fragmentation, distributed transaction and database governance functions, which can be applied to a variety of application scenarios such as Java isomorphism, heterogeneous language, cloud native and so on.
Shardingsphere is positioned as a relational database middleware, which aims to make full and reasonable use of the computing and storage capacity of relational database in distributed scenarios.
1.1. ShardingSphere-JDBC
Sharding JDBC is positioned as a lightweight Java framework and provides additional services in the JDBC layer of Java. It uses the client to directly connect to the database and provides services in the form of jar package without additional deployment and dependency. It can be understood as an enhanced jdbc driver and is fully compatible with JDBC and various ORM frameworks.
1.2. ShardingSphere-Proxy
Sharding proxy is positioned as a transparent database proxy and provides a server version encapsulating the database binary protocol to support heterogeneous languages. At present, MySQL and PostgreSQL versions are available. It can use any access client compatible with MySQL / PostgreSQL protocol (such as MySQL command client, MySQL workbench, Navicat, etc.) to operate data, which is more friendly to DBAs.
1.3. ShardingSphere-Sidecar(TODO)
Sharding sidecar is positioned as the cloud native database agent of kubernetes, acting as a sidecar for all access to the database. The meshing layer that interacts with the database is provided through the scheme of no center and zero intrusion, that is, database mesh, also known as database grid.
Database mesh focuses on how to organically connect distributed data access applications with databases. It pays more attention to interaction, which effectively combs the interaction between chaotic applications and databases. Using database mesh, the application and database accessing the database will eventually form a huge grid system. The application and database only need to be seated in the grid system. They are the objects governed by the meshing layer.
1.4. Hybrid architecture
Shardingsphere JDBC adopts a decentralized architecture, which is suitable for High-Performance Lightweight OLTP applications developed by Java; Shardingsphere proxy provides static portal and heterogeneous language support, which is suitable for OLAP applications and scenarios of managing and maintaining partitioned databases.
Apache shardingsphere is an ecosystem composed of multiple access terminals. By mixing shardingsphere JDBC and shardingsphere proxy, and using the same registry to uniformly configure the fragmentation strategy, it can flexibly build application systems suitable for various scenarios, making architects more free to adjust the best system architecture suitable for the current business.
2. Concept & function
2.1. Data slicing
In terms of performance, most relational databases use B + tree indexes. When the amount of data exceeds the threshold, the increase of index depth will also increase the IO times of disk access, resulting in the decline of query performance; At the same time, high concurrent access requests also make the centralized database the biggest bottleneck of the system.
In terms of operation and maintenance cost, when the data in a database instance reaches above the threshold, the operation and maintenance pressure on DBA will increase. The time cost of data backup and recovery will become more and more uncontrollable with the amount of data. Generally speaking, the data threshold of a single database instance is within 1TB, which is a reasonable range.
Vertical slice
According to the business splitting method, it is called vertical splitting, also known as vertical splitting. Its core concept is dedicated to special database. Before splitting, a database is composed of multiple data tables, and each table corresponds to different businesses. After splitting, the tables are classified according to business and distributed to different databases, so as to disperse the pressure to different databases. The following figure shows the scheme of vertically slicing user tables and order tables to different databases according to business needs.
Vertical fragmentation often requires adjustments to the architecture and design. Generally speaking, it is too late to respond to the rapid changes in Internet business needs; Moreover, it can not really solve the single point bottleneck. Vertical splitting can alleviate the problems caused by data volume and access volume, but it can not cure them. If the amount of data in the table still exceeds the threshold that a single node can carry after vertical splitting, horizontal splitting is required for further processing.
Horizontal slice
Horizontal splitting is also called horizontal splitting. Compared with vertical fragmentation, it no longer classifies data according to business logic, Instead, the data is distributed to multiple libraries or tables according to certain rules through a certain field (or several fields), and each partition contains only a part of the data. For example, according to the partition of the primary key, the records of the even primary key are placed in the 0 Library (or table), and the records of the odd primary key are placed in the 1 library (or table), as shown in the following figure.
Horizontal slicing theoretically breaks through the bottleneck of single machine data processing, and expands relatively freely. It is a standard solution to divide databases and tables.
target
The main design goal of Apache shardingsphere data sharding module is to make the impact of database and table sharding as transparent as possible and let users use the database cluster after horizontal sharding as much as possible as a database.
2.1. 1. Core concepts
Data node
The smallest unit of data fragmentation. It consists of data source name and data table, for example: DS_ 0.t_ order_ 0
Slice key
The database field used for sharding is the key field to split the database (table) horizontally. Example: if the mantissa of the order primary key in the order table is taken as a module fragment, the order primary key is a fragment field.
If there is no fragment field in SQL, full routing will be executed and the performance is poor.
In addition to the support for single sharding fields, Apache shardingsphere also supports sharding according to multiple fields.
Partition algorithm
The data is segmented through the segmentation algorithm, and segmentation through =, > =, < =, >, <, <, between and in is supported. The slicing algorithm needs to be implemented by the application developer, and the flexibility is very high.
Partition strategy
It includes partition key and partition algorithm. Due to the independence of partition algorithm, it is separated independently. What can really be used for sharding operation is sharding key + sharding algorithm, that is, sharding strategy. At present, five fragmentation strategies are provided.
Row expression
The configuration can be simplified by using expressions. You only need to use ${expression} or $- > {expression} to identify line expressions in the configuration
${begin.. end} represents the range interval
${[Unit1, Unit2, unit_x]} indicates enumeration values
If there are consecutive ${expression} or $- > {expression} expressions in the line expression, the final result of the whole expression will be Cartesian combined according to the result of each sub expression.
For example, ${['online ',' offline ']}_ Table ${1.. 3} will eventually be resolved to online_ table1,online_ table2,online_ table3,offline_ table1,offline_ table2,offline_ table3
Distributed primary key
In the partition rule configuration module, you can configure the primary key generation strategy of each table. By default, snow flake algorithm is used to generate 64bit long integer data.
Snowflake algorithm is a distributed primary key generation algorithm published by twitter. It can ensure the non repetition of primary keys of different processes and the order of primary keys of the same process.
Implementation principle
In the same process, it first guarantees non repetition through time bits. If the time is the same, it is guaranteed through sequence bits. At the same time, because the time bits are monotonically increasing, and if each server roughly synchronizes the time, the generated primary keys can be considered to be generally orderly in the distributed environment, which ensures the efficiency of inserting index fields. For example, the primary key of the InnoDB storage engine of MySQL.
The binary representation of the primary key generated by the snowflake algorithm consists of four parts. From high to low, it is divided into 1 bit symbol bit, 41 bit timestamp bit, 10 bit work process bit and 12 bit serial number bit.
Reserved symbol bit, constant to zero.
The number of milliseconds that can be accommodated by the 41 bit timestamp is the 41st power of 2. The number of milliseconds used in a year is 365 * 24 * 60 * 60 * 1000. According to the calculation, the result is about 69.73 years. The snowflake algorithm of Apache shardingsphere starts at 0:00 on November 1, 2016 and can be used until 2086. I believe it can meet the requirements of most systems.
This flag is unique within the java process. In case of distributed application deployment, ensure that the ID of each working process is different. The default value is 0, which can be set through the property.
This sequence is used to generate different IDS in the same millisecond. If the number generated in this millisecond exceeds 4096 (the 12th power of 2), the generator will wait until the next millisecond to continue the generation.
The detailed structure of snowflake algorithm primary key is shown in the following figure:
2.1. 2. Use specification
The supported SQL types and unsupported SQL types are listed below, so that users can avoid stepping on the pit as much as possible.
Support item
Route to single data node
Routing to multiple data nodes
Item not supported
Routing to multiple data nodes
https://shardingsphere.apache.org/document/current/cn/features/sharding/use-norms/sql/
2.2. Read write separation
Although read-write separation can improve the throughput and availability of the system, it also brings the problem of data inconsistency. This includes the data consistency between multiple master databases and the data consistency between master databases and slave databases. Moreover, read-write separation also brings the same problem as data fragmentation, which will also make the operation and maintenance of the database more complex for application developers and operation and maintenance personnel. The following figure shows the complex topological relationship between the application and the database cluster when the sub database and sub table are used together with read-write separation.
3. Example: horizontal library segmentation
Introducing Maven dependency
perhaps
Don't say much, go on POM xml
OrderEntiry. java
OrderRepository. java
OrderService. java
OrderController. java
Startup class
The most important is application properties
engineering structure
Source code: https://github.com/chengjiansheng/sharding-jdbc-demo
Through access http://localhost:8080/order/save?userId=xxx If you want to insert data into the database, the result is really as expected
4. Write at the end
Configure entry class:
org. apache. shardingsphere. shardingjdbc. spring. boot. SpringBootConfiguration
The document is here:
https://shardingsphere.apache.org/
https://shardingsphere.apache.org/document/legacy/4.x/document/en/manual/sharding-jdbc/
http://shardingsphere.apache.org/elasticjob/
Write at the end:
Although sharding sphere JDBC (sharding JDBC) provides many functions, the most commonly used ones are database and table separation and read-write separation, which are usually used together
https://shardingsphere.apache.org/document/legacy/4.x/document/en/manual/sharding-jdbc/configuration/config-spring-boot/
After splitting databases and tables, there are many restrictions when writing SQL. Many previous operations on a single database and table cannot be used, and the fragment key must be brought with each query, otherwise the whole table will be scanned
If you have to divide the tables, you might as well consider saving the data in elastic search first, and query directly through es. Or go to es first, and then check MySQL through the primary key.
In a word, be careful!