TDDL: Taobao's distributed data layer

Taobao developed the TDDL (Taobao Distributed Data Layer) framework to meet its own business needs. It mainly solves two problems: access routing when data is split across multiple databases and tables (which requires cooperation between the persistence layer and the data access layer), and data synchronization between heterogeneous databases. It is a JDBC DataSource implementation based on centralized configuration, with features such as database and table splitting, master/slave support, and dynamic data source configuration. Other large vendors also offer capable, community-supported DAL products, such as Hibernate Shards and Ibatis-Sharding. TDDL sits between the database and the persistence layer and deals with the database directly, as shown in the figure:

Taobao has split its data across different databases for a long time. The upper-layer system connects to multiple databases through an intermediate routing layer called DBRoute, which provides unified access to the data. DBRoute operates on multiple databases and integrates their data, so that the upper-layer system can work with many databases as if they were one. As data volume grew, however, the requirements for splitting databases and tables became more demanding. For example, when commodity data reaches the level of 10 billion records, no single database can hold it, so it is split into 2, 4, 8, 16, 32, ... up to 1024 or 2048 parts. That solves storage, but how is the data queried? This is where the query middleware has to take over. To the upper layer, querying must look like querying a single database, and it must be about as fast as querying one (each query completes in a few milliseconds). TDDL takes on this task. Some external systems name this kind of middleware after the DAL (data access layer) concept. The following figure shows a simple query strategy for split databases and tables:
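The splitting described above (2, 4, 8, ... up to 1024 or 2048 parts) can be illustrated with a simple modulo rule. The sketch below is not TDDL's routing code; the class name `ModuloShardRouter`, the `db_NN.table_NNNN` naming scheme, and the choice of plain modulo on a numeric key are all assumptions made for illustration.

```java
import java.util.Locale;

/** Hypothetical router (not part of TDDL): maps a numeric key to a physical database and table. */
final class ModuloShardRouter {
    private final int dbCount;
    private final int tablesPerDb;

    ModuloShardRouter(int dbCount, int tablesPerDb) {
        if (dbCount <= 0 || tablesPerDb <= 0) {
            throw new IllegalArgumentException("dbCount and tablesPerDb must be positive");
        }
        this.dbCount = dbCount;
        this.tablesPerDb = tablesPerDb;
    }

    /** Global table index in [0, dbCount * tablesPerDb); floorMod keeps negative keys in range. */
    int tableIndex(long key) {
        return (int) Math.floorMod(key, (long) dbCount * tablesPerDb);
    }

    /** Consecutive table indexes are grouped into the same database. */
    int dbIndex(long key) {
        return tableIndex(key) / tablesPerDb;
    }

    /** Physical name using an assumed naming scheme, e.g. db_01.item_0008. */
    String route(String logicalTable, long key) {
        return String.format(Locale.ROOT, "db_%02d.%s_%04d",
                dbIndex(key), logicalTable, tableIndex(key));
    }

    public static void main(String[] args) {
        // 4 databases x 8 tables each = 32 physical tables
        ModuloShardRouter router = new ModuloShardRouter(4, 8);
        System.out.println(router.route("item", 1000L)); // prints db_01.item_0008
    }
}
```

With a rule like this, the upper layer keeps issuing SQL against the logical table `item`, and the middleware substitutes the physical name. Doubling the number of parts changes where existing rows belong, which is one reason real systems pair the rule with a migration or data synchronization mechanism.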

Main advantages of TDDL:

Architecture of TDDL

TDDL can be divided into three layers: the Matrix layer, the Group layer and the Atom layer. The Matrix layer implements the database and table splitting logic and holds multiple Group instances underneath. The Group layer and the Atom layer together form a dynamic data source. The Group layer implements read/write separation for the master/slave mode of the database and holds multiple Atom instances underneath. Finally, the Atom layer (TAtomDataSource) handles the dynamic push of database IP, port, password, connectionProperties and other information, and holds the underlying atomic JBoss data source.
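The read/write separation performed at the Group layer can be sketched as a weighted choice among the Atom instances it holds. This is a minimal illustration, not TDDL's implementation: the `GroupSelector` and `AtomNode` names, and the idea of giving each atom separate integer read and write weights, are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

/** Hypothetical Group-layer selector (not TDDL's API): picks an atom for a read or a write by weight. */
final class GroupSelector {

    /** Hypothetical stand-in for one Atom-layer data source, i.e. one physical database. */
    static final class AtomNode {
        final String name;
        final int readWeight;
        final int writeWeight;

        AtomNode(String name, int readWeight, int writeWeight) {
            if (readWeight < 0 || writeWeight < 0) {
                throw new IllegalArgumentException("weights must be non-negative");
            }
            this.name = name;
            this.readWeight = readWeight;
            this.writeWeight = writeWeight;
        }
    }

    private final List<AtomNode> atoms;
    private final Random random = new Random();

    GroupSelector(List<AtomNode> atoms) {
        this.atoms = new ArrayList<>(atoms);
    }

    /** Weighted random choice; atoms whose weight for this operation is 0 are never picked. */
    AtomNode select(boolean forWrite) {
        int total = 0;
        for (AtomNode a : atoms) {
            total += forWrite ? a.writeWeight : a.readWeight;
        }
        if (total <= 0) {
            throw new IllegalStateException("no atom accepts " + (forWrite ? "writes" : "reads"));
        }
        int ticket = random.nextInt(total);
        for (AtomNode a : atoms) {
            int w = forWrite ? a.writeWeight : a.readWeight;
            if (ticket < w) {
                return a;
            }
            ticket -= w;
        }
        throw new AssertionError("unreachable: ticket is always below the total weight");
    }

    public static void main(String[] args) {
        GroupSelector group = new GroupSelector(Arrays.asList(
                new AtomNode("master", 0, 10),    // takes all writes, no reads
                new AtomNode("slave-1", 10, 0),   // read-only
                new AtomNode("slave-2", 10, 0))); // read-only
        System.out.println("write -> " + group.select(true).name);
        System.out.println("read  -> " + group.select(false).name);
    }
}
```

In this arrangement a Matrix-level router would first pick the group for a given shard, and the group would then pick one atom depending on whether the statement reads or writes.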

The persistence layer only cares about CRUD operations on a data source; access to multiple data sources should not be its concern. In other words, the data source interface that TDDL transparently provides to the persistence layer should be unified and "single". The persistence layer does not need to know how the data is split into databases and tables, nor write special SQL to cope with it.

This raises some questions about TDDL. Does TDDL need to parse and reassemble SQL a second time? The answer is: no parsing, only assembly. TDDL only needs to take the SQL issued by the persistence layer and expand it according to the database and table splitting conditions, so that the access is routed correctly. In addition to the splitting conditions, TDDL also needs the order by, group by, limit and join information, the aggregate functions used (sum, max, min and so on), and any distinct clause. SQL using these keywords has different semantics when run against a single database than when run across multiple databases, so TDDL must handle the results returned by such SQL properly. For TDDL row replication, the SQL has to be assembled again with the sync_version field added.

Because the SQL is not parsed and TDDL complies with the JDBC specification, it cannot extend the interfaces defined by JDBC. The splitting conditions can therefore only be passed by adding extra text to the SQL (hint mode) or through a ThreadLocal. The former makes the SQL too long; the latter is hard to maintain and hard to trace when debugging, and it must also be decided whether the setting expires after a single SQL statement executes or only when the connection is closed. TDDL supports both hint mode and ThreadLocal mode for passing this information.

Reference link: https://github.com/alibaba/tb_tddl
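The point that order by, limit and aggregate functions mean something different across multiple databases can be made concrete with a small merge step. The sketch below is an assumption-laden illustration, not TDDL's code: `ShardResultMerger` and its methods are hypothetical names, rows are reduced to a single `Long` column, and the merge simply concatenates and sorts rather than streaming.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical merge step (not TDDL's API) for results gathered from several shards. */
final class ShardResultMerger {

    /**
     * For "ORDER BY c LIMIT offset, limit", each shard must be asked for its
     * first offset + limit rows, because any of them may land in the final page.
     */
    static int perShardLimit(int offset, int limit) {
        return Math.addExact(offset, limit);
    }

    /** Combine the per-shard rows, sort ascending, then apply the original offset and limit. */
    static List<Long> mergeOrderByLimit(List<List<Long>> shardRows, int offset, int limit) {
        List<Long> all = new ArrayList<>();
        for (List<Long> rows : shardRows) {
            all.addAll(rows);
        }
        Collections.sort(all);
        int from = Math.min(offset, all.size());
        int to = Math.min(perShardLimit(offset, limit), all.size());
        return new ArrayList<>(all.subList(from, to));
    }

    /** SUM over all shards is the sum of the per-shard sums. */
    static long mergeSum(List<Long> perShardSums) {
        long total = 0;
        for (long s : perShardSums) {
            total += s;
        }
        return total;
    }

    /** MAX over all shards is the max of the per-shard maxima (the list must be non-empty). */
    static long mergeMax(List<Long> perShardMax) {
        return Collections.max(perShardMax);
    }

    /** MIN over all shards is the min of the per-shard minima (the list must be non-empty). */
    static long mergeMin(List<Long> perShardMin) {
        return Collections.min(perShardMin);
    }

    public static void main(String[] args) {
        List<List<Long>> shardRows = Arrays.asList(
                Arrays.asList(1L, 4L, 9L),   // rows returned by shard A, already sorted
                Arrays.asList(2L, 3L, 10L)); // rows returned by shard B, already sorted
        // Logical query: ORDER BY c LIMIT 2, 2
        System.out.println(mergeOrderByLimit(shardRows, 2, 2)); // prints [3, 4]
    }
}
```

Sending the original `LIMIT 2, 2` unchanged to each shard would return `[9]` and `[10]`, neither of which belongs on the requested page, which is why the statement has to be rewritten per shard and the results combined afterwards.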
