Analysis of the caching and delayed loading mechanisms in the Hibernate framework for Java
The difference between the Hibernate first-level cache and second-level cache:
A cache sits between the application and the physical data source. Its purpose is to reduce how often the application accesses the physical data source, and so to improve the application's run-time performance. The data in the cache is a copy of data in the physical data source. At run time the application reads and writes data in the cache, and the cache is synchronized with the physical data source at specific times or on specific events. The cache medium is usually memory, so reads and writes are fast, but when the amount of cached data is very large the hard disk can also serve as the cache medium. A cache implementation must consider not only the storage medium but also concurrent access to the cache and the life cycle of the cached data.
Hibernate's caches comprise the Session cache and the SessionFactory cache, and the SessionFactory cache is further divided into a built-in cache and an external cache. The Session cache is built in and cannot be removed; it is also called Hibernate's first-level cache. The built-in cache of the SessionFactory is implemented in a similar way to the Session cache: the former is the data held in certain collection attributes of the SessionFactory object, and the latter is the data held in certain collection attributes of the Session. The built-in cache of the SessionFactory stores the mapping metadata and the predefined SQL statements. The mapping metadata is a copy of the data in the mapping files, and the predefined SQL statements are derived from the mapping metadata while Hibernate initializes. The built-in cache of the SessionFactory is read-only: the application cannot modify the mapping metadata or the predefined SQL statements in it, so the SessionFactory never needs to synchronize the built-in cache with the mapping files. The external cache of the SessionFactory is a configurable plug-in.
By default, the SessionFactory does not enable this plug-in. The data in the external cache is a copy of database data, and its medium can be memory or hard disk. The external cache of the SessionFactory is also called Hibernate's second-level cache.
Both of Hibernate's cache levels sit in the persistence layer and store copies of database data. What, then, is the difference between them? To answer that, we need to understand two characteristics of a persistence-layer cache: its scope and its concurrent access strategy.
The scope of a persistence-layer cache determines its life cycle and who can access it. There are three kinds of scope.
1. Transaction scope: the cache can be accessed only by the current transaction. Its life cycle depends on the life cycle of the transaction, so when the transaction ends, the cache ends with it. In this scope the cache medium is memory. The transaction can be a database transaction or an application transaction, and each transaction has its own cache. The cached data usually takes the form of interrelated objects.
2. Process scope: the cache is shared by all transactions in the process. Because these transactions may access the cache concurrently, the cache needs a suitable transaction isolation mechanism. Its life cycle depends on the life cycle of the process, so when the process ends, the cache ends with it. A process-scope cache may hold a large amount of data, so its medium can be memory or hard disk. The cached data can take the form of interrelated objects or of loose object data. Loose object data resembles serialized object data, but decomposing an object into loose data is faster than serializing it.
3. Cluster scope: in a cluster environment, the cache is shared by the processes of one or more machines. The cached data is replicated to every process node in the cluster, and remote communication between processes keeps the copies consistent. The cached data usually takes the form of loose object data. Most applications should think carefully before using a cluster-scope cache, because accessing it is not necessarily much faster than accessing the database directly.
The persistence layer can provide caches of several scopes. If the data is not found in the transaction-scope cache, the process-scope or cluster-scope cache can be queried next, and only if it is still not found does the query go to the database. The transaction-scope cache is the first-level cache of the persistence layer and is usually mandatory; the process-scope or cluster-scope cache is the second-level cache and is usually optional.
Concurrent access strategy of the persistence-layer cache:
When several concurrent transactions access the same data in the persistence-layer cache at the same time, concurrency problems arise and suitable transaction isolation measures must be taken. These problems occur in process-scope or cluster-scope caches, that is, in the second-level cache. For this reason one of the following four concurrent access strategies can be set, each of which corresponds to a transaction isolation level.
Transactional: applicable only in managed environments. It provides the repeatable read isolation level. It can be used for data that is often read but rarely modified, because it prevents concurrency problems such as dirty reads and non-repeatable reads.
Read/write: provides the read committed isolation level. It is applicable only in non-clustered environments.
For data that is often read but rarely modified, this strategy can be used because it prevents concurrency problems such as dirty reads.
Nonstrict read/write: consistency between the cache and the database is not guaranteed. If two transactions may access the same cached data at the same time, a short expiration time must be configured for that data to avoid dirty reads as far as possible. This strategy suits data that is rarely modified and for which occasional dirty reads are acceptable.
Read-only: this strategy can be used for data that is never modified, such as reference data.
The transactional strategy provides the highest isolation level and the read-only strategy the lowest. The higher the isolation level, the lower the concurrency performance.
What kind of data is suitable for the second-level cache?
1. Data that is rarely modified.
2. Data that is not very important, for which occasional concurrency problems are acceptable.
3. Data that will not be accessed concurrently.
4. Reference data.
What kind of data is not suitable for the second-level cache?
1. Data that is frequently modified.
2. Financial data, for which concurrency problems are never acceptable.
3. Data shared with other applications.
Hibernate's second-level cache:
As mentioned earlier, Hibernate provides two cache levels, and the first is the Session cache. Since the life cycle of a Session object usually corresponds to one database transaction or one application transaction, its cache is a transaction-scope cache. The first-level cache is mandatory: it cannot be disabled or removed. In the first-level cache, each instance of a persistent class has a unique OID (object identifier). The second-level cache is a pluggable cache plug-in, which is managed by the SessionFactory.
Since the life cycle of the SessionFactory object corresponds to the entire run of the application process, the second-level cache is a process-scope or cluster-scope cache. It stores loose object data. Data in the second-level cache may be subject to concurrency problems, so an appropriate concurrent access strategy is needed, which provides a transaction isolation level for the cached data. A cache adapter is used to integrate a specific cache implementation with Hibernate. The second-level cache is optional and can be configured at the granularity of an individual class or collection.
The general process of Hibernate's second-level cache strategy is as follows:
1) When a conditional query is executed, Hibernate always issues an SQL statement of the form select * from table_name where … (selecting all fields), which queries the database and obtains all the data objects at once.
2) All the data objects obtained are put into the second-level cache, keyed by their IDs.
3) When Hibernate accesses a data object by ID, it first looks in the Session's first-level cache. If the object is not found there and a second-level cache is configured, it looks in the second-level cache. If the object is still not found, it queries the database and puts the result into the cache by ID.
4) When data is deleted, updated or added, the cache is updated at the same time.
Hibernate's second-level cache strategy is a cache strategy for queries by ID and has no effect on conditional queries. For conditional queries, Hibernate provides a query cache.
The process of Hibernate's query cache strategy is as follows:
1) Hibernate first forms a query key from the general information of the conditional query request: the SQL, the parameters required by the SQL, the record range (starting position rowStart and maximum number of records maxRows), and so on.
2) Hibernate looks up the corresponding result list in the query cache by the query key.
If it exists, the result list is returned. If it does not exist, Hibernate queries the database, obtains the result list, and puts the whole result list into the query cache under the query key.
3) The SQL in the query key involves certain table names. If any data in those tables is modified, deleted or added, the related query keys must be removed from the cache.
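The two processes above can be sketched in plain Java. This is a minimal toy model under stated assumptions, not Hibernate's implementation: the class names LookupSketch and QueryCacheSketch, the use of in-memory maps in place of a database, and the way the table names are passed in are all hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy ID lookup: session (first-level) cache, then second-level cache, then database. */
class LookupSketch {
    final Map<Integer, String> sessionCache = new HashMap<>(); // one per session
    final Map<Integer, String> secondLevelCache;               // shared by all sessions
    final Map<Integer, String> database;                       // stands in for the real table
    int databaseReads = 0;

    LookupSketch(Map<Integer, String> secondLevelCache, Map<Integer, String> database) {
        this.secondLevelCache = secondLevelCache;
        this.database = database;
    }

    String load(int id) {
        String row = sessionCache.get(id);
        if (row != null) {
            return row;                        // first-level hit
        }
        row = secondLevelCache.get(id);
        if (row == null) {                     // second-level miss: query the database
            databaseReads++;
            row = database.get(id);
            if (row != null) {
                secondLevelCache.put(id, row);
            }
        }
        if (row != null) {
            sessionCache.put(id, row);
        }
        return row;
    }
}

/** Toy query cache keyed by SQL, parameters and record range. */
class QueryCacheSketch {
    private final Map<List<Object>, List<String>> results = new HashMap<>();
    private final Map<List<Object>, Set<String>> tablesByKey = new HashMap<>();

    /** Builds the query key; lists compare element by element, so equal requests share a key. */
    static List<Object> key(String sql, List<Object> params, int rowStart, int maxRows) {
        return Arrays.asList(sql, params, rowStart, maxRows);
    }

    void put(List<Object> key, Set<String> tables, List<String> resultList) {
        results.put(key, resultList);
        tablesByKey.put(key, new HashSet<>(tables));
    }

    /** Returns the cached result list, or null when the database must be queried. */
    List<String> get(List<Object> key) {
        return results.get(key);
    }

    /** Called after any insert, update or delete on the given table. */
    void invalidate(String table) {
        Iterator<Map.Entry<List<Object>, Set<String>>> it = tablesByKey.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<List<Object>, Set<String>> entry = it.next();
            if (entry.getValue().contains(table)) {
                results.remove(entry.getKey());
                it.remove();
            }
        }
    }
}
```

In this model a second LookupSketch that shares the same secondLevelCache map plays the role of a second session: it finds the row without reading the database. Invalidating by table name is deliberately coarse, which matches the description above: any change to a table discards every cached query that mentions it.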
Hibernate deferred loading mechanism:
The delayed loading mechanism exists to avoid unnecessary performance overhead. Delayed loading means that data is actually loaded only when it is actually needed. Hibernate provides delayed loading of entity objects and of collections, and Hibernate 3 also provides delayed loading of attributes. The following sections describe each of these kinds of delayed loading in detail.
A. Delayed loading of entity objects:
To use delayed loading on an entity, you must enable it in the entity's mapping configuration file by setting the lazy attribute of the <class> element to true. Suppose we then run code that performs two steps: (1) it obtains a User entity through session.load(), and (2) it calls user.getName() on the returned object.
When execution reaches (1), Hibernate does not issue a query for the data. If we inspect a memory snapshot of the user object with a debugging tool (such as the JBuilder 2005 debugger), we may be surprised to find that the returned object is of a type such as User$EnhancerByCGLIB$$bede8986 and that its attributes are all null. What is going on? Recall that the session.load() method returns a proxy object for the entity, and the type we see here is the proxy class of the User object. Hibernate uses cglib to construct a proxy object for the target object dynamically. The proxy object contains all the properties and methods of the target object, and all of its properties are set to null.
The memory snapshot shown by the debugger reveals that the real User object is held in the CGLIB$CALLBACK_0.target attribute of the proxy object. When execution reaches (2), the user.getName() method is called, and through cglib's callback mechanism the call actually goes to CGLIB$CALLBACK_0.getName(). When this method is called, Hibernate first checks whether CGLIB$CALLBACK_0.target is null. If it is not null, Hibernate calls the getName method of the target object. If it is null, Hibernate issues a database query with an SQL statement like select * from user where id='1'; to fetch the data, constructs the target object, and assigns it to CGLIB$CALLBACK_0.target.
In this way, Hibernate implements delayed loading of entities through an intermediate proxy object: the database query is issued only when the user actually asks for an attribute of the entity object. Because delayed loading of entities depends on this intermediate proxy class, only the session.load() method loads entities lazily, since only session.load() returns a proxy object for the entity class.
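The proxy mechanism described above can be sketched with the JDK's own java.lang.reflect.Proxy instead of cglib. This is a minimal illustration under stated assumptions, not Hibernate's implementation: JDK proxies work only on interfaces, so the sketch introduces a hypothetical User interface, and the names UserImpl, LazyHandler and proxyFor are also hypothetical. A Supplier stands in for the SQL query.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.function.Supplier;

/** Hypothetical entity interface; JDK proxies can only implement interfaces. */
interface User {
    String getName();
}

/** Plain implementation that plays the role of the real, loaded entity. */
class UserImpl implements User {
    private final String name;

    UserImpl(String name) {
        this.name = name;
    }

    public String getName() {
        return name;
    }
}

/** Holds the real object in 'target' and loads it on the first method call. */
class LazyHandler implements InvocationHandler {
    private final Supplier<User> loader; // stands in for the SELECT statement
    private User target;                 // null until the entity is really needed
    int loads = 0;

    LazyHandler(Supplier<User> loader) {
        this.loader = loader;
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        if (target == null) {            // first access: fetch and remember the real object
            loads++;
            target = loader.get();
        }
        return method.invoke(target, args);
    }

    /** Plays the role of session.load(): returns a proxy without calling the loader. */
    static User proxyFor(LazyHandler handler) {
        return (User) Proxy.newProxyInstance(
                User.class.getClassLoader(), new Class<?>[] {User.class}, handler);
    }
}
```

Creating the proxy corresponds to step (1) and leaves loads at zero; the first getName() call corresponds to step (2) and triggers the single load. Later calls reuse the stored target.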
B. Deferred loading of collection types:
Within Hibernate's delayed loading mechanism, its application to collection types matters most, because that is where it can improve performance the most. Hibernate goes to considerable lengths here, including providing its own implementations of the JDK collections. In a one-to-many association, the collection that actually holds the associated objects at run time is not one of the JDK's implementations of java.util.Set but Hibernate's own net.sf.hibernate.collection.Set. These custom collection classes are what allow Hibernate to load collections lazily. To use delayed loading for a collection, it must be enabled in the association section of the entity's mapping.
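Delayed loading of the collection is turned on by setting the lazy attribute of the <set> element to true. Suppose we then run code with two steps: (1) it obtains the collection of associated objects from the entity, and (2) it first accesses the contents of that collection, for example by iterating over it.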
When execution reaches (1), no query is issued to load the associated data. Only when it reaches (2) does the real data read begin. At that point Hibernate looks up the matching entity objects in the cache by means of the matching data index.
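The idea of a custom collection class that defers its own loading can be sketched in a few lines. This is a toy model and not Hibernate's net.sf.hibernate.collection.Set: the class name LazySet and the Supplier that stands in for the SQL query are assumptions made for illustration.

```java
import java.util.AbstractSet;
import java.util.Iterator;
import java.util.Set;
import java.util.function.Supplier;

/** A Set wrapper that fetches its elements only when they are first needed. */
class LazySet<E> extends AbstractSet<E> {
    private final Supplier<Set<E>> loader; // stands in for the SELECT on the associated table
    private Set<E> elements;               // null until the first access
    int loads = 0;

    LazySet(Supplier<Set<E>> loader) {
        this.loader = loader;
    }

    /** Loads the real elements on first use and reuses them afterwards. */
    private Set<E> elements() {
        if (elements == null) {
            loads++;
            elements = loader.get();
        }
        return elements;
    }

    @Override
    public Iterator<E> iterator() {
        return elements().iterator();
    }

    @Override
    public int size() {
        return elements().size();
    }
}
```

Handing out a LazySet corresponds to step (1) and costs nothing; calling iterator() or size() corresponds to step (2) and triggers the one load.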
This introduces a new concept: the data index. When Hibernate caches a collection, it caches it in two parts: first the list of IDs of all the entities in the collection, and then the entity objects themselves. The ID list is the so-called data index.
When Hibernate looks up the data index and does not find it, it executes a select statement to fetch the matching data, constructs the set of entity objects and the data index, returns the set of entity objects, and puts both the entity objects and the data index into the Hibernate cache. If it does find the data index, it takes the ID list from the index and looks up each entity in the cache by ID. An entity that is found is returned from the cache; for one that is not found, a select query is issued.
This reveals another issue that can affect performance: the cache strategy of the collection. Consider what happens when a cache strategy is configured on the collection mapping.
Here the collection mapping carries a <cache usage="read-only"/> element. When the cache strategy is configured on the collection in this way, Hibernate caches only the data index, not the entity objects in the collection. With this configuration, suppose we load the same entity and read its collection twice, and watch the SQL statements that Hibernate issues.
We can see that when the query is executed for the second time, two queries against the address table are executed. Why? After the entity is loaded for the first time, only the data index of the collection is cached, as the collection's cache strategy dictates; the entity objects in the collection are not cached. When the entity is loaded for the second time, Hibernate finds the data index, but it cannot find the corresponding entities in the cache, so it issues two select queries based on the IDs in the data index. This wastes performance. How can we avoid it? We must also specify a cache strategy for the entities held in the collection, by adding a cache configuration to the mapping of that entity class as well.
With this configuration, Hibernate also caches the entities in the collection. If the same code is run again, no SQL statements are issued for the IDs in the data index, because the entity objects held in the collection can be obtained directly from the cache.
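The effect of caching the data index with and without the entities can be reproduced in a small model. This is a sketch under stated assumptions, not Hibernate's cache: the class name CollectionCacheSketch is hypothetical, a map stands in for the address table, every row in it is assumed to belong to the one owning entity, and the selects counter stands in for the SQL log.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy collection cache: a data index (ID list) plus an optional entity cache. */
class CollectionCacheSketch {
    private final Map<Integer, String> addressTable; // stands in for the database table
    private final boolean cacheEntities;             // is the entity class cached as well?
    private List<Integer> dataIndex;                 // cached ID list, null until first load
    private final Map<Integer, String> entityCache = new HashMap<>();
    int selects = 0;                                 // number of simulated SELECT statements

    CollectionCacheSketch(Map<Integer, String> addressTable, boolean cacheEntities) {
        this.addressTable = addressTable;
        this.cacheEntities = cacheEntities;
    }

    List<String> loadAddresses() {
        List<String> result = new ArrayList<>();
        if (dataIndex == null) {
            selects++;                               // one SELECT fetches the whole collection
            dataIndex = new ArrayList<>(addressTable.keySet());
            for (Integer id : dataIndex) {
                String address = addressTable.get(id);
                if (cacheEntities) {
                    entityCache.put(id, address);
                }
                result.add(address);
            }
            return result;
        }
        for (Integer id : dataIndex) {               // index hit: resolve each ID separately
            String address = entityCache.get(id);
            if (address == null) {
                selects++;                           // one SELECT per entity missing from the cache
                address = addressTable.get(id);
            }
            result.add(address);
        }
        return result;
    }
}
```

With two addresses and cacheEntities set to false, the second call to loadAddresses() adds two selects, one per ID in the index. With cacheEntities set to true, the second call adds none.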
C. Delayed loading of attributes:
Hibernate 3 introduces a new feature, delayed loading of attributes, which is a powerful tool for writing high-performance queries. In the earlier discussion of reading large data objects, the User object had a resume field of type java.sql.Clob that holds the user's resume. Every time the object is loaded this field has to be loaded too, whether or not it is needed, and reading such a large data object carries a considerable performance cost. In Hibernate 2 the only remedy was to split the User class using the performance-oriented granularity subdivision discussed earlier (please refer to that section). In Hibernate 3, the delayed loading mechanism for attributes lets us read the field data only when we actually need to work with the field. To do so, the attribute must be marked for delayed loading in the entity's mapping.
Setting the lazy attribute of the <property> element to true enables delayed loading of that attribute. To implement delayed loading of attributes, Hibernate 3 uses a class enhancer to enhance the class file of the entity class, which adds cglib's callback logic to the entity class. So delayed loading of attributes is also implemented through cglib. cglib is an open source code generation library that can manipulate the bytecode of Java classes and dynamically construct the required class objects from that bytecode.
With this configuration, suppose we run code with two steps: (1) it loads the User entity, and (2) it reads the resume attribute of the loaded user. When execution reaches (1), Hibernate issues an SQL statement that retrieves the field data for all attributes of the User entity that are not lazily loaded. When execution reaches (2), a further SQL statement is generated, and only then is the resume field data actually read.
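The behaviour of a lazily loaded attribute can be sketched without bytecode enhancement by writing the deferred read by hand. This is a minimal illustration, not what Hibernate's enhancer generates: the class name LazyResumeUser is hypothetical, the resume is a plain String rather than a java.sql.Clob, and a Supplier stands in for the second SQL statement.

```java
import java.util.function.Supplier;

/** Toy entity whose large 'resume' field is fetched separately from the other fields. */
class LazyResumeUser {
    private final String name;                   // loaded together with the entity
    private final Supplier<String> resumeLoader; // stands in for the second SELECT
    private String resume;
    private boolean resumeLoaded = false;
    int resumeReads = 0;

    LazyResumeUser(String name, Supplier<String> resumeLoader) {
        this.name = name;
        this.resumeLoader = resumeLoader;
    }

    String getName() {
        return name;                             // never touches the resume data
    }

    String getResume() {
        if (!resumeLoaded) {                     // first access: read the large field now
            resumeReads++;
            resume = resumeLoader.get();
            resumeLoaded = true;
        }
        return resume;
    }
}
```

Constructing the object corresponds to step (1): the ordinary fields are available and the resume has not been read. The first getResume() call corresponds to step (2) and performs the one deferred read.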