A deep dive into the refresh and expire mechanisms of Guava Cache
1、 Thinking and guessing
First, let's look at the three time-based ways to evict or refresh cached data:
expireAfterAccess: a cache entry is evicted if it has not been read or written within the specified duration.
expireAfterWrite: a cache entry is evicted if it has not been written (created or updated) within the specified duration.
refreshAfterWrite: a cache entry becomes eligible for refresh once the specified duration has passed since it was last written.
For the sake of timeliness, we could use expireAfterWrite so that an entry is invalidated a fixed time after each update and is then loaded again. Guava Cache strictly allows only one load operation per key at a time, which effectively prevents the avalanche effect of a large number of requests penetrating to the back end at the moment the cache expires.
However, the source code shows that Guava Cache only guarantees a single load operation by locking: the other requests must block and wait for the load to complete. Moreover, after the load completes, the other requesting threads acquire the lock one by one to check whether loading has finished. Each thread in turn goes through "acquire the lock, get the value, release the lock", which costs some performance. Because we plan to cache values locally for only 1 second, such frequent expiration, loading and lock waiting would cause a significant performance loss.
Therefore, we consider using refreshAfterWrite. Its characteristic is that during a refresh only one reload operation is allowed, while other queries return the old value in the meantime, which effectively reduces waiting and lock contention. For this reason refreshAfterWrite performs better than expireAfterWrite. It also has a drawback: it cannot strictly guarantee that every query gets a new value once the specified time has passed.
As readers familiar with timed expiration (or refresh) in Guava Cache will know, Guava Cache does not use extra threads for scheduled cleanup and loading; it relies on query requests. On each query it compares the current time with the time of the last update, and loads or refreshes if the specified duration has been exceeded. So with refreshAfterWrite, when throughput is very low, for example when no query arrives for a long time, the next query may get an old value (possibly one from a long time ago), which can cause problems.
It can be seen that refreshAfterWrite and expireAfterWrite each have their own advantages, disadvantages and use cases. Can we find a compromise between them? For example, have the cache refresh every 1s, and if there is no access for more than 2s, let the entry expire, so that the next access does not get the old value but must load a new one. Because the official Guava documentation does not explain this in detail and the online material I consulted gave no answer, the only option was to analyze the source code. The analysis shows that when both are used together, the expected effect is achieved. This is really good news!
2、 Source code analysis
Tracing the source code of the get method of LoadingCache shows that the following core methods are eventually called; the source code is posted below:
com.google.common.cache.LocalCache.Segment.get method:
In this get method of the cache, the step marked 1 checks whether a live value exists, that is, whether the entry has expired according to expireAfterAccess and expireAfterWrite. If it has expired, value is null and step 3 is executed. Step 2 applies when the entry has not expired and decides, according to refreshAfterWrite, whether a refresh is needed. Step 3 performs a load (load rather than reload) because there is no live value, either because the entry has expired or because it never existed. The code shows that get checks expiration first and refresh second. Therefore, we can set refreshAfterWrite to 1s and expireAfterWrite to 2s: when access is frequent, a refresh happens every second, and when there has been no access for more than 2s, the next access must load a new value.
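The order of checks described above can be modeled in a few lines. The following is a minimal sketch in plain Java, not Guava's code: the class GetDecisionSketch, the Action enum and the decide helper are hypothetical names introduced for illustration, and the exact boundary comparisons (>= for expiration, > for refresh) are an assumption of this sketch.

```java
import java.util.concurrent.TimeUnit;

// Simplified model of the decision order in a cache get; not Guava's actual code.
class GetDecisionSketch {
    enum Action { LOAD, REFRESH, RETURN_CACHED }

    /**
     * Decides what a get should do for an entry.
     * present: whether an entry exists; ageNanos: time since its last write.
     */
    static Action decide(boolean present, long ageNanos, long refreshNanos, long expireNanos) {
        // Expiration is checked first: a missing or expired entry needs a full load.
        if (!present || ageNanos >= expireNanos) {
            return Action.LOAD;
        }
        // Only a live entry can be refreshed (reload) once it is old enough.
        if (ageNanos > refreshNanos) {
            return Action.REFRESH;
        }
        return Action.RETURN_CACHED;
    }

    public static void main(String[] args) {
        long refresh = TimeUnit.SECONDS.toNanos(1); // models refreshAfterWrite = 1s
        long expire = TimeUnit.SECONDS.toNanos(2);  // models expireAfterWrite = 2s
        long[] agesMillis = {500, 1500, 2500};
        for (long ms : agesMillis) {
            Action a = decide(true, TimeUnit.MILLISECONDS.toNanos(ms), refresh, expire);
            System.out.println(ms + " ms since last write -> " + a);
        }
    }
}
```

With these settings, an entry written 500 ms ago is returned as is, one written 1500 ms ago is refreshed, and one written 2500 ms ago is loaded from scratch.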
Let's continue to follow through and see what load and refresh have done respectively to verify the above theory.
Let's take a look at the com.google.common.cache.LocalCache.Segment.lockedGetOrLoad method:
This method is fairly long, so for reasons of space the full code is not posted here. There are 7 key steps:
1. Obtain the lock;
2. Obtain the ValueReference corresponding to the key;
3. Check whether the cache value is currently being loaded. If it is, no load operation is performed (createNewEntry is set to false) and the new value is obtained later by waiting for that load;
4. If it is not being loaded, check whether a new value already exists (loaded by another request). If so, return the new value;
5. Prepare to load by setting the value reference to a LoadingValueReference, which lets other requests find in step 3 that a load is in progress;
6. Release the lock;
7. If you really need to load, perform the load operation.
The analysis shows that only one load operation takes place while the other gets block until it finishes, which confirms the earlier theory.
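The effect of these steps (one thread loads, the others wait on a placeholder) can be illustrated with a small plain-Java model. This is a sketch only: it uses ConcurrentHashMap and CompletableFuture rather than Guava's segment lock and LoadingValueReference, and SingleLoadSketch and loadsForConcurrentGets are hypothetical names.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Simplified model of "only one load per key"; not Guava's implementation.
class SingleLoadSketch<K, V> {
    // The future plays the role of a "loading" placeholder for the key.
    private final ConcurrentMap<K, CompletableFuture<V>> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader;

    SingleLoadSketch(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        CompletableFuture<V> mine = new CompletableFuture<>();
        CompletableFuture<V> existing = entries.putIfAbsent(key, mine);
        if (existing != null) {
            // Another thread is loading (or has loaded): block until its value is ready.
            return existing.join();
        }
        try {
            V value = loader.apply(key); // this thread won the race and performs the load
            mine.complete(value);
            return value;
        } catch (RuntimeException e) {
            entries.remove(key, mine);   // let a later get retry
            mine.completeExceptionally(e);
            throw e;
        }
    }

    // Starts `threads` concurrent gets for one key and returns how many loads ran.
    static int loadsForConcurrentGets(int threads) {
        AtomicInteger loads = new AtomicInteger();
        SingleLoadSketch<String, String> cache = new SingleLoadSketch<>(key -> {
            loads.incrementAndGet();
            try {
                Thread.sleep(100); // simulate a slow back end
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "value-of-" + key;
        });
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch done = new CountDownLatch(threads);
        for (int i = 0; i < threads; i++) {
            pool.execute(() -> {
                cache.get("k");
                done.countDown();
            });
        }
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdown();
        }
        return loads.get();
    }

    public static void main(String[] args) {
        System.out.println("loads = " + loadsForConcurrentGets(5));
    }
}
```

Because putIfAbsent is atomic, exactly one of the five threads installs the placeholder and calls the loader; the other four block in join() until the value is ready, which is the waiting cost discussed above.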
Let's take a look at the com.google.common.cache.LocalCache.Segment.scheduleRefresh method:
1. Check whether a refresh is required and the value is not currently being loaded. If so, refresh and return the new value if it is available.
2. I added step 2 in preparation for the test below. If a refresh is needed but another thread is already refreshing the value, print a message and eventually return the old value.
Now look more closely at the refresh method called in step 1:
1. Insert a LoadingValueReference to mark the value as being loaded. Other requests use this to decide whether to refresh or to return the old value. insertLoadingValueReference takes a lock to ensure that only one refresh penetrates to the back end; for reasons of space it is not expanded here. However, the scope of locking here is smaller than for a load. In the expire -> load process, once the gets find that the entry has expired, they all need to obtain the lock and wait until the new value is available, so blocking extends from expiration until the new value has been loaded. In the refresh -> reload process, once a get finds that a refresh is needed, it first checks whether a load is in progress, then obtains the lock, and releases the lock before reloading. Blocking covers only the creation and setting of one small object in insertLoadingValueReference, which is almost negligible. This is one of the reasons why refresh is more efficient than expire.
2. Perform the refresh operation. loadAsync is not expanded here; it calls the reload method of CacheLoader. The reload method can be overridden to load asynchronously, in which case the current thread returns the old value, which improves performance. By default, reload calls the load method of CacheLoader synchronously.
Now we know the difference between refresh and expire: a refresh executes reload, whereas after expiration load is executed again, just as during initialization.
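The refresh path can be modeled in the same spirit. The sketch below is plain Java under stated assumptions, not Guava's implementation: RefreshSketch, getDueForRefresh, current and demo are hypothetical names, an AtomicBoolean stands in for the "is loading" check, and the Executor parameter stands in for the choice between a synchronous and an asynchronous reload.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.UnaryOperator;

// Simplified model of refresh: one reload at a time, readers keep the old value.
class RefreshSketch {
    private volatile String value;
    // Stands in for the "is a load already in progress" check.
    private final AtomicBoolean reloading = new AtomicBoolean(false);
    private final UnaryOperator<String> reloader;
    private final Executor executor;

    RefreshSketch(String initial, UnaryOperator<String> reloader, Executor executor) {
        this.value = initial;
        this.reloader = reloader;
        this.executor = executor;
    }

    // Called by a reader that has found the entry due for refresh.
    String getDueForRefresh() {
        String old = value;
        // Only the thread that flips the flag starts a reload; nobody blocks on it.
        if (reloading.compareAndSet(false, true)) {
            executor.execute(() -> {
                try {
                    value = reloader.apply(old);
                } finally {
                    reloading.set(false);
                }
            });
        }
        // Asynchronous executor: the reload is usually still running, so this is the
        // old value. Synchronous executor (Runnable::run): this is already the new value.
        return value;
    }

    String current() {
        return value;
    }

    // Deterministic walk-through using a manual executor that only queues tasks.
    static String demo() {
        List<Runnable> pending = new ArrayList<>();
        RefreshSketch cache = new RefreshSketch("v1", old -> "v2", pending::add);
        String first = cache.getDueForRefresh();  // starts the reload, which stays queued
        String second = cache.getDueForRefresh(); // reload in flight: no second reload
        int scheduled = pending.size();
        pending.get(0).run();                     // the reload completes
        return first + "," + second + "," + scheduled + "," + cache.current();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints v1,v1,1,v2
    }
}
```

Both reads made while the reload is pending return v1 without waiting, only one reload is scheduled, and v2 becomes visible once it completes. Passing Runnable::run as the executor instead models a synchronous reload, where the thread that triggers the refresh gets the new value.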
3、 Testing and verification
In the source code posted above, you may have noticed some System.out.println statements, which I added to make the following test and verification easier. Now let's verify the analysis above with a program.
Source code of the test:
Execution results:
The verification results are consistent with expectations:
1. Before the cache is initialized, client-1 is the first to obtain the load lock and performs the load operation. While the load is in progress, the other clients also arrive, enter the load process, block until client-1 releases the lock, and then obtain the lock in turn. In the end, only client-1 performs the load.
2. When an access arrives after the time set by refreshAfterWrite has elapsed, a refresh is required, and client-5 performs it. Meanwhile the other clients do not obtain the lock but directly read the old value, and only get the new value after the refresh completes. The transition is smooth.
3. When there has been no access for longer than the time set by expireAfterWrite and the main thread then accesses the cache, the value has expired and must be loaded; the old value is not returned.
Reproduced at: https://blog.csdn.net/abc86319253/article/details/53020432