Rate Limiting in High-Concurrency Systems: Explanation and Implementation
When building high-concurrency systems, there are three sharp tools for protecting the system: caching, degradation, and rate limiting. Drawing on the author's experience, this article introduces the concepts, algorithms, and common implementations of rate limiting.
Caching
Caching is easy to understand. In a large high-concurrency system, without a cache the database would be overwhelmed within minutes and the system would grind to a halt. Caching not only improves access speed and concurrency; it is also an effective way to protect the database and the system itself. Large websites are mostly read-heavy, where the use of caching comes to mind naturally. But caching also plays an important role in large write-heavy systems: batching accumulated writes, in-memory cache queues (producer/consumer), and HBase's write mechanism all use caching to raise throughput or protect the system. Even message-oriented middleware can be viewed as a form of distributed data cache.
Degradation
Service degradation means strategically downgrading some services and pages, based on current business conditions and traffic, when server pressure spikes, in order to free server resources and keep core tasks running normally. Degradation is usually tiered: different exception levels trigger different handling. By service mode, you can refuse service, delay service, or sometimes serve only a random subset of requests. By service scope, you can disable a particular function or whole modules. In short, degradation requires different strategies for different business needs. The guiding idea is that a degraded service is still better than no service at all.
Rate Limiting
Rate limiting can be viewed as a kind of service degradation. It limits a system's input and output traffic in order to protect the system. Generally, a system's throughput can be measured; to keep the system running stably, once traffic reaches the defined threshold, we must restrict it and take measures such as delaying requests, rejecting them, or rejecting a portion of them.
Rate-Limiting Algorithms
Common rate-limiting algorithms include the counter, the leaky bucket, and the token bucket.
Counter
The counter is the simplest algorithm. Suppose a service can handle at most 100 requests per second. We can set up a 1-second sliding window divided into 10 grids, sliding one grid every 100 milliseconds. On each slide we record the current cumulative request count, keeping the 10 most recent counts in memory; a LinkedList works well for this. Each time the window slides, we check whether the difference between the current count and the oldest count in the LinkedList exceeds 100; if it does, we throttle.
Clearly, the more grids the sliding window is divided into, the more smoothly it rolls and the more accurate the rate-limiting statistics become.
The example code is as follows:
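Below is a minimal Java sketch of this sliding-window counter, not an authoritative implementation; the class name, the scheduler wiring, and the 100-requests-per-second threshold are assumptions based on the description above.

import java.util.LinkedList;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// A 1-second window split into 10 grids of 100 ms each.
public class SlidingWindowCounter {
    private static final int LIMIT = 100; // max requests per 1-second window
    private static final int GRIDS = 10;  // grids per window
    private final LinkedList<Long> snapshots = new LinkedList<>(); // cumulative count per grid
    private final AtomicLong counter = new AtomicLong();           // total requests seen
    private volatile boolean throttled = false;

    // Usage: new SlidingWindowCounter(Executors.newSingleThreadScheduledExecutor());
    public SlidingWindowCounter(ScheduledExecutorService scheduler) {
        for (int i = 0; i < GRIDS; i++) snapshots.add(0L); // seed an empty window
        // Slide the window one grid every 100 ms.
        scheduler.scheduleAtFixedRate(this::slide, 100, 100, TimeUnit.MILLISECONDS);
    }

    // Called on every incoming request; returns false while throttled.
    public boolean tryAcquire() {
        if (throttled) return false;
        counter.incrementAndGet();
        return true;
    }

    private synchronized void slide() {
        long now = counter.get();
        long oldest = snapshots.removeFirst(); // cumulative count one full window ago
        snapshots.addLast(now);
        throttled = (now - oldest) > LIMIT;    // requests seen in the last second
    }
}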
Leaky bucket algorithm
The leaky bucket is a very common rate-limiting algorithm that can be used for traffic shaping and traffic policing; the leaky bucket diagram on Wikipedia is a helpful illustration.
The main concepts of the leaky bucket algorithm are as follows:
A leaky bucket with fixed capacity leaks water drops at a constant, fixed rate;
If the bucket is empty, no drops flow out;
Water can flow into the bucket at any rate;
If the inflow exceeds the bucket's capacity, the excess drops overflow (are discarded), while the bucket's capacity stays unchanged.
The leaky bucket is easy to implement: in a single-machine system a queue will do (TPL Dataflow in .NET handles this kind of problem well; introductions to it are easy to find). In a distributed environment, message middleware or Redis are candidate options.
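As a single-machine sketch of this idea in Java, a bounded queue can play the role of the bucket while a worker thread drains it at a constant rate; the capacity of 100 and the leak rate of 10 tasks per second are illustrative assumptions.

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Leaky bucket: inflow at any rate, outflow at a fixed rate.
public class LeakyBucket {
    private final BlockingQueue<Runnable> bucket = new ArrayBlockingQueue<>(100); // fixed capacity

    // Inflow: returns false (overflow, request discarded) when the bucket is full.
    public boolean tryAdd(Runnable task) {
        return bucket.offer(task);
    }

    // Outflow: leak one task every 100 ms, i.e. a fixed rate of 10 per second.
    public void startLeaking() {
        Thread worker = new Thread(() -> {
            while (true) {
                try {
                    bucket.take().run(); // blocks while the bucket is empty
                    Thread.sleep(100);   // constant leak rate
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        });
        worker.setDaemon(true);
        worker.start();
    }
}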
Token Bucket
The token bucket algorithm uses a fixed-capacity bucket that stores tokens, with tokens added to the bucket at a fixed rate. It can be described by the following concepts:
Tokens are put into the bucket at a fixed rate, for example 10 per second;
The bucket holds at most b tokens; when the bucket is full, newly added tokens are discarded or rejected;
When an n-byte packet arrives, n tokens are removed from the bucket and the packet is sent to the network;
If fewer than n tokens are available, no tokens are removed and the packet is throttled (dropped, or left waiting in a buffer).
The token bucket controls the output rate by the rate at which tokens are added. The "to network" step in the classic token bucket diagram can be understood as a handler that executes some business logic or makes an RPC call.
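A minimal Java sketch of a token bucket follows; it refills lazily based on elapsed time rather than with a background thread, and the class and its parameters are illustrative, not any library's API.

// Token bucket: capacity b, refilled at a fixed rate r.
public class TokenBucket {
    private final long capacity;        // b: maximum tokens held
    private final double refillPerNano; // r: tokens added per nanosecond
    private double tokens;              // current token count
    private long lastRefill;            // timestamp of the last refill

    public TokenBucket(long capacity, double tokensPerSecond) {
        this.capacity = capacity;
        this.refillPerNano = tokensPerSecond / 1_000_000_000.0;
        this.tokens = capacity;
        this.lastRefill = System.nanoTime();
    }

    // Try to take n tokens for an n-byte packet (or n units of work).
    public synchronized boolean tryAcquire(long n) {
        long now = System.nanoTime();
        tokens = Math.min(capacity, tokens + (now - lastRefill) * refillPerNano);
        lastRefill = now;
        if (tokens >= n) {
            tokens -= n;
            return true;  // enough tokens: forward the packet to the network
        }
        return false;     // too few tokens: drop or buffer the packet
    }
}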
Comparison between leaky bucket and token bucket
The token bucket can adjust the data-processing rate at runtime and absorb bursts of traffic: raising the rate at which tokens are issued speeds up overall processing, while lowering it (or requiring more tokens per request) slows processing down. The leaky bucket cannot do this, because its outflow rate is fixed, so the processing rate is fixed as well.
Overall, the token bucket is the more flexible algorithm, but its implementation is more complex.
Implementing Rate Limiting
Guava
Guava is an open-source Google project containing core libraries widely used across Google's Java projects. Its RateLimiter class implements the token bucket algorithm in two flavors: SmoothBursty and SmoothWarmingUp.
1. Normal rate:
Create a rate limiter and set the number of permits issued per second to 2. The returned RateLimiter object guarantees that no more than 2 permits are issued within any 1-second span, released at a fixed rate, producing a smooth output.
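A minimal sketch, assuming the Guava dependency is on the classpath (the loop bound is arbitrary):

import com.google.common.util.concurrent.RateLimiter;

public class SmoothDemo {
    public static void main(String[] args) {
        // 2 permits per second, issued at a fixed rate (SmoothBursty).
        RateLimiter r = RateLimiter.create(2);
        for (int i = 0; i < 10; i++) {
            // acquire() blocks until a permit is available and
            // returns the time spent waiting, in seconds.
            System.out.println(r.acquire());
        }
    }
}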
Running this code prints roughly one value every 0.5 seconds; each piece of data is processed only after a permit is obtained, which smooths the data output or interface calls. The return value of acquire() is the time spent waiting for the permit. If you need to handle burst traffic, you can set a threshold on this return value and act according to the situation, for example treating long waits as expired and discarding the request.
2. Burst traffic:
A burst can mean needing more permits at once, or traffic arriving after a quiet period. Let's start with an example that requests several permits at once. Using the same limiter as above, issuing 2 permits per second, the following code calls the acquire method with an explicit permit count.
System.out.println(r.acquire(2));
System.out.println(r.acquire(1));
System.out.println(r.acquire(1));
System.out.println(r.acquire(1));
The output looks something like this: the first call returns immediately, the next waits about 1 second, and the following calls wait about 0.5 seconds each.
To process more data at a time, you need more permits. The code first takes two permits at once, so the next permit becomes available not after 0.5 seconds but after about 1 second, after which the limiter returns to its normal pace. That was a burst of "more". For a burst after a period of no traffic, consider the following code:
System.out.println(r.acquire(1));
Thread.sleep(2000);
System.out.println(r.acquire(1));
System.out.println(r.acquire(1));
System.out.println(r.acquire(1));
The result is similar to this: after the 2-second sleep, the next three calls return almost immediately.
During the two-second wait, three permits accumulate in the token bucket, so they can be acquired back-to-back without waiting. In essence, handling bursts still means the output is constant per unit of time. Both behaviors above come from the RateLimiter subclass SmoothBursty. The other subclass, SmoothWarmingUp, provides an output scheme with a warm-up (buffer) period.
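A sketch with a 3-second warm-up period, again assuming the Guava dependency (the loop bound is arbitrary):

import java.util.concurrent.TimeUnit;

import com.google.common.util.concurrent.RateLimiter;

public class WarmupDemo {
    public static void main(String[] args) {
        // 2 permits per second, approached gradually over a 3-second warm-up.
        RateLimiter r = RateLimiter.create(2, 3, TimeUnit.SECONDS);
        for (int i = 0; i < 10; i++) {
            System.out.println(r.acquire());
        }
    }
}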
Because the warm-up period is set to 3 seconds, the token bucket does not start by issuing one permit every 0.5 seconds. Instead the wait starts longer and declines along a smooth slope, so the frequency rises until it reaches the configured rate after roughly 3 seconds (the first few waits add up to about 3 seconds), after which output continues at the fixed rate. This suits systems that need some time to warm up right after startup.
Nginx
For rate limiting at the Nginx access layer, Nginx provides two modules: the connection-limiting module ngx_http_limit_conn_module and the leaky-bucket request-limiting module ngx_http_limit_req_module.
1. ngx_http_limit_conn_module
We often run into situations such as abnormal server traffic or excessive load. High-volume malicious access wastes bandwidth, pressures the server, and hurts the business, so limiting the number of connections and the concurrency from the same IP is a common countermeasure. The ngx_http_limit_conn_module module implements this requirement: it limits the number of connections per defined key, such as connections from a single source IP. Not every connection is counted by the module; only connections with a request currently being processed (whose request headers have been fully read) are counted.
We can add the following configuration inside the http {} block of nginx.conf to implement the limit:
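A typical form, keyed on the client IP (the zone name one and the 10m size are assumptions), is:

limit_conn_zone $binary_remote_addr zone=one:10m;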
Then add the following to the server {} block:
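For example, assuming a limit of one concurrent connection per client IP (the exact value used in the test is an assumption):

limit_conn one 1;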
Then we use the ab benchmarking tool to simulate concurrent requests: ab -n 5 -c 5 http://10.23.22.239/index.html
The results show clearly that concurrency is limited: requests over the threshold receive a 503 response.
In addition, a concurrency limit for a whole domain (virtual host) can be configured in much the same way as the per-client-IP limit, keyed on the server name instead.
2. ngx_http_limit_req_module
Above we used ngx_http_limit_conn_module to limit the number of connections. What about limiting the number of requests? That is what ngx_http_limit_req_module is for: it limits the request-processing rate per defined key, in particular the request rate from a single IP address. It uses the leaky bucket algorithm to fix the number of requests processed per second and delay the excess. If the request rate exceeds the value configured for the zone, processing is delayed or requests are discarded, so all requests are handled at the defined rate.
Configure in the http {} block:
# The zone is named one, its size is 10m, and the average request rate may not exceed 1 request per second.
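Given that comment, the directive is presumably:

limit_req_zone $binary_remote_addr zone=one:10m rate=1r/s;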
Configure in the server {} block:
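To match the burst-of-5, no-nodelay behavior described below, presumably:

limit_req zone=one burst=5;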
The settings above limit request processing to 1 per second per IP, with the server able to queue up to 5 requests per IP beyond that; once 5 requests are queued, further requests are discarded.
Simulate 10 concurrent client requests with ab: ab -n 10 -c 10 http://10.23.22.239/index.html
With burst set to 5, out of 10 total requests the first is processed immediately, the 2nd through 6th are stored in the bucket, and because the bucket is then full and nodelay is not set, the remaining four requests are discarded.
Summary
That covers the detailed explanation and implementation of rate limiting for high-concurrency systems. I hope it is helpful to you. Thank you for your support!