Best practices for Spring Batch in large enterprises

In large enterprises, not every operation can be handled through an interactive interface: the business logic is complex, data volumes are large, and both data formats and data exchange formats vary. Some operations need to read large amounts of data on a regular schedule and then run a series of follow-up processing steps. This is called "batch processing".

Batch applications usually have the following characteristics:

What is Spring Batch

Spring Batch is a lightweight, comprehensive batch framework designed to help large enterprises develop robust batch applications. It provides many reusable functions that are essential for processing large volumes of data, such as logging and tracing, transaction management, job execution statistics, job restart and resource management. It also provides optimization and partitioning techniques for high-performance batch jobs.

Its core functions include:

The author works in the CRM department of a large foreign financial company. In our daily work we often develop batch applications, and we have accumulated considerable experience with Spring Batch. This article summarizes that experience.

Using Spring Batch 3.0 and Spring Boot

When using Spring Batch, we recommend the latest version, Spring Batch 3.0. Compared with Spring Batch 2.2, it brings the following improvements:

Support for Spring 4 and Java 8 is a major improvement. It lets you use Spring Boot, which is built on Spring 4, and that brings a qualitative leap in development efficiency. Pulling in the Spring Batch framework takes only one line in build.gradle:

With the enhanced Spring Batch Integration module, we can easily integrate with other members of the Spring family, launch jobs in a variety of ways, and use remote partitioning and remote chunking.

With JobScope support, we can inject the context of the current job instance into an object at any time. As long as a bean's scope is set to job scope, it can access JobParameters, jobExecutionContext and similar information whenever it needs them.

Configure with Java config instead of XML

We used to configure jobs and steps in XML, but over time we ran into many problems.

We gradually found that configuration in pure Java classes is more flexible, type safe, and better supported by IDEs. The fluent builder syntax used to define a job or step is also simpler and easier to read than XML.
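A minimal sketch of such a Java configuration, assuming Spring Batch 3.0 with @EnableBatchProcessing. The class name, bean names, item type and chunk size are illustrative, and the reader, processor, writer and listener beans are assumed to be defined elsewhere:

```java
import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.StepExecutionListener;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
@EnableBatchProcessing
public class CustomerJobConfig { // hypothetical name

    @Autowired
    private JobBuilderFactory jobs;

    @Autowired
    private StepBuilderFactory steps;

    // The step reads, processes and writes String items in chunks of 100 (illustrative value).
    @Bean
    public Step importStep(ItemReader<String> reader,
                           ItemProcessor<String, String> processor,
                           ItemWriter<String> writer,
                           StepExecutionListener stepListener) {
        return steps.get("importStep")
                .<String, String>chunk(100)
                .reader(reader)
                .processor(processor)
                .writer(writer)
                .listener(stepListener)
                .build();
    }

    // A job with a single step; further steps could be chained with next(...).
    @Bean
    public Job importJob(Step importStep) {
        return jobs.get("importJob")
                .start(importStep)
                .build();
    }
}
```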

In this example, you can clearly see how the step is configured: its reader, processor and writer components, and which listeners are attached.

Using an in-memory database in local integration tests

Spring Batch needs a database at runtime, because it creates a set of schema tables there to store job and step execution statistics. For local integration tests, we can use an in-memory database to hold this execution metadata, which avoids setting up a database locally and also speeds up job execution.

Add the HSQLDB dependency in build.gradle:

Then add the DataSource configuration to the test class.
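One way this DataSource could look, sketched with Spring's embedded database support. The class name is illustrative, and HSQLDB is assumed to be on the test classpath:

```java
import javax.sql.DataSource;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.jdbc.datasource.embedded.EmbeddedDatabaseBuilder;
import org.springframework.jdbc.datasource.embedded.EmbeddedDatabaseType;

@Configuration
public class TestDataSourceConfig { // hypothetical name

    // An in-memory HSQLDB instance that lives only for the duration of the test context.
    @Bean
    public DataSource dataSource() {
        return new EmbeddedDatabaseBuilder()
                .setType(EmbeddedDatabaseType.HSQL)
                .build();
    }
}
```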

And add the database initialization setting to application.properties:

Use the chunk mechanism appropriately

Spring Batch uses a chunk-based mechanism when configuring a step. It reads one item at a time, processes it, accumulates the results until the chunk size is reached, and then hands the whole chunk to the writer in a single write operation. This maximizes write efficiency, and the transaction boundary is the chunk.

When writing to files or databases, we can tune the chunk size to maximize write efficiency. In some scenarios, however, the write operation actually calls a web service or sends a message to a message queue. There we set the chunk size to 1, so that each write is handled promptly and a retry triggered by an exception elsewhere in the chunk does not call the service or send the message again.
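The read, process, accumulate and write cycle can be modelled in a few lines of plain Java. This is only a sketch of the idea, not Spring Batch's implementation: ChunkSketch and runStep are hypothetical names, and transactions are represented only by a comment.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

class ChunkSketch {

    /**
     * Reads items one at a time, processes each, and hands them to the
     * writer in groups of chunkSize. Returns the number of write calls.
     */
    static <I, O> int runStep(Iterator<I> reader,
                              Function<I, O> processor,
                              Consumer<List<O>> writer,
                              int chunkSize) {
        int writes = 0;
        List<O> chunk = new ArrayList<>();
        while (reader.hasNext()) {
            chunk.add(processor.apply(reader.next()));
            if (chunk.size() == chunkSize) {
                // In a real step, this write and its commit form one transaction.
                writer.accept(new ArrayList<>(chunk));
                writes++;
                chunk.clear();
            }
        }
        if (!chunk.isEmpty()) {
            // Flush the last, possibly smaller, chunk.
            writer.accept(new ArrayList<>(chunk));
            writes++;
        }
        return writes;
    }
}
```

With a chunk size of 1, every item gets its own write call, which is the behaviour wanted when the writer calls a web service or sends a message.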

Use listeners to monitor job execution and react in time

Spring Batch provides a large number of listeners that cover every aspect of job execution.

At the job level, Spring Batch provides the JobExecutionListener interface, which supports additional processing at the start or end of a job. At the step level, it provides interfaces such as StepExecutionListener, ChunkListener, ItemReadListener, ItemProcessListener and ItemWriteListener, as well as RetryListener and SkipListener for retry and skip operations.

We usually implement a JobExecutionListener for each job. In the after-job callback, we output the job's execution information, including the execution time, job parameters, exit code, the steps that ran and the details of each step. This way developers, testers, and operations staff all have a complete picture of how the job ran.

If a step may skip items, we also implement a SkipListener for it and record the skipped items for follow-up handling.

There are two ways to implement a listener: implement the corresponding interface, such as JobExecutionListener, or use annotations. In practice we find annotations better, because implementing an interface forces you to implement all of its methods, whereas with annotations you only annotate the methods you need.

The following class implements the interface. Only the first method is actually used; the second and third are not, but we must still provide empty implementations for them.
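A sketch of what such a class might look like, assuming the interface in question is ItemWriteListener, which declares three methods. The class name, item type and log message are illustrative:

```java
import java.util.List;

import org.springframework.batch.core.ItemWriteListener;

public class LoggingWriteListener implements ItemWriteListener<String> { // hypothetical name

    // The only callback we actually care about.
    @Override
    public void beforeWrite(List<? extends String> items) {
        System.out.println("About to write " + items.size() + " items");
    }

    // Not needed, but the interface forces an empty implementation.
    @Override
    public void afterWrite(List<? extends String> items) {
    }

    // Not needed either.
    @Override
    public void onWriteError(Exception exception, List<? extends String> items) {
    }
}
```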

With annotations, the same listener can be shortened to:
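A sketch of the annotation style, assuming a listener that only cares about the moment before a chunk is written. The class and method names are illustrative; @BeforeWrite comes from Spring Batch's org.springframework.batch.core.annotation package:

```java
import java.util.List;

import org.springframework.batch.core.annotation.BeforeWrite;

public class AnnotatedWriteListener { // hypothetical name

    // Only the callback we need is declared; no empty methods are required.
    @BeforeWrite
    public void logBeforeWrite(List<? extends String> items) {
        System.out.println("About to write " + items.size() + " items");
    }
}
```

An instance of such a class is registered on the step builder through its listener(Object) method, like any other listener.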

Use retry and skip to make batch jobs more robust

When processing millions of records, exceptions are inevitable. If the whole batch run stops because of one exception, the remaining data cannot be processed. Spring Batch has built-in retry and skip mechanisms that make it easy to handle such exceptions. Exceptions suitable for retry are those that may go away over time, for example a database lock that currently blocks a write, or a web service that is temporarily unavailable or overloaded. We can configure retry for these exceptions. Others should not be retried, such as exceptions raised while parsing a file, because they will fail no matter how many times they are retried.

Even if the retries are exhausted, there is no need to fail the whole step. You can configure skip for the specified exception so that the remaining data can still be processed. We can also set the skipLimit option so that the job is terminated promptly once the number of skipped items reaches a given threshold.

Sometimes we need to do something between retry attempts, such as lengthening the wait before the next attempt or restoring state. Spring Batch provides BackOffPolicy for this purpose. The following is an example of a step configured with retry, skip and a BackOffPolicy.
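A sketch of such a step, assuming Spring Batch 3.0 with Spring Retry on the classpath and @EnableBatchProcessing active elsewhere. The class name, step name, item type, exception choices, limits and intervals are all illustrative:

```java
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.batch.item.file.FlatFileParseException;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.dao.DeadlockLoserDataAccessException;
import org.springframework.retry.backoff.ExponentialBackOffPolicy;

@Configuration
public class FaultTolerantStepConfig { // hypothetical name

    @Bean
    public Step faultTolerantStep(StepBuilderFactory steps,
                                  ItemReader<String> reader,
                                  ItemWriter<String> writer) {
        // Wait 1s before the first retry, doubling each time, capped at 10s.
        ExponentialBackOffPolicy backOff = new ExponentialBackOffPolicy();
        backOff.setInitialInterval(1000L);
        backOff.setMultiplier(2.0);
        backOff.setMaxInterval(10000L);

        return steps.get("faultTolerantStep")
                .<String, String>chunk(50)
                .reader(reader)
                .writer(writer)
                .faultTolerant()
                // Transient problem: worth retrying, then skipping if it persists.
                .retry(DeadlockLoserDataAccessException.class)
                .retryLimit(3)
                .backOffPolicy(backOff)
                .skip(DeadlockLoserDataAccessException.class)
                // Permanent problem: retrying cannot help, so only skip.
                .skip(FlatFileParseException.class)
                // Fail the step once more than 10 items have been skipped.
                .skipLimit(10)
                .build();
    }
}
```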

Use a custom decider to implement job flow

Jobs are not always executed strictly in sequence. We often need to choose the next step based on the output data or the execution result of a previous step. In the past, we put such checks inside the downstream steps, which meant that some steps ran without actually doing anything. For example, one step records failed items in a report, and the next step checks whether a report was generated. If so, it sends the report to the designated contact; if not, it does nothing. In this case, the job flow can be implemented with the decider mechanism. In Spring Batch 3.0, the decider is independent of the step and sits at the same level as a step.

In the job configuration, you can use the decider in this way, which makes the whole job flow clearer and easier to understand.
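A sketch of the report scenario, assuming the upstream step places a "reportPath" entry in the job execution context when it generates a report. That key, the status strings, and the class, bean and step names are all hypothetical:

```java
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.job.flow.FlowExecutionStatus;
import org.springframework.batch.core.job.flow.JobExecutionDecider;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ReportJobConfig { // hypothetical name

    // Decides whether the report step is worth running at all.
    static class ReportDecider implements JobExecutionDecider {
        @Override
        public FlowExecutionStatus decide(JobExecution jobExecution, StepExecution stepExecution) {
            boolean reportGenerated =
                    jobExecution.getExecutionContext().containsKey("reportPath"); // assumed key
            return new FlowExecutionStatus(reportGenerated ? "SEND_REPORT" : "NO_REPORT");
        }
    }

    @Bean
    public JobExecutionDecider reportDecider() {
        return new ReportDecider();
    }

    @Bean
    public Job reportJob(JobBuilderFactory jobs,
                         Step processStep,
                         Step sendReportStep,
                         JobExecutionDecider reportDecider) {
        return jobs.get("reportJob")
                .start(processStep)
                .next(reportDecider)
                .on("SEND_REPORT").to(sendReportStep) // a report exists: send it
                .from(reportDecider)
                .on("NO_REPORT").end()                // nothing to send: finish the job
                .end()
                .build();
    }
}
```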

Use a variety of mechanisms to speed up job execution

Batch jobs handle large volumes of data, and the execution window is usually short, so we have to speed up job execution in several ways. In general there are four: multithreading within a single step, parallel execution of different steps, partitioning a step across multiple processes, and remote chunking.

Multithreading within a single step can be implemented with a TaskExecutor. It suits scenarios where the reader and writer are thread safe and stateless. The number of threads can also be configured.
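A sketch of a multithreaded step in Java config, assuming a thread-safe reader and writer and @EnableBatchProcessing active elsewhere. The class name, step name, chunk size and thread limit are illustrative:

```java
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.task.SimpleAsyncTaskExecutor;

@Configuration
public class MultiThreadedStepConfig { // hypothetical name

    @Bean
    public Step multiThreadedStep(StepBuilderFactory steps,
                                  ItemReader<String> reader,
                                  ItemWriter<String> writer) {
        return steps.get("multiThreadedStep")
                .<String, String>chunk(100)
                .reader(reader)   // must be thread safe
                .writer(writer)   // must be thread safe
                // Each chunk is processed on a thread supplied by the executor.
                .taskExecutor(new SimpleAsyncTaskExecutor("batch-"))
                // At most 4 chunks are processed concurrently.
                .throttleLimit(4)
                .build();
    }
}
```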

The step in the example above needs a TaskExecutor implementation. Spring provides a simple multithreaded one for this purpose: SimpleAsyncTaskExecutor.

Parallel execution of different steps is easy to implement in Spring Batch. The following is an example:
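A sketch of one way to express this with FlowBuilder, assuming four step beans defined elsewhere. The class name, the step bean names step2a and step2b, and the job name are hypothetical:

```java
import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.job.builder.FlowBuilder;
import org.springframework.batch.core.job.flow.Flow;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.task.SimpleAsyncTaskExecutor;

@Configuration
public class ParallelJobConfig { // hypothetical name

    @Bean
    public Job parallelJob(JobBuilderFactory jobs,
                           Step step1, Step step2a, Step step2b, Step step3) {
        // Two independent flows, each wrapping a single step here.
        Flow flow1 = new FlowBuilder<Flow>("flow1").start(step2a).build();
        Flow flow2 = new FlowBuilder<Flow>("flow2").start(step2b).build();

        // A split runs its flows concurrently on the given executor.
        Flow splitFlow = new FlowBuilder<Flow>("splitFlow")
                .split(new SimpleAsyncTaskExecutor())
                .add(flow1, flow2)
                .build();

        return jobs.get("parallelJob")
                .flow(step1)       // runs first
                .next(splitFlow)   // flow1 and flow2 in parallel
                .next(step3)       // runs after both flows complete
                .end()
                .build();
    }
}
```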

In this example, step 1 is executed first, then flow1 and flow2 run in parallel, and finally step 3 is executed.

Spring Batch provides PartitionStep to run the same step in parallel across multiple processes. Through PartitionStep and PartitionHandler, a step can be spread over multiple slave (worker) processes that operate in parallel.
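Partitioning starts by deciding which slice of the data each worker gets. The plain-Java sketch below shows only that range arithmetic, assuming the data is keyed by a contiguous numeric id. RangePartitionSketch and partition are hypothetical names; in Spring Batch, this logic would typically live in a Partitioner implementation that stores each range in an ExecutionContext.

```java
import java.util.ArrayList;
import java.util.List;

class RangePartitionSketch {

    /**
     * Splits the inclusive id range [minId, maxId] into at most gridSize
     * contiguous ranges, each returned as a {start, end} pair.
     */
    static List<long[]> partition(long minId, long maxId, int gridSize) {
        List<long[]> ranges = new ArrayList<>();
        long total = maxId - minId + 1;
        // Ceiling division so that every id falls into some range.
        long size = (total + gridSize - 1) / gridSize;
        for (long start = minId; start <= maxId; start += size) {
            long end = Math.min(start + size - 1, maxId);
            ranges.add(new long[] {start, end});
        }
        return ranges;
    }
}
```

Each resulting range can then be handed to one worker, which reads and processes only the ids inside it.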

Remote chunking distributes the processor work of a step across multiple processes, which communicate through middleware such as a message queue. This approach suits scenarios where the processor is the bottleneck and the reader and writer are not.

Conclusion

Spring Batch abstracts batch processing scenarios sensibly and packages a large number of practical functions. Using it to develop batch applications achieves twice the result with half the effort. Along the way, we still need to keep distilling best practices, so that we can deliver high-quality, maintainable batch applications that meet the stringent requirements of enterprise systems.
