Java – How does Spring Batch manage transactions (possibly with multiple data sources)?
I would like some information about the data flow in Spring Batch processing, but I can't find what I'm looking for on the Internet (despite some useful questions on this site).
I'm trying to establish standards for using Spring Batch in our company. We want to know how Spring Batch behaves when several processors in a step update data on different data sources.
This question focuses on chunk-oriented processing, but feel free to provide information about other modes.
From what I have seen (please correct me if I am wrong), when a row is read, it goes through the whole flow (reader, processors, writer) before the next one is read (as opposed to silo processing, where the reader would process all rows, send them to the processor, and so on).
In my case, several processors read data (from different databases) and update it in the process, and finally the writer inserts data into yet another database. For now, the JobRepository is not linked to a database, but it would be a separate one, which makes things a little more complicated still.
The model cannot be changed because the data belongs to multiple business areas
In this case, how is the transaction managed? Is the data committed only once the complete chunk has been processed? If so, is there two-phase commit management? How is it guaranteed? What development or configuration steps should be taken to ensure data consistency?
More generally, what is your suggestion in similar situations?
Solution
Spring Batch uses Spring's core transaction management. Most of the transaction semantics are arranged around a chunk of items, as described in Section 5.1 of the Spring Batch docs.
The transaction behavior of the readers and writers depends on exactly what they are (e.g. file system, database, JMS queue, etc.), but if a resource is configured to support transactions, it will be enlisted automatically. The same is true for XA: if you make the resource endpoint XA-compliant, it will use two-phase commit.
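If XA is the route you take, the wiring could look roughly like the sketch below. It is only an illustration under several assumptions that do not come from the question: the Spring Batch 5.x builder API, a JTA provider (for example from an application server) reachable through the default JNDI lookups, and readers, processors and writers backed by XA-capable data sources. The class name, bean names, item types and the chunk size of 5 are hypothetical.

```java
import org.springframework.batch.core.Step;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.core.step.builder.StepBuilder;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.transaction.PlatformTransactionManager;
import org.springframework.transaction.jta.JtaTransactionManager;

// Hypothetical configuration class; names are illustrative only.
@Configuration
public class MultiSourceStepConfig {

    // Assumption: a JTA provider is available, so the no-arg constructor
    // can locate the UserTransaction through its default JNDI lookups.
    @Bean
    public PlatformTransactionManager transactionManager() {
        return new JtaTransactionManager();
    }

    // The step runs each chunk inside one transaction driven by the
    // transaction manager above. Only resources that are XA-compliant and
    // enlisted with that manager take part in the two-phase commit.
    @Bean
    public Step multiSourceStep(JobRepository jobRepository,
                                PlatformTransactionManager transactionManager,
                                ItemReader<String> reader,
                                ItemProcessor<String, String> processor,
                                ItemWriter<String> writer) {
        return new StepBuilder("multiSourceStep", jobRepository)
                .<String, String>chunk(5, transactionManager) // commit interval of 5
                .reader(reader)
                .processor(processor)
                .writer(writer)
                .build();
    }
}
```

Whether the JobRepository's own data source should join the same XA transaction is a separate decision that this sketch leaves open.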
Back to chunk transactions: a transaction is set up per chunk, so if the commit interval is set to 5 on a given tasklet, a new transaction (covering all resources managed by the transaction manager) is opened and closed for each set of 5 reads (the number defined as the commit interval).
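To make the per-chunk boundary concrete, here is a plain-Java simulation. It is not Spring Batch code and uses none of its classes; `ChunkTransactionSketch`, `run` and the `poison` parameter are names made up for this illustration. It assumes the documented chunk model (items are read one by one up to the commit interval, then processed, then written as a list, then committed) and the default behavior where a processing failure rolls back the current chunk and stops the step.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Illustrative simulation only; not how Spring Batch is implemented internally.
class ChunkTransactionSketch {

    /**
     * Returns one list per committed transaction, each holding the items
     * "written" in that chunk. An item equal to {@code poison} makes the
     * simulated processor fail; pass null to never fail.
     */
    static List<List<String>> run(List<String> input, int commitInterval, String poison) {
        List<List<String>> committed = new ArrayList<>();
        Iterator<String> reader = input.iterator();

        while (reader.hasNext()) {
            // A new "transaction" starts here, one per chunk.
            List<String> pending = new ArrayList<>();
            try {
                // Read up to commitInterval items.
                List<String> chunk = new ArrayList<>();
                while (chunk.size() < commitInterval && reader.hasNext()) {
                    chunk.add(reader.next());
                }
                // Process every item of the chunk.
                for (String item : chunk) {
                    if (item.equals(poison)) {
                        throw new IllegalStateException("cannot process " + item);
                    }
                    pending.add(item.toUpperCase());
                }
                // "Write" the whole chunk and commit.
                committed.add(pending);
            } catch (IllegalStateException e) {
                // Rollback: the pending work of this chunk is discarded,
                // earlier chunks stay committed, and the step stops.
                break;
            }
        }
        return committed;
    }

    public static void main(String[] args) {
        List<String> items = List.of("a", "b", "c", "d", "e", "f", "g");
        System.out.println(run(items, 5, null)); // prints [[A, B, C, D, E], [F, G]]
        System.out.println(run(items, 5, "f"));  // prints [[A, B, C, D, E]]
    }
}
```

With seven items and a commit interval of 5, two transactions are committed. If the sixth item fails, only the first chunk of five remains committed.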
However, all of this is set up around reading from a single data source. Does that meet your requirements? I'm not sure Spring Batch can manage a transaction that reads data from multiple sources and writes the processor results to another database within a single transaction. (In fact, I can't think of anything that could do that...)