Understanding JTS–reference

Part I: An introduction to transactions

If you look at any introductory article or book on J2EE, you'll find only a small portion of the material devoted to the Java Transaction Service (JTS) or the Java Transaction API (JTA). This is not because JTS is an unimportant or optional portion of J2EE -- quite the opposite. JTS gets less press than EJB technology because the services it provides to the application are largely transparent -- many developers are not even aware of where transactions begin and end in their application. The obscurity of JTS is in some sense due to its own success: because it hides the details of transaction management so effectively, we don't hear or say very much about it. However, you probably want to understand what it's doing on your behalf behind the scenes.

It would not be an exaggeration to say that without transactions, writing reliable distributed applications would be almost impossible. Transactions allow us to modify the persistent state of an application in a controlled manner, so that our applications can be made robust to all sorts of system failures, including system crashes, network failures, power failures, and even natural disasters. Transactions are one of the basic building blocks needed to build fault-tolerant, highly reliable, and highly available applications.

Imagine you are transferring money from one account to another. Each account balance is represented by a row in a database table. If you want to transfer funds from account A to account B, you will probably execute some SQL code that looks like this:

    SELECT accountBalance INTO aBalance
        FROM Accounts WHERE accountId = aId;
    IF (aBalance >= transferAmount) THEN
        UPDATE Accounts
            SET accountBalance = accountBalance - transferAmount
            WHERE accountId = aId;
        UPDATE Accounts
            SET accountBalance = accountBalance + transferAmount
            WHERE accountId = bId;
        INSERT INTO AccountJournal (accountId, amount)
            VALUES (aId, -transferAmount);
        INSERT INTO AccountJournal (accountId, amount)
            VALUES (bId, transferAmount);
    ELSE
        FAIL "Insufficient funds in account";
    END IF

So far, this code looks fairly straightforward. If A has sufficient funds on hand, money is subtracted from one account and added to another account. But what happens in the case of a power failure or system crash? The rows representing account A and account B are not likely to be stored in the same disk block, which means that more than one disk I/O will be required to complete the transfer. What if the system fails after the first one is written, but before the second? Then money might have left A's account, but not shown up in B's account (neither A nor B will like this), or maybe money will show up in B's account, but not be debited from A's account (the bank won't like this). Or what if the accounts are properly updated, but the account journal is not? Then the activities on A and B's monthly bank statement won't be consistent with their account balances.

Not only is it impossible to write multiple data blocks to disk simultaneously, but writing every data block to disk when any part of it changes would be bad for system performance. Deferring disk writes to a more opportune time can greatly improve application throughput, but it needs to be done in a manner that doesn't compromise data integrity.

Even in the absence of system failures, there is another risk worth discussing in the above code -- concurrency.
What if A has $100 in his account, but initiates two transfers of $100 to two different accounts at the exact same time? If our timing is unlucky, without a suitable locking mechanism both transfers could succeed, leaving A with a negative balance.

These scenarios are quite plausible, and it is reasonable to expect enterprise data systems to cope with them. We expect banks to correctly maintain account records in the face of fires, floods, disk failures, and system failures. Fault tolerance can be provided by redundancy -- redundant disks, computers, and even data centers -- but it is transactions that make it practical to build fault-tolerant software applications. Transactions provide a framework for enforcing data consistency and integrity in the face of system or component failures.

So what is a transaction, anyway? Before we define this term, first we will define the concept of application state. An application's state encompasses all of the in-memory and on-disk data items that affect the application's operation -- everything the application "knows." Application state may be stored in memory, in files, or in a database. In the event of a system failure -- for example, if the application, network, or computer system crashes -- we want to ensure that when the system is restarted, the application's state can be restored.

We can now define a transaction as a related collection of operations on the application state, which has the properties of atomicity, consistency, isolation, and durability. These properties are collectively referred to as ACID properties.

Atomicity means that either all of the transaction's operations are applied to the application state, or none of them are applied; the transaction is an indivisible unit of work.

Consistency means that the transaction represents a correct transformation of the application state -- that any integrity constraints implicit in the application are not violated by the transaction.
In practice, the notion of consistency is application-specific. For example, in an accounting application, consistency would include the invariant that the sum of all asset accounts equals the sum of all liability accounts. We will return to this requirement when we discuss transaction demarcation in Part 3 of this series.

Isolation means that the effects of one transaction do not affect other transactions that are executing concurrently; from the perspective of a transaction, it appears that transactions execute sequentially rather than in parallel. In database systems, isolation is generally implemented using a locking mechanism. The isolation requirement is sometimes relaxed for certain transactions to yield better application performance.

Durability means that once a transaction successfully completes, changes to the application state will survive failures.

What do we mean by "survive failures"? What constitutes a survivable failure? This depends on the system, and a well-designed system will explicitly identify the faults from which it can recover. The transactional database running on my desktop workstation is robust to system crashes and power failures, but not to my office building burning down. A bank would likely not only have redundant disks, networks, and systems in its data center, but perhaps also have redundant data centers in separate cities connected by redundant communication links to allow for recovery from serious failures such as natural disasters. Data systems for the military might have even more stringent fault-tolerance requirements.

A typical transaction has several participants -- the application, the transaction processing monitor (TPM), and one or more resource managers (RMs). The RMs store the application state and are most often databases, but could also be message queue servers (in a J2EE application, these would be JMS providers) or other transactional resources.
The TPM coordinates the activities of the RMs to ensure the all-or-nothing nature of the transaction. A transaction begins when the application asks the container or transaction monitor to start a new transaction. As the application accesses various RMs, they are enlisted in the transaction. The RM must associate any changes to the application state with the transaction requesting the changes. A transaction ends when one of two things happens: the transaction is committed by the application, or the transaction is rolled back either by the application or because one of the RMs failed. If the transaction successfully commits, changes associated with that transaction will be written to persistent storage and made visible to new transactions. If it is rolled back, all changes made by that transaction will be discarded; it will be as if the transaction never happened at all.

Transactional RMs achieve durability with acceptable performance by summarizing the results of multiple transactions in a single transaction log. The transaction log is stored as a sequential disk file (or sometimes in a raw partition) and will generally only be written to, not read from, except in the case of rollback or recovery. In our bank account example, the balances associated with accounts A and B would be updated in memory, and the new and old balances would be written to the transaction log. Writing an update record to the transaction log requires less total data to be written to disk (only the data that has changed needs to be written, instead of the whole disk block) and fewer disk seeks (because all the changes can be contained in sequential disk blocks in the log). Further, changes associated with multiple concurrent transactions can be combined into a single write to the transaction log, meaning that we can process multiple transactions per disk write, instead of requiring several disk writes per transaction. Later, the RM will update the actual disk blocks corresponding to the changed data.
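
To make the logging scheme a little more concrete, here is a minimal in-memory sketch in Java. It is an illustration under stated assumptions, not how any particular database implements its log: the class TransactionLogSketch, its method names, the use of a list in place of a sequential disk file, and the account values are all invented for this example. The sketch appends an update record (old and new value) for each change, treats commit as a marker in the log, and after a simulated crash replays only the records of committed transactions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of a resource manager's transaction log (illustration only). */
class TransactionLogSketch {

    /** One update record: the old and new value of a single data item. */
    static final class UpdateRecord {
        final long txId;
        final String key;
        final long oldValue; // kept for rollback, which this sketch does not implement
        final long newValue;

        UpdateRecord(long txId, String key, long oldValue, long newValue) {
            this.txId = txId;
            this.key = key;
            this.oldValue = oldValue;
            this.newValue = newValue;
        }
    }

    // "Durable" state: survives the simulated crash.
    private final List<UpdateRecord> log = new ArrayList<>();     // append-only log
    private final Set<Long> committed = new HashSet<>();          // commit markers
    private final Map<String, Long> dataBlocks = new HashMap<>(); // updated lazily

    // Volatile state: lost in the simulated crash.
    private final Map<String, Long> memory = new HashMap<>();

    TransactionLogSketch(Map<String, Long> initialBlocks) {
        dataBlocks.putAll(initialBlocks);
        memory.putAll(initialBlocks);
    }

    /** Apply a change in memory and append an update record; data blocks are untouched. */
    void update(long txId, String key, long newValue) {
        long oldValue = memory.getOrDefault(key, 0L);
        log.add(new UpdateRecord(txId, key, oldValue, newValue));
        memory.put(key, newValue);
    }

    /** Committing only requires a commit marker in the log. */
    void commit(long txId) {
        committed.add(txId);
    }

    /** Simulate a failure: everything held only in memory is lost. */
    void crash() {
        memory.clear();
    }

    /** On restart, reapply the logged changes of committed transactions in log order. */
    void recover() {
        for (UpdateRecord r : log) {
            if (committed.contains(r.txId)) {
                dataBlocks.put(r.key, r.newValue);
            }
        }
        memory.putAll(dataBlocks);
    }

    long read(String key) {
        return memory.getOrDefault(key, 0L);
    }

    /** Runs a bank-transfer scenario and returns the recovered balances {A, B}. */
    static long[] demo() {
        Map<String, Long> initial = new HashMap<>();
        initial.put("A", 100L);
        initial.put("B", 50L);
        TransactionLogSketch rm = new TransactionLogSketch(initial);

        rm.update(1, "A", 70);  // transaction 1: move 30 from A to B
        rm.update(1, "B", 80);
        rm.commit(1);
        rm.update(2, "A", 0);   // transaction 2 never commits

        rm.crash();
        rm.recover();
        return new long[] { rm.read("A"), rm.read("B") };
    }

    public static void main(String[] args) {
        long[] balances = demo();
        System.out.println("A=" + balances[0] + ", B=" + balances[1]); // prints A=70, B=80
    }
}
```

In this sketch, uncommitted changes never reach the data blocks, so recovery only needs to redo committed work. A real RM that writes uncommitted changes to disk early would also need the old values in the log to undo them.
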
If the system fails, the first thing the RM does upon restart is to reapply the effects of any committed transactions that are present in the log but whose data blocks have not yet been updated. In this way, the log guarantees durability across failures, and also enables us to reduce the number of disk I/O operations we perform, or at least defer them to a time when they will have a lesser impact on system performance.

Many transactions involve only a single RM -- usually a database. In this case, the RM generally does most of the work to commit or roll back the transaction. (Nearly all transactional RMs have their own transaction manager built in, which can handle local transactions -- transactions involving only that RM.) However, if a transaction involves two or more RMs -- maybe two separate databases, or a database and a JMS queue, or two separate JMS providers -- we want to make sure that the all-or-nothing semantics apply not only within the RM, but across all the RMs in the transaction. In this case, the TPM will orchestrate a two-phase commit.

In a two-phase commit, the TPM first sends a "Prepare" message to each RM, asking if it is ready and able to commit the transaction; if it receives an affirmative reply from all RMs, it marks the transaction as committed in its own transaction log, and then instructs all the RMs to commit the transaction. If any RM replies that it cannot commit, the TPM instead instructs all the RMs to roll the transaction back. If an RM fails, upon restart it will ask the TPM about the status of any transactions that were pending at the time of the failure, and either commit them or roll them back.

A societal analogy for the two-phase commit is the wedding ceremony -- the clergyman or judge first asks each party "Do you take this man/woman to be your husband/wife?" If both parties say yes, they are both declared to be married; otherwise, both remain unmarried. In no case does one party end up married while the other one doesn't, regardless of who says "I do" first.
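
The two-phase commit exchange can be sketched in a few lines of Java. This is a simplified model under stated assumptions, not the JTA API: the ResourceManager interface, the ToyResource class, and the coordinator class TwoPhaseCommitSketch are hypothetical names made up for illustration, and the sketch ignores timeouts, RM failure, and recovery. It shows only the core rule: every RM is asked to prepare, the coordinator records the commit decision in its own log, and then all RMs receive the same outcome.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of a two-phase commit coordinator (illustration only). */
class TwoPhaseCommitSketch {

    /** Assumed resource manager interface; this is not the JTA XAResource API. */
    interface ResourceManager {
        boolean prepare(long txId);  // phase 1: true means "ready and able to commit"
        void commit(long txId);      // phase 2 when every RM voted yes
        void rollback(long txId);    // phase 2 otherwise
    }

    /** Toy RM that votes as configured and remembers the outcome it was given. */
    static final class ToyResource implements ResourceManager {
        private final boolean votesYes;
        String outcome = "pending";

        ToyResource(boolean votesYes) {
            this.votesYes = votesYes;
        }

        @Override public boolean prepare(long txId) { return votesYes; }
        @Override public void commit(long txId) { outcome = "committed"; }
        @Override public void rollback(long txId) { outcome = "rolled back"; }
    }

    // Stand-in for the coordinator's own transaction log.
    private final List<String> coordinatorLog = new ArrayList<>();

    /** Runs both phases; returns true if the transaction committed. */
    boolean complete(long txId, List<? extends ResourceManager> rms) {
        // Phase 1: ask every RM whether it can commit; one "no" decides the outcome.
        boolean allYes = true;
        for (ResourceManager rm : rms) {
            if (!rm.prepare(txId)) {
                allYes = false;
                break;
            }
        }

        // Phase 2: record the decision, then give every RM the same instruction.
        if (allYes) {
            coordinatorLog.add("COMMITTED " + txId);
            for (ResourceManager rm : rms) {
                rm.commit(txId);
            }
        } else {
            for (ResourceManager rm : rms) {
                rm.rollback(txId);
            }
        }
        return allYes;
    }

    /** Runs one transaction over two toy RMs and reports both outcomes. */
    static String demo(boolean secondVotesYes) {
        ToyResource database = new ToyResource(true);
        ToyResource queue = new ToyResource(secondVotesYes);
        new TwoPhaseCommitSketch().complete(1L, Arrays.asList(database, queue));
        return database.outcome + "/" + queue.outcome;
    }

    public static void main(String[] args) {
        System.out.println(demo(true));   // prints committed/committed
        System.out.println(demo(false));  // prints rolled back/rolled back
    }
}
```

As in the wedding analogy, the two participants always end up with the same outcome: a single "no" vote in the first phase rolls back both.
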
You may have observed that transactions offer many of the same features to application data that synchronized blocks do for in-memory data -- guarantees about atomicity, visibility of changes, and apparent ordering. But while synchronization is primarily a concurrency control mechanism, transactions are primarily an exception-handling mechanism. In a world where disks don't fail, systems and software don't crash, and power is 100 percent reliable, we wouldn't need transactions. Transactions perform the role in enterprise application
