Living in the Matrix with Bytecode Manipulation
Original address: https://www.infoq.com/articles/Living-Matrix-Bytecode-Manipulation
You are probably all too familiar with the following sequence: you feed a .java file into a Java compiler (likely using javac or a build tool like Ant, Maven, or Gradle), the compiler grinds away, and finally emits one or more .class files.
figure 1: What is Java bytecode?
If you run the build from the command line with verbose output enabled, you can watch the compiler parse your file until it finally writes your .class file.
The generated .class file contains the bytecode, essentially the instruction set for the Java virtual machine (JVM), and is what gets loaded by the Java runtime class loader when a program executes.
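To make the idea of a .class file a little more concrete, here is a minimal sketch that reads the first eight bytes of a compiled class: the magic number, followed by the minor and major version. The class name `ClassFileHeaderDemo` and the helper `readHeader` are illustrative names, not part of the article's example, and the sketch assumes the requested class is reachable through the system class loader.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class ClassFileHeaderDemo {

    /** Returns {magic, minorVersion, majorVersion} for the named class (hypothetical helper). */
    static int[] readHeader(String className) {
        // Class files are looked up as resources, e.g. "java/lang/String.class".
        String resource = className.replace('.', '/') + ".class";
        try (InputStream in = ClassLoader.getSystemResourceAsStream(resource)) {
            if (in == null) {
                throw new IllegalArgumentException("class file not found: " + resource);
            }
            DataInputStream data = new DataInputStream(in);
            int magic = data.readInt();            // every class file starts with 0xCAFEBABE
            int minor = data.readUnsignedShort();  // minor version
            int major = data.readUnsignedShort();  // major version (depends on the compiler target)
            return new int[] {magic, minor, major};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] header = readHeader("java.lang.String");
        System.out.printf("magic=0x%08X minor=%d major=%d%n", header[0], header[1], header[2]);
    }
}
```

Everything after those eight bytes (the constant pool, fields, methods, and their bytecode) is what the frameworks discussed in this article parse and rewrite for you.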
In this article we will investigate Java bytecode, how to manipulate it, and why anyone would ever want to do so.
Bytecode-manipulation frameworks
Some of the more popular frameworks for manipulating bytecode include:
This article focuses on Javassist and ASM.
Why should you care about manipulating bytecode?
Many common Java libraries such as Spring and Hibernate, as well as most JVM languages and even your IDEs, use bytecode-manipulation frameworks. For that reason, and because it’s really quite fun, you might find bytecode manipulation a valuable skill to have. You can use bytecode manipulation to perform many tasks that would be difficult or impossible to do otherwise, and once you learn it, the sky's the limit.
One important use case is program analysis. For example, the popular FindBugs bug-locator tool uses ASM under the hood to analyze your bytecode and locate bug patterns. Some software shops have code-complexity rules such as a maximum number of if/else statements in a method or a maximum method size. Static analysis tools analyze your bytecode to determine the code complexity.
Another common use is class generation. For example, ORM frameworks typically use proxies based on your class definitions. Or consider security applications that provide syntax for adding authorization annotations. Such use cases lend themselves nicely to bytecode manipulation.
JVM languages and frameworks such as Scala, Groovy, and Grails all use a bytecode-manipulation framework.
Consider a situation where you need to transform library classes without having the source code, a task routinely performed by Java profilers. For example, at New Relic, bytecode instrumentation is used to time method executions.
With bytecode manipulation, you can optimize or obfuscate your code, or you can introduce functionality such as adding strategic logging to an application. This article will focus on a logging example, which will provide the basic tools for using these bytecode-manipulation frameworks.
Our example
Sue is in charge of ATM coding for a bank. She has a new requirement: add key data to the logs for some designated important actions.
Here is a simplified bank-transactions class. It allows a user to log in with a username and password, does some processing, withdraws a sum of money, and then prints out “transactions completed.” The important actions are the login and withdrawal.
To simplify the coding, Sue would like to create an @ImportantLog annotation for those method calls, containing input parameters that represent the indexes of the method arguments she wants to record. With that, she can annotate her login and withdraw methods.
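As a rough sketch of what this could look like, the annotation below carries a `fields` member holding the argument indexes as strings, and a simplified `BankTransaction` class uses it on its two important methods. The method signatures, parameter order, and the index values shown are assumptions made for illustration (they match the indexes the article goes on to describe), not the author's actual listing.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Target;

/** Marks methods whose invocations should be logged (member name "fields" is assumed). */
@Target(ElementType.METHOD)
@interface ImportantLog {
    /** Zero-based indexes of the method arguments to record, as strings. */
    String[] fields();
}

class BankTransaction {

    // Assumed parameter order: index 0 is the password, which must not be logged.
    @ImportantLog(fields = {"1", "2"})
    public void login(String password, String accountId, String userName) {
        // login logic would go here
    }

    // Log the first two arguments: the account ID and the amount.
    @ImportantLog(fields = {"0", "1"})
    public void withdraw(String accountId, double moneyToRemove) {
        // withdrawal logic would go here
    }

    public void unimportantProcessing(String accountId) {
        // not annotated, so no audit logging is added
    }

    public static void main(String[] args) {
        BankTransaction bank = new BankTransaction();
        bank.login("password", "account0", "Ashley");
        bank.unimportantProcessing("account0");
        bank.withdraw("account0", 10.0);
        System.out.println("Transactions completed");
    }
}
```

Note that nothing in this class performs any logging itself; the annotation only marks where the agent should add it.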
For login, Sue wants to record the account ID and the username, so her fields will be set to “1” and “2” (she doesn’t want to display the password!). For the withdraw method, her fields are “0” and “1” because she wants to output the first two arguments: the account ID and the amount of money to remove. Her audit log ideally will contain something like this:
To hook this up, Sue is going to use a Java agent. Introduced in JDK 1.5, Java agents allow you to modify the bytes that comprise the classes in a running JVM, without requiring any source code.
Without an agent, the normal execution flow of Sue’s program is:
When you introduce a Java agent, a few more things happen, but let’s first see what’s required to create an agent. An agent must contain a class with a premain method. It must be packaged as a JAR file with a properly constructed manifest that contains a Premain-Class entry. Finally, the -javaagent switch must be set at launch to point to the JAR path, which makes the JVM aware of the agent.
Inside premain, register a Transformer that captures the bytes of every class as it is loaded, makes any desired modifications, and returns the modified bytes. In Sue’s example, the Transformer captures BankTransaction, which is where she makes her modifications and returns the modified bytes. Those are the bytes that are loaded by the class loader, and which the main method will execute to perform its original functionality in addition to Sue’s required augmented logging.
When the agent class is loaded, its premain method is invoked before the application's main method.
figure 2: Process with Java agent.
It’s best to look at an example.
The Agent class doesn’t implement any interface, but it must contain a premain method, as follows:
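A minimal sketch of such an agent, using only the JDK's java.lang.instrument package, might look like the following. The `NoOpTransformer` class is a hypothetical stand-in for Sue's real transformer, and the JAR and class names in the comments are placeholders. In a real project the agent would normally be a public class in its own file.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;

class Agent {
    // The agent JAR's manifest must name this class, for example:
    //   Premain-Class: Agent
    // and the JVM is pointed at the JAR on launch (paths are placeholders):
    //   java -javaagent:/path/to/agent.jar -cp app.jar BankTransaction
    public static void premain(String agentArgs, Instrumentation inst) {
        System.out.println("Starting the agent");
        // Every class loaded from now on is offered to the registered transformer.
        inst.addTransformer(new NoOpTransformer());
    }
}

/** Hypothetical placeholder transformer that leaves every class untouched. */
class NoOpTransformer implements ClassFileTransformer {
    @Override
    public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined,
                            ProtectionDomain protectionDomain, byte[] classfileBuffer) {
        return null; // null means "no changes to this class"
    }
}
```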
The Transformer class contains a transform method, whose signature accepts a ClassLoader, the class name, the Class object of the class being redefined, a ProtectionDomain defining permissions, and the original bytes of the class. Returning null from the transform method tells the runtime that no changes have been made to that class.
To modify the class bytes, supply your bytecode-manipulation logic in transform and return the modified bytes.
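The null-versus-bytes contract can be sketched with a transformer that only touches one class. The target name `com/example/BankTransaction` is a made-up example, and the "modification" here is just a copy of the original bytes, standing in for real manipulation logic.

```java
import java.lang.instrument.ClassFileTransformer;
import java.security.ProtectionDomain;

class SelectiveTransformer implements ClassFileTransformer {

    // Hypothetical target, in the slash-separated internal form the JVM passes to transform.
    private static final String TARGET = "com/example/BankTransaction";

    @Override
    public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined,
                            ProtectionDomain protectionDomain, byte[] classfileBuffer) {
        if (!TARGET.equals(className)) {
            return null; // not our class: tell the runtime nothing changed
        }
        // Work on a copy rather than the buffer that was passed in.
        byte[] modified = classfileBuffer.clone();
        // ... bytecode manipulation (Javassist, ASM, ...) would happen here ...
        return modified;
    }
}
```

Filtering early like this matters in practice, because the transformer is consulted for every class the JVM loads, including JDK classes.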
Javassist
A subproject of JBoss, Javassist (short for “Java Programming Assistant”) consists of a high-level object-based API and a lower-level one that is closer to the bytecode. The more object-based one enjoys more community activity and is the focus of this article. For a complete tutorial, refer to the Javassist documentation.
In Javassist, the basic unit of class representation is the CtClass (“compile-time class”). The classes that comprise your program are stored in a ClassPool, essentially a container for CtClass instances.
The ClassPool implementation uses a hash table, in which the key is the name of the class and the value is the corresponding CtClass object.
A normal Java class contains fields, constructors, and methods. The corresponding CtClass represents those as CtField, CtConstructor, and CtMethod. To locate a CtClass, you can grab it by name from the ClassPool, then grab any method from the CtClass and apply your modifications.
Figure 3.
CtMethod contains the code for the associated method. We can insert code at the beginning of the method using the insertBefore method. The great thing about Javassist is that you write pure Java, albeit with one caveat: the Java must be supplied as quoted strings. But most people would agree that’s much better than having to deal with bytecode! (Although, if you happen to like coding directly in bytecode, stay tuned for the ASM section.) The JVM includes a bytecode verifier to guard against invalid bytecode. If your Javassist-coded Java is not valid, the bytecode verifier will reject it at runtime.
Similar to insertBefore, there's an insertAfter to insert code at the end of a method. You can also insert code in the middle of a method by using insertAt, or add a catch clause with addCatch.
Let's kick off your IDE and code your logging feature. We start with an Agent (containing premain) and our ClassTransformer.
package com.example.spring2gx.agent;

import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.IllegalClassFormatException;
import java.security.ProtectionDomain;

public class ImportantLogClassTransformer implements ClassFileTransformer {
    public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined,
            ProtectionDomain protectionDomain, byte[] classfileBuffer)
            throws IllegalClassFormatException {
        // manipulate the bytes here; this placeholder returns them unchanged
        byte[] modified_bytes = classfileBuffer;
        return modified_bytes;
    }
}
To add audit logging, first implement transform to convert the bytes of the class to a CtClass object. Then, you can iterate its methods and capture the ones with the @ImportantLog annotation on them, grab the input parameter indexes to log, and insert that code at the beginning of the method.
                // get important method parameter indexes
                List parameterIndexes = getParamIndexes(annotation);
                // add logging statement to beginning of the method
                currentMethod.insertBefore(
                        createJavaString(currentMethod, className, parameterIndexes));
            }
        }
        return cclass.toBytecode();
    }
    return null;
}
In Javassist's terms, annotations are either “invisible” or “visible”. Invisible annotations, which are recorded in the class file and available at compile time and class-loading time but not through reflection at run time, are declared by passing RetentionPolicy.CLASS to the annotation's @Retention. Visible annotations (RetentionPolicy.RUNTIME) are loaded and visible at run time. For this example, you only need the attributes while the class is being transformed at load time, so make them invisible.
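The difference between the two retention policies can be seen with plain reflection. In the sketch below, `InvisibleLog`, `VisibleLog`, and `RetentionDemo` are hypothetical names used only for this illustration. The CLASS-retained annotation is stored in the class file (where a bytecode tool can read it) but is not reported by reflection, while the RUNTIME-retained one is.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Method;

@Retention(RetentionPolicy.CLASS)   // "invisible": in the class file, not visible to reflection
@interface InvisibleLog { String[] fields(); }

@Retention(RetentionPolicy.RUNTIME) // "visible": also available through reflection
@interface VisibleLog { String[] fields(); }

class RetentionDemo {

    @InvisibleLog(fields = {"0"})
    void invisiblyAnnotated(String accountId) { }

    @VisibleLog(fields = {"0"})
    void visiblyAnnotated(String accountId) { }

    /** Reports whether reflection can see the given annotation on the named method. */
    static boolean seenByReflection(String methodName, Class<? extends Annotation> type) {
        for (Method m : RetentionDemo.class.getDeclaredMethods()) {
            if (m.getName().equals(methodName)) {
                return m.isAnnotationPresent(type);
            }
        }
        throw new IllegalArgumentException("no such method: " + methodName);
    }

    public static void main(String[] args) {
        System.out.println(seenByReflection("invisiblyAnnotated", InvisibleLog.class)); // prints false
        System.out.println(seenByReflection("visiblyAnnotated", VisibleLog.class));     // prints true
    }
}
```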
The getAnnotation method scans for your @ImportantLog annotation and returns null if it doesn’t find the annotation.
With the annotation in hand, you can retrieve the parameter indexes. Using Javassist’s ArrayMemberValue, the member value fields are returned as a String array, which you can iterate to obtain the field indexes you had embedded in the annotation.
You are finally in a position to insert your log statement in createJavaString.
Your implementation creates a StringBuilder, appending some preamble followed by the required method name and class name. One thing to note is that if you're inserting multiple Java statements, you need to surround them with curly braces. (Braces are not required for just a single statement.)
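A simplified sketch of such a string builder follows. It takes the method name as a plain String rather than a CtMethod so that it runs without Javassist on the classpath. The class name `LogStatementBuilder` and the exact log wording are assumptions. It relies on Javassist's `$1`, `$2`, ... placeholders for the method arguments (`$0` is `this`), which is why a zero-based index from the annotation is shifted by one.

```java
import java.util.Arrays;
import java.util.List;

class LogStatementBuilder {

    /** Builds the Java source (as a string) that would be handed to insertBefore. */
    static String createJavaString(String methodName, String className,
                                   List<String> parameterIndexes) {
        StringBuilder sb = new StringBuilder();
        sb.append("{"); // several statements follow, so they must be wrapped in braces
        sb.append("StringBuilder sb = new StringBuilder(\"A call was made to method '")
          .append(methodName).append("' on class '").append(className).append("'.\");");
        sb.append("sb.append(\"\\n    Important params:\");");
        for (String index : parameterIndexes) {
            // Annotation indexes are zero-based; Javassist's $1 is the first argument.
            int placeholder = Integer.parseInt(index) + 1;
            sb.append("sb.append(\"\\n        Index ").append(index).append(" value: \");");
            sb.append("sb.append($").append(placeholder).append(");");
        }
        sb.append("System.out.println(sb.toString());");
        sb.append("}");
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints the generated source for a login method logging arguments 1 and 2.
        System.out.println(createJavaString("login", "BankTransaction", Arrays.asList("1", "2")));
    }
}
```

Printing the generated string, as main does here, is a cheap way to check the quoting before handing it to Javassist.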
That pretty much covers the code for adding audit logging using Javassist. In retrospect, the positives are:
The negatives are:
ASM
ASM began life as a Ph.D. project and was open-sourced in 2002. It is actively updated, and has supported Java 8 since version 5.x. ASM consists of an event-based library and an object-based one, similar in behavior to SAX and DOM XML parsers, respectively. This article will focus on the event-based library. Complete documentation is available from the ASM project.
A Java class contains many components, including a superclass, interfaces, attributes, fields, and methods. With ASM, you can think of each of these as events; you parse the class by providing a ClassVisitor implementation, and as the parser encounters each of those components, a corresponding “visitor” event-handler method is called on the ClassVisitor (always in this sequence).
Let’s get a feel for the process by passing Sue’s BankTransaction (defined at the beginning of the article) into a ClassReader for parsing.
Again, start with the Agent premain:
Then pass the output bytes to a no-op ClassWriter to put the parsed bytes back together in the byte array, producing a rehydrated BankTransaction that, as expected, is virtually identical to our original class.
figure 4.
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.IllegalClassFormatException;
import java.security.ProtectionDomain;

import org.objectweb.asm.ClassReader;
import org.objectweb.asm.ClassWriter;

public class ImportantLogClassTransformer implements ClassFileTransformer {
    public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined,
            ProtectionDomain protectionDomain, byte[] classfileBuffer)
            throws IllegalClassFormatException {
        // parse the class and write it straight back out, unchanged
        ClassReader cr = new ClassReader(classfileBuffer);
        ClassWriter cw = new ClassWriter(cr, ClassWriter.COMPUTE_FRAMES);
        cr.accept(cw, 0);
        return cw.toByteArray();
    }
}
Now let’s modify our ClassWriter to do something a little more useful by adding a ClassVisitor (named LogMethodClassVisitor) to call our event-handler methods, such as visitField or visitMethod, as the corresponding components are encountered during parsing.
figure 5.
For your logging requirement, you want to check each method for the indicative annotation and add any specified logging. You only need to override ClassVisitor's visitMethod to return a MethodVisitor that supplies your implementation. Just as there are several components of a class, there are several components of a method, corresponding to the method attributes, annotations, and compiled code. ASM’s MethodVisitor provides hooks for visiting every opcode of the method, so you can get pretty granular in your modifications.
Again, the event handlers are always called in the same predefined sequence, so you always know all of the attributes and annotations on the method before you have to actually visit the code. (Incidentally, you can chain together multiple instances of MethodVisitor, just like you can chain multiple instances of ClassVisitor.) So in your visitMethod, you’re going to hook in the PrintMessageMethodVisitor, overriding visitAnnotation to capture your annotations and visitCode to insert any required logging code.
Your PrintMessageMethodVisitor overrides two methods. First comes visitAnnotation, so you can check the method for your @ImportantLog annotation. If it is present, you need to extract the parameter indexes from the annotation's fields property. When visitCode executes, the presence of the annotation has already been determined, and so it can add the specified logging. The visitAnnotation code hooks in an AnnotationVisitor that exposes the field arguments on the @ImportantLog annotation.
Now, let's look at the visitCode method. First, it must check if the AnnotationVisitor flagged the annotation as present. If so, then add your bytecode.
This is the scary part of ASM: you actually have to write bytecode, so that’s something new to learn. You have to know about the stack, local variables, etc. It’s a fairly simple language, but if you just want to hack around, you can actually get the existing bytecode pretty easily with javap:
I recommend writing the code you need in a Java test class, compiling that, and running it through javap -c to see the exact bytecode. In the code sample above, everything in blue is actually the bytecode. On each line, you get a one-byte opcode followed by zero or more arguments. You will need to determine those arguments for the target code, and they can usually be extracted by running javap -c -v on the original class (-v for verbose, which displays the constant pool).
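As an illustration of that workflow, a throwaway test class along these lines could serve as the input to javap. The class name `JavapSample` and its method are made up for this sketch, which simply writes in plain Java the kind of StringBuilder logging you might later need to emit as bytecode.

```java
class JavapSample {

    /** Plain-Java version of the logging you want to reproduce in bytecode. */
    static String describe(String accountId, double amount) {
        StringBuilder sb = new StringBuilder();
        sb.append("account: ").append(accountId);
        sb.append(", amount: ").append(amount);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe("account0", 5.0));
    }
}

// Compile, then disassemble (run from the directory containing the source file):
//   javac JavapSample.java
//   javap -c -v JavapSample
```

In the disassembly of describe you should find the calls to StringBuilder.append as invokevirtual instructions, with their owner, name, and descriptor listed in the constant pool that -v prints.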
I encourage you to look at the Java Virtual Machine Specification, which defines every opcode. There are operations like load and store (which move data between your operand stack and your local variables), with a variant for each type. For example, ILOAD pushes an integer value from a local variable onto the operand stack and ISTORE pops one from the stack back into a local variable, whereas LLOAD and LSTORE do the same for a long value.
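The direction of load and store is easy to mix up, so here is a toy model of a single method frame, written in plain Java. It is only a teaching sketch with made-up names (`ToyFrame`, `addViaStack`), not how the JVM is implemented: load instructions push a local variable onto the operand stack, store instructions pop the stack into a local variable, and arithmetic works on the stack.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class ToyFrame {
    final int[] locals;                                // local variable slots
    final Deque<Integer> stack = new ArrayDeque<>();   // operand stack

    ToyFrame(int localCount) {
        locals = new int[localCount];
    }

    void iload(int index)  { stack.push(locals[index]); }  // local variable -> stack
    void istore(int index) { locals[index] = stack.pop(); } // stack -> local variable
    void iadd() {                                           // pop two ints, push their sum
        int right = stack.pop();
        int left = stack.pop();
        stack.push(left + right);
    }

    /** Models "c = a + b" as the sequence iload 0, iload 1, iadd, istore 2. */
    static int addViaStack(int a, int b) {
        ToyFrame frame = new ToyFrame(3);
        frame.locals[0] = a;
        frame.locals[1] = b;
        frame.iload(0);
        frame.iload(1);
        frame.iadd();
        frame.istore(2);
        return frame.locals[2];
    }

    public static void main(String[] args) {
        System.out.println(addViaStack(2, 3)); // prints 5
    }
}
```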
There are also operations like invokeVirtual, invokeSpecial, invokeStatic, and the recently added invokeDynamic, for invoking standard instance methods, constructors, static methods, and dynamically bound methods in dynamically typed JVM languages, respectively. There are also operations for creating new objects using the new operator, or to duplicate the top operand on the stack.
In sum, the positives of ASM are:
There is really only one negative, but it’s a big one: you’re writing bytecode, so you have to understand what's going on under the hood, and as a result developers tend to take some time to ramp up.
Lessons learned
Bytecode manipulation can make life easier. You can find bugs, add logging (as discussed), obfuscate source code, perform preprocessing like Spring or Hibernate, or even write your own language compiler. You can restrict your API calls, analyze code to see if multiple threads are accessing a collection, lazy-load data from the database, and find differences between JARs by inspecting them.
So I encourage you to make a bytecode-manipulation framework your friend. Someday, one might save your job.