Original address: http://www.ibm.com/developerworks/cn/java/j-dyn0429/
This article is the first in a new series covering a set of topics I call the dynamics of Java programming. These topics range from the basic structure of the Java binary class file format, through runtime metadata access using reflection, to modifying and constructing new classes at runtime. The common thread running through the series is the idea that programming on the Java platform is much more dynamic than working with a language that compiles directly to native machine code. If you understand these dynamic aspects, you can use Java programming to do things that cannot be done in any other mainstream programming language.
In this article, I will cover some of the basic concepts that underlie these dynamic features of the Java platform. At the core of these concepts is the binary format used to represent Java classes, including what happens to these classes when they are loaded into the JVM. This article not only lays the foundation for the rest of the series, but also illustrates some very practical issues that developers run into when working with the Java platform.
Developers working in the Java language usually don't have to care about the details of what happens to their source code when they run it through a compiler. In this series, however, I will cover many of the behind-the-scenes details involved in getting from source code to an executing program, so I will start by looking at the binary classes generated by the compiler. The binary class format is defined by the JVM specification. These class representations are usually generated by a compiler from Java language source code, and they are usually stored in files with the .class extension. Neither of these features is essential, though. Other programming languages have been developed that use the Java binary class format, and for some purposes new class representations are constructed and immediately loaded into a running JVM. As far as the JVM is concerned, the important part is not the source or how it is stored, but the format itself. So what does this class format actually look like? Listing 1 provides the source code for a (very) short class, along with a partial hexadecimal display of the class file output by the compiler:
... ()
0030: 5601 0004 436f 6465 0100 046d 6169 6e01  V...Code...main.
0040: 0016 285b 4c6a 6176 612f 6c61 6e67 2f53  ..([Ljava/lang/S
0050: 7472 696e 673b 2956 0c00 0700 0807 0014  tring;)V........
0060: 0c00 1500 1601 000d 4865 6c6c 6f2c 2057  ........Hello, W
0070: 6f72 6c64 2107 0017 0c00 1800 1901 0005  orld!...........
0080: 4865 6c6c 6f01 0010 6a61 7661 2f6c 616e  Hello...java/lan
0090: 672f 4f62 6a65 6374 0100 106a 6176 612f  g/Object...java/
00a0: 6c61 6e67 2f53 7973 7465 6d01 0003 6f75  lang/System...ou
...
The binary class representation shown in Listing 1 starts with the "cafe babe" signature, which identifies the Java binary class format (and, incidentally, serves as a lasting but largely unrecognized tribute to the hard-working baristas who fueled the developers building the Java platform). This signature is just a simple way of verifying that a block of data really does claim to be an instance of the Java class format. Every Java binary class (even one that never appears in the file system) needs to start with these four bytes.

The rest of the data is less entertaining. The signature is followed by a pair of class format version numbers (in this case minor version 0 and major version 46, or 0x2e in hexadecimal, as generated by the 1.4.1 javac), and then by the constant pool count (in this case 26, or 0x001a; strictly speaking, the stored count is one more than the number of entries). The count is followed by the actual constant pool data. This holds all the constants used by the class definition, including class names, method names, signatures, and strings (which you can recognize in the text interpretation on the right side of the hexadecimal dump), along with various binary values.

Items in the constant pool vary in length, and the first byte of each item identifies its type and how it should be decoded. I won't go into the details of all this here. If you are interested, there are many resources available, starting with the JVM specification itself. The key point is that the constant pool contains all the references to other classes and methods used by this class, as well as the actual definitions of the class and its methods. The constant pool can easily make up half or more of the binary class size, though the average is probably somewhat less.

The constant pool is followed by several items that refer to the constant pool entries for the class itself, its superclass, and its interfaces. These items are followed by information about the fields and methods, which are themselves represented as complex structures.
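The fixed-layout start of the file described above can be decoded in a few lines. The following is a minimal sketch rather than a full class file parser: the class name `ClassHeader` and the method `parseHeader` are hypothetical names chosen for illustration, and the sample bytes are just the header values quoted in the text (signature, minor version 0, major version 46, count 0x001a).

```java
import java.nio.ByteBuffer;

public class ClassHeader {
    /**
     * Hypothetical helper: returns {minor, major, constantPoolCount} and
     * rejects data that does not start with the 0xCAFEBABE signature.
     */
    public static int[] parseHeader(byte[] classBytes) {
        if (classBytes.length < 10) {
            throw new IllegalArgumentException("too short to be a class file");
        }
        // ByteBuffer is big-endian by default, matching the class file format.
        ByteBuffer in = ByteBuffer.wrap(classBytes);
        if (in.getInt() != 0xCAFEBABE) {
            throw new IllegalArgumentException("missing cafe babe signature");
        }
        int minor = in.getShort() & 0xffff;     // unsigned 16-bit minor version
        int major = in.getShort() & 0xffff;     // unsigned 16-bit major version
        int poolCount = in.getShort() & 0xffff; // constant pool count field
        return new int[] {minor, major, poolCount};
    }

    public static void main(String[] args) {
        // First ten bytes of a class file with the header values quoted in the text.
        byte[] start = {(byte) 0xca, (byte) 0xfe, (byte) 0xba, (byte) 0xbe,
                        0, 0, 0, 0x2e, 0, 0x1a};
        int[] h = parseHeader(start);
        System.out.println("minor=" + h[0] + " major=" + h[1]
                + " constant_pool_count=" + h[2]);
    }
}
```

Decoding the constant pool itself takes more work, because each entry's length depends on its leading tag byte.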
The executable code for a method is present in the form of a Code attribute contained within the method definition. This code is expressed as instructions for the JVM, commonly known as bytecode, which is one of the topics of the next section. Attributes are used for several defined purposes in the Java class format, including the bytecode just mentioned, constant values for fields, exception handling, and debugging information. These are not the only possible uses of attributes, however. From the beginning, the JVM specification has required JVMs to ignore attributes of unknown types. The flexibility provided by this requirement makes it possible to extend the use of attributes to other purposes in the future, such as supplying the meta-information needed by frameworks that work with user classes. This approach has been used extensively by C#, a language derived from Java. Unfortunately, no hooks have yet been provided for taking advantage of this flexibility at the user level.

The bytecode that makes up the executable portion of a class file is actually machine code for a particular type of computer: the JVM. It is called a virtual machine because it was designed to be implemented in software rather than hardware. Every JVM used to run Java platform applications is built around an implementation of this machine. The virtual machine itself is fairly simple. It uses a stack architecture, meaning that instruction operands are loaded onto an internal stack before they are used. The instruction set includes all the usual arithmetic and logical operations, along with conditional and unconditional branches, load/store, call/return, stack manipulation, and several special types of instructions. Some instructions include immediate operand values that are encoded directly in the instruction. Others directly reference values in the constant pool. The virtual machine may be simple, but its implementations are not.
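To make the stack architecture concrete, here is a small illustrative example (the class `StackAdd` is a made-up name, not from the original listings). The comments show the instructions javac typically generates for the method body, which you can check by running `javap -c` on the compiled class.

```java
public class StackAdd {
    // For this static method, javac emits four instructions:
    //   iload_0   push the first int argument onto the operand stack
    //   iload_1   push the second int argument
    //   iadd      pop both values and push their sum
    //   ireturn   pop the sum and return it to the caller
    public static int add(int a, int b) {
        return a + b;
    }

    public static void main(String[] args) {
        System.out.println(add(2, 3)); // prints 5
    }
}
```

Note that none of these instructions names a register or carries an operand: the load instructions encode the local variable slot in the opcode itself, and `iadd` takes its inputs from the stack.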
Early (first-generation) JVMs were basically interpreters for the virtual machine bytecode. These virtual machines really were relatively simple, but they suffered from severe performance problems: interpreting code always takes longer than executing native code. To reduce these problems, second-generation JVMs added just-in-time (JIT) translation. JIT technology compiles Java bytecode to native machine code before it is executed for the first time, which gives much better performance on repeated execution. Contemporary JVMs perform better still, because they use adaptive techniques to monitor program execution and selectively optimize heavily used code.

Languages that compile to native code, such as C and C++, generally require a link step after the source code is compiled. This linking process merges the code from separately compiled source files, together with shared library code, to form an executable program. The Java language is different. With the Java language, the classes generated by the compiler generally remain just as they are until they are loaded into a JVM. Even building a JAR file from class files does not change this, because a JAR is just a container for class files. Rather than being a separate step, linking classes is part of the work the JVM performs when it loads them into memory. This adds some overhead when classes are initially loaded, but it also gives Java applications a high degree of flexibility. For example, an application can be written against an interface, with the actual implementation left unspecified until run time. This late-binding approach to assembling an application is used extensively on the Java platform, with servlets being one common example. The rules for loading classes are spelled out in detail in the JVM specification.
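The late-binding idea can be sketched with nothing more than the standard library. In the following example, the class name `LateBinding` and the method `newList` are hypothetical; the code is written against the `java.util.List` interface, and the implementation class is named only at run time, so the JVM loads and links it on demand.

```java
import java.util.List;

public class LateBinding {
    /**
     * Hypothetical helper: instantiates the List implementation whose class
     * name is supplied at run time, using its public no-argument constructor.
     */
    @SuppressWarnings("unchecked")
    public static List<String> newList(String implName) {
        try {
            // The named class is loaded and linked here, not at compile time.
            Class<?> impl = Class.forName(implName);
            return (List<String>) impl.asSubclass(List.class)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("cannot instantiate " + implName, e);
        }
    }

    public static void main(String[] args) {
        // The implementation can be chosen on the command line.
        String implName = args.length > 0 ? args[0] : "java.util.ArrayList";
        List<String> list = newList(implName);
        list.add("Hello, World!");
        System.out.println(list.getClass().getName() + " " + list);
    }
}
```

Running it with `java.util.LinkedList` as the argument swaps the implementation without recompiling anything, which is the same mechanism a servlet container relies on when it loads servlet classes named in its configuration.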
The basic principle is that classes are loaded only when needed (or at least appear to be loaded this way: the JVM has some flexibility in when it actually loads a class, but it must maintain a fixed order of class initialization). Each class that is loaded may depend on other classes, so the loading process is recursive. The classes in Listing 2 show how this recursive loading works. The Demo class contains a simple main method that creates an instance of Greeter and calls its greet method. The Greeter constructor creates an instance of Message, which is then used in the greet method call.

    public void greet() {
        s_message.print(System.out);
    }
}

public class Message {
    private String m_text;

    public Message(String text) {