In-depth understanding of Java String

I. Java Memory Model

According to the official specification, the Java virtual machine has a heap, which is the runtime data area from which memory for all class instances and arrays is allocated.

The JVM mainly manages two types of memory: heap memory and non-heap memory. Heap memory is created when the Java virtual machine starts, and non-heap memory is memory outside the JVM heap.

In short, the non-heap area contains the method area, the memory the JVM needs for internal processing or optimization (such as the code cache that holds code produced by the JIT, or just-in-time, compiler), per-class structures (such as the runtime constant pool and field and method data), and the code of methods and constructors.

The Java heap is a runtime data area from which space for objects is allocated. These objects are created by the new, newarray, anewarray and multianewarray instructions, and program code does not need to release them explicitly.

The heap is managed by the garbage collector. Its advantage is that memory can be allocated dynamically: the lifetime of an object does not have to be declared to the compiler in advance, because memory is allocated at run time and the Java garbage collector automatically reclaims data that is no longer used. Its disadvantage is that access is slower because memory is allocated dynamically at run time.

The advantage of the stack is that access is faster than the heap, second only to registers. Its disadvantage is that the size and lifetime of data on the stack must be known in advance, so it lacks flexibility. The stack mainly stores variables of the primitive types (int, short, long, byte, float, double, boolean, char) and object handles (references).

The virtual machine must maintain a constant pool for each loaded type. The constant pool is an ordered collection of the constants used by that type, including direct constants (string, integer and floating-point constants) and symbolic references to other types, fields and methods.

The value of a String constant lives in the constant pool. The constant pool in the JVM exists in memory in the form of tables. For the String type, a fixed-length CONSTANT_String_info table stores literal string values. Note: this table only stores literal string values, not symbolic references. This should make clear where string values are stored in the constant pool. During program execution, the constant pool is stored in the method area rather than the heap (this describes older JVMs; since JDK 7, HotSpot keeps the pooled String objects themselves in the heap). Many String objects are stored in the constant pool and can be shared, which improves efficiency.

II. Case analysis

Summary:

1. A String is immutable after initialization

There is a lot to say about this, but you only need to know that once a String instance is created it never changes. For example, String str = "kv" + "ill" + " " + "ans"; involves four string constants. Conceptually, "kv" and "ill" first produce "kvill" in memory, then "kvill" and " " produce "kvill ", and finally "kvill ans" is produced and its address is assigned to str. (When every operand is a literal, as here, the compiler actually folds the whole expression into the single constant "kvill ans", see point 2; the intermediate objects appear when the operands are variables.) Because String is immutable, such concatenation generates many temporary objects, which is why StringBuffer is recommended: a StringBuffer can be modified in place.
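As a minimal sketch of what immutability means in practice (the class and method names here are made up for illustration), the following shows that an operation that seems to modify a String actually returns a new object and leaves the original untouched:

```java
class ImmutabilityDemo {
    // Hypothetical helper: "modifies" a string and reports both values.
    static String originalAndResult() {
        String s = "kv";
        String t = s.concat("ill"); // returns a new String; s is not changed
        return s + "|" + t;
    }

    public static void main(String[] args) {
        System.out.println(originalAndResult()); // prints kv|kvill
    }
}
```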

Here are some common String-related questions:

For a final reference, final only applies to the "value" of the reference itself (i.e. the memory address it holds). It forces the reference to keep pointing to the object it initially pointed to, and pointing it elsewhere causes a compile-time error. final says nothing about changes to the object being pointed to.
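The point about final can be sketched as follows. This is an illustrative example, not the original listing; the class name is hypothetical. The object behind a final reference can still be changed, while reassigning the reference does not compile:

```java
class FinalReferenceDemo {
    static String appendThroughFinalReference() {
        final StringBuffer sb = new StringBuffer("kv");
        sb.append("ill");              // allowed: the object itself is mutable
        // sb = new StringBuffer();    // would not compile: sb is final
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(appendThroughFinalReference()); // prints kvill
    }
}
```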

2. String constants in the code are collected at compile time and placed in the constant area of the class file, for example "123" or "123" + "456". Expressions containing variables, such as "123" + a, are not included.
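A small sketch can make the compile-time behaviour visible. It assumes a standard javac, and the class and method names are invented for the example. Reference comparison with == is used here only to observe object identity, not as a recommended way to compare strings:

```java
class ConstantFoldingDemo {
    // "123" + "456" is a constant expression, so the compiler stores the
    // single constant "123456" in the class file.
    static boolean literalsFolded() {
        String s = "123" + "456";
        return s == "123456";
    }

    // a is an ordinary (non-final) variable, so the concatenation happens
    // at run time and yields a new object.
    static boolean variableFolded() {
        String a = "456";
        String s = "123" + a;
        return s == "123456";
    }

    public static void main(String[] args) {
        System.out.println(literalsFolded()); // prints true
        System.out.println(variableFolded()); // prints false
    }
}
```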

3. When loading a class, the JVM builds the constant pool from the strings in the class file's constant area. Each character sequence such as "123" produces one instance, which is put into the constant pool. This instance is not in the heap and is not garbage collected (this describes older JVMs; since JDK 7, HotSpot keeps pooled strings in the heap). Judging from the constructor in the source code, the value field of this instance should be an array created with new that holds 1, 2, 3, so as I understand it, the character array referenced by value is in the heap. Corrections are welcome if this is wrong.

4. Using a String literal does not necessarily create an object

When a statement containing a double-quoted string is executed, such as String a = "123", the JVM first looks in the constant pool and, if an instance exists, returns a reference to it. Otherwise it creates a new instance and puts it into the constant pool. With String a = "123" + b (assuming b is "456"), the "123" part still goes through the constant pool, but the + operator is actually converted into StringBuffer.append() calls (StringBuilder since JDK 1.5), so a ends up with a reference to a new instance whose value field holds the address of a newly allocated character array (storing "123456"), and "123456" may not exist in the constant pool at this point.

Note: when we write a declaration such as String str = "ABC";, we tend to take it for granted that a String object str is created. Beware of this trap! An object may not have been created at all; str may simply point to an object that was created earlier. Only new guarantees that a new object is created every time.
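The sharing described in point 4 can be observed with a short sketch. The three classes below are hypothetical and exist only to show that the same literal written in different classes resolves to one shared instance:

```java
class PoolA {
    static String value() { return "123"; }
}

class PoolB {
    static String value() { return "123"; }
}

class LiteralReuseDemo {
    // Both methods return the literal "123"; the second use does not
    // create a second object, so the references are identical.
    static boolean literalIsShared() {
        return PoolA.value() == PoolB.value();
    }

    public static void main(String[] args) {
        System.out.println(literalIsShared()); // prints true
    }
}
```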

5. Using new String always creates an object

When String a = new String("123") is executed, the literal first goes through the constant pool to obtain a reference to an instance. Then a new String instance is created on the heap, the String(String) constructor assigns its value field, and the reference to the new instance is assigned to a.

Although a new String instance is created, its value field equals the value of the instance in the constant pool; that is, no new character array is allocated to store "123".

In the case of String a = new String("123" + b), first refer back to point 4: "123" + b yields an instance, and then the constructor above is executed with it.
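A minimal sketch of point 5, with invented names: each new String(...) produces a distinct object even though the content is equal to the pooled literal.

```java
class NewStringDemo {
    // Returns { literal == created, created == createdAgain, literal.equals(created) }
    static boolean[] compare() {
        String literal = "123";
        String created = new String("123");
        String createdAgain = new String("123");
        return new boolean[] {
            literal == created,        // different objects
            created == createdAgain,   // new always creates another object
            literal.equals(created)    // same character content
        };
    }

    public static void main(String[] args) {
        boolean[] r = compare();
        System.out.println(r[0] + " " + r[1] + " " + r[2]); // prints false false true
    }
}
```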

6. String.intern()

When intern() is called on a String instance, the JVM checks the constant pool. If the pool contains no character sequence equal to the instance's value, such as "123" (note that the character sequence is checked, not the instance itself), the instance is put into the constant pool. If the pool already contains the character sequence "123" matching the current instance's value, a reference to the pooled instance for "123" is returned rather than a reference to the current instance, even though the current instance's value is also "123".

The constant pool stored in the class file is loaded by the JVM at run time and can be extended. String's intern() method is one way to extend it: when a String instance str calls intern(), Java checks whether the constant pool already holds a string constant with the same Unicode content. If so, it returns a reference to that constant. If not, it adds a string whose Unicode content equals str to the constant pool and returns a reference to it.
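The behaviour of intern() described above can be sketched like this. It is an illustrative example with hypothetical names; note that calling intern() without using its return value does not change the variable it was called on:

```java
class InternDemo {
    static boolean[] run() {
        String s0 = "kvill";
        String s1 = new String("kvill");
        String s2 = new String("kvill");

        boolean before = (s0 == s1);    // s1 is a separate heap object

        s1.intern();                    // return value discarded: s1 is unchanged
        s2 = s2.intern();               // s2 now refers to the pooled "kvill"

        return new boolean[] {
            before,
            s0 == s1,
            s0 == s1.intern(),
            s0 == s2
        };
    }

    public static void main(String[] args) {
        boolean[] r = run();
        // prints false false true true
        System.out.println(r[0] + " " + r[1] + " " + r[2] + " " + r[3]);
    }
}
```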

Finally, let me correct a misunderstanding. Some people say: "Using the String.intern() method, you can save a String to a global string table. If a Unicode string with the same value is already in this table, the method returns the address of the existing string in the table. If there is no string with the same value in the table, it registers its own address in the table." If the global string table is understood as the constant pool, the last sentence, "if there is no string with the same value in the table, it registers its own address in the table", is not accurate.

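Take a class in which s1 is created with new String("kvill") and s2 is assigned s1.intern(). No variable is declared directly as the constant "kvill", yet s1 is a heap object that is separate from the "kvill" held in the constant pool (the literal passed to the constructor is itself pooled). Calling s1.intern() therefore returns the pooled "kvill"; the original "kvill" object outside the constant pool still exists, so s1's own address is not registered in the constant pool. Note that on JDK 7 and later, HotSpot can record a reference to the calling instance when no equal string has been pooled yet, so the quoted statement fails only in cases like this one.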

s1 == s1.intern() is false, which shows that the original "kvill" still exists. s2 now holds the address of "kvill" in the constant pool, so s2 == s1.intern() is true.
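The case discussed above can be reproduced with a sketch along these lines. It is a reconstruction under the stated description, and the class name is hypothetical:

```java
class InternAddressDemo {
    // Returns { s1 == s1.intern(), s2 == s1.intern() }
    static boolean[] run() {
        String s1 = new String("kvill"); // heap object, distinct from the pooled "kvill"
        String s2 = s1.intern();         // reference to the pooled "kvill"
        return new boolean[] {
            s1 == s1.intern(),
            s2 == s1.intern()
        };
    }

    public static void main(String[] args) {
        boolean[] r = run();
        System.out.println(r[0]); // prints false
        System.out.println(r[1]); // prints true
    }
}
```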

What are the differences between StringBuffer and StringBuilder? What are their application scenarios?

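In the JDK implementation, both StringBuffer and StringBuilder inherit from AbstractStringBuilder. The difference is thread safety: the methods of StringBuffer are declared synchronized, while those of StringBuilder are not. StringBuffer therefore suits a buffer shared between threads, and StringBuilder suits single-threaded use.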
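As a rough illustration of the thread-safety point (the class and method names are assumptions made for this example), two threads can append to one shared StringBuffer without losing any characters, because each append is synchronized:

```java
class SharedBufferDemo {
    // Appends perThread characters from each of two threads and
    // returns the final length of the shared buffer.
    static int appendFromTwoThreads(int perThread) {
        StringBuffer shared = new StringBuffer();
        Runnable task = () -> {
            for (int i = 0; i < perThread; i++) {
                shared.append('x'); // synchronized in StringBuffer
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return shared.length();
    }

    public static void main(String[] args) {
        System.out.println(appendFromTwoThreads(1000)); // prints 2000
    }
}
```

The same experiment with an unsynchronized StringBuilder gives no such guarantee, which is why it is intended for use by a single thread.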

Now for the implementation principle of AbstractStringBuilder. We use StringBuffer simply to make string concatenation in Java more efficient: if you concatenate with + directly, the JVM creates multiple String objects, which incurs a certain overhead. AbstractStringBuilder uses a char array to hold the characters being appended. The array has an initial size. When the appended content exceeds the capacity of the current array, the array is expanded dynamically: a larger block of memory is allocated and the current array is copied to the new location. Because reallocating and copying are relatively expensive, each expansion requests more than is currently needed, here roughly twice the current capacity.
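The expansion can be observed through the public capacity() method. The sketch below uses invented names and deliberately does not assert the exact new capacity, since that is an implementation detail; on common JDKs a builder with the default capacity of 16 typically grows to 34 here (16 * 2 + 2):

```java
class CapacityGrowthDemo {
    // Returns { capacity before, capacity after, length after }
    static int[] capacities() {
        StringBuilder sb = new StringBuilder(); // documented initial capacity: 16
        int before = sb.capacity();
        sb.append("12345678901234567");         // 17 characters, one more than fits
        int after = sb.capacity();
        return new int[] { before, after, sb.length() };
    }

    public static void main(String[] args) {
        int[] c = capacities();
        System.out.println("before=" + c[0] + ", after=" + c[1] + ", length=" + c[2]);
    }
}
```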

StringBuffer started with JDK 1.0

StringBuilder started with JDK 1.5

Starting from JDK 1.5, concatenation (+) involving String variables is compiled into StringBuilder calls; before that, it was compiled into StringBuffer calls.

Let's look at the execution process through a simple program:

Use the command javap -c Buffer to view its bytecode:

Comparing Listing 1 with Listing 2, the ldc instruction in the bytecode of Listing 2 loads the "AAAAA" string from the constant pool onto the top of the stack, and astore_1 saves "AAAAA" into variable 1. Likewise, sipush further down pushes a short integer constant (-32768 to 32767) onto the top of the stack; here it is the constant 3694.

Let's go straight to offset 13. Offsets 13 to 17 create a new StringBuffer object and call its initializer. Offsets 20 to 21 first push variable 1 onto the top of the stack with aload_1 (as mentioned earlier, variable 1 holds the string constant "AAAAA") and then call StringBuffer's append method through the invokevirtual instruction to append "AAAAA". The same applies to offsets 24 to 30 that follow. Finally, at offset 33, StringBuffer's toString method is called to obtain the resulting string, which astore stores into variable 3.

Some people may say: "Since StringBuffer is already used internally to concatenate strings, we don't need to use StringBuffer ourselves; just use + directly!" Is that so? Of course not. There is a reason for the advice, so let's continue with the bytecode of the following loop.

Offsets 37 to 42 are preparation before entering the for loop; 37 and 38 set j to 1. At offset 44, if_icmpge compares j with 10. If j is greater than or equal to 10, execution jumps directly to 73, the return statement that exits the function; otherwise it enters the loop, that is, the bytecode at offsets 47 to 66. Looking at offsets 47 to 51 is enough to see why the code should use StringBuffer for concatenation: every time the + operation is executed, a new StringBuffer object has to be created to handle the concatenation, which becomes very expensive when many concatenations are involved.
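The difference can be sketched in source form. This is not the original listing; the names and loop body are assumptions chosen for illustration. Both methods build the same text, but the first creates new intermediate objects on every iteration, while the second reuses a single builder:

```java
class LoopConcatDemo {
    // Each += allocates new objects for the intermediate result.
    static String withPlus(int n) {
        String r = "";
        for (int j = 0; j < n; j++) {
            r += j;
        }
        return r;
    }

    // One builder is created up front and reused for every append.
    static String withBuilder(int n) {
        StringBuilder sb = new StringBuilder();
        for (int j = 0; j < n; j++) {
            sb.append(j);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(withPlus(10));    // prints 0123456789
        System.out.println(withBuilder(10)); // prints 0123456789
    }
}
```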
