New features of JDK 9: compact strings and character encoding
Brief introduction
What is the underlying storage of a String? Most people will say it is an array. But ask what kind of array, and the answers start to differ.
Before JDK 9, a String was backed by a char[], and each char occupies two bytes.
JDK developers reportedly examined heap dumps from thousands of applications and came to a conclusion: most strings contain only Latin-1 characters. One byte per character is enough for those strings, so storing them as two-byte chars wastes half the space.
Rumor has it that they used big data plus artificial intelligence, so we can hardly argue with their conclusion.
So starting with JDK 9, the underlying storage of a String is a byte[].
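To make the saving concrete, here is a minimal sketch that compares the size of the character data under the two layouts for a Latin-1-only string. The class and method names are invented for illustration, and the figures cover only the character payload, not object or array headers.

```java
import java.nio.charset.StandardCharsets;

public class StorageSizeDemo {
    // Payload size when every character is stored as a 2-byte char (pre-JDK 9 layout).
    static int charArrayBytes(String s) {
        return s.toCharArray().length * Character.BYTES;
    }

    // Payload size when every character is stored as a single Latin-1 byte.
    // Assumption: the caller passes Latin-1 text only; other characters would be replaced by '?'.
    static int latin1Bytes(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1).length;
    }

    public static void main(String[] args) {
        String s = "hello";
        System.out.println(charArrayBytes(s)); // 10
        System.out.println(latin1Bytes(s));    // 5
    }
}
```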
Underlying implementation
Let's take a look at the implementation of String before Java 9:
public final class String
    implements java.io.Serializable, Comparable<String>, CharSequence {
    // The value is used for character storage.
    private final char value[];
}
Now let's look at the implementation of String and some of its key fields in Java 9:
public final class String
    implements java.io.Serializable, Comparable<String>, CharSequence {
    /** The value is used for character storage. */
    @Stable
    private final byte[] value;
    // Identifies the encoding of the bytes in value: LATIN1 or UTF16.
    private final byte coder;
    @Native static final byte LATIN1 = 0;
    @Native static final byte UTF16 = 1;
    static final boolean COMPACT_STRINGS;
    static {
        COMPACT_STRINGS = true;
    }
}
From the code, we can see that the underlying storage has become a byte[].
Now look at the coder field. It identifies the encoding of the bytes in value. At present, String supports two encodings, LATIN1 and UTF16.
LATIN1 stores each character in one byte. UTF16 stores each character in 2 bytes, or in 4 bytes for supplementary characters, which are represented as a surrogate pair.
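The choice between the two coders can be sketched as follows. This is an illustrative approximation, not the actual JDK code: the class CoderDemo and its methods are hypothetical names, and the sketch only mirrors the rule that a string qualifies for LATIN1 when every char fits in 8 bits.

```java
public class CoderDemo {
    static final byte LATIN1 = 0;
    static final byte UTF16 = 1;

    // A string can use LATIN1 only if every char is in the range 0x00 to 0xFF.
    static byte chooseCoder(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return UTF16;
            }
        }
        return LATIN1;
    }

    // Bytes needed for the character data under the chosen coder.
    static int payloadBytes(String s) {
        return chooseCoder(s) == LATIN1 ? s.length() : s.length() * 2;
    }

    public static void main(String[] args) {
        String ascii = "hello";
        String chinese = "\u4F60\u597D"; // two characters, one 2-byte unit each
        String emoji = "\uD83D\uDE00";   // U+1F600, a single surrogate pair
        System.out.println(payloadBytes(ascii));   // 5
        System.out.println(payloadBytes(chinese)); // 4
        System.out.println(payloadBytes(emoji));   // 4
        System.out.println(emoji.codePointCount(0, emoji.length())); // 1
    }
}
```

Note that the coder applies to the whole string: a single character outside Latin-1 is enough for every character in that string to be stored in the two-byte form.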
COMPACT_STRINGS controls whether string compaction is enabled. It is on by default.
To turn it off, start the JVM with the -XX:-CompactStrings option.
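One way to see whether a process was started with that option is to inspect the JVM arguments at run time. The sketch below is only an assumption about how you might check this: CompactStringsFlagDemo and disabledBy are hypothetical names, and the check looks only at the arguments reported by the runtime MXBean, so it does not read the internal COMPACT_STRINGS field itself.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

public class CompactStringsFlagDemo {
    // True if the given JVM argument list explicitly disables compact strings.
    static boolean disabledBy(List<String> jvmArgs) {
        return jvmArgs.contains("-XX:-CompactStrings");
    }

    public static void main(String[] args) {
        // Arguments passed to the running JVM, for example when started with:
        //   java -XX:-CompactStrings CompactStringsFlagDemo
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("Compact strings disabled by flag: " + disabledBy(jvmArgs));
    }
}
```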