What is the maximum length of a String?

When learning and writing code, we often discuss the value ranges of primitive types such as short, int and long, but we rarely think about the "value range" of String. So is there a limit on how long a String can be?

In fact, there is. A String object cannot store a string of unlimited length. The limit has to be considered from two angles: the compile-time limit and the run-time limit.

Compile-time limit

Readers familiar with the JVM will know that the string literal "自由之路" defined below is placed in the constant pool in the method area.

String s = "自由之路";
System.out.println(s);

The length of a String literal is limited because the JVM specification limits the entries of the constant pool. Each entry in the constant pool has its own type, and the characters of a string are stored in an entry of type CONSTANT_Utf8, encoded in a modified form of UTF-8.

The CONSTANT_Utf8_info structure is defined as follows:

CONSTANT_Utf8_info {
    u1 tag;
    u2 length;
    u1 bytes[length];
}

Let's focus on the bytes array and its length field. The array is where the constant's data is actually stored, and length is the number of bytes in that array. The type of length is u2, an unsigned 16-bit integer, so the largest value it can hold is 2^16 - 1 = 65535. The bytes array can therefore be at most 65535 bytes long.
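To make the layout concrete, here is a minimal sketch that reads one such entry from a byte array. The class name Utf8InfoDemo and the method readUtf8Info are made up for this illustration, and the entry is built by hand rather than taken from a real class file.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class Utf8InfoDemo {
    // Tag value that marks a CONSTANT_Utf8 entry in the constant pool.
    static final int CONSTANT_UTF8_TAG = 1;

    /** Parses a single CONSTANT_Utf8_info entry and returns its bytes[] payload. */
    public static byte[] readUtf8Info(byte[] entry) {
        ByteBuffer buf = ByteBuffer.wrap(entry);   // big-endian by default, like class files
        int tag = buf.get() & 0xFF;                // u1 tag
        if (tag != CONSTANT_UTF8_TAG) {
            throw new IllegalArgumentException("not a CONSTANT_Utf8 entry, tag=" + tag);
        }
        int length = buf.getShort() & 0xFFFF;      // u2 length, always in 0..65535
        byte[] bytes = new byte[length];           // u1 bytes[length]
        buf.get(bytes);
        return bytes;
    }

    public static void main(String[] args) {
        // Hand-built entry: tag = 1, length = 2, bytes = "hi"
        byte[] entry = {1, 0, 2, 'h', 'i'};
        byte[] payload = readUtf8Info(entry);
        System.out.println(new String(payload, StandardCharsets.US_ASCII));
        System.out.println("largest u2 value: " + 0xFFFF);
    }
}
```

Because length is read from exactly two bytes, no entry can describe more than 65535 bytes of data, however long the original literal was.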

// 65535 d's: compile error
String s = "dd..dd";

// 65534 d's: compiles
String s1 = "dd..d";

In the example above, the string s of length 65535 fails to compile, while the string s1 of length 65534 compiles successfully. This seems inconsistent with the conclusion we just reached.

In fact, this is an additional limitation of the javac compiler. The following code can be found in the javac source code:

private void checkStringConstant(DiagnosticPosition var1, Object var2) {
    if (this.nerrs == 0 && var2 != null && var2 instanceof String && ((String) var2).length() >= 65535) {
        this.log.error(var1, "limit.string", new Object[0]);
        ++this.nerrs;
    }
}

As the code shows, when the constant is a String and its length is greater than or equal to 65535, compilation fails.

Note that there is a second part to the limit: the number of bytes the string occupies in its underlying storage. In other words, compilation fails when the length of a string literal is greater than or equal to 65535 characters, or when its encoded form occupies more than 65535 bytes. That may sound abstract, so let's make it concrete.

String constants in class files are encoded in a modified form of UTF-8. Standard UTF-8 uses 1 to 4 bytes per Unicode character; in the modified form, each char takes 1 to 3 bytes, and a supplementary character is written as two 3-byte surrogates. Therefore, some characters occupy one byte, while most of the Chinese characters we commonly use need three bytes each.
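One way to see these byte counts for yourself is DataOutputStream.writeUTF, which writes a 2-byte length followed by the string in modified UTF-8. The sketch below uses it purely as a measuring tool; the class name EncodedSizeDemo and the helper modifiedUtf8Size are invented for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class EncodedSizeDemo {
    /** Number of bytes s occupies in modified UTF-8, measured with writeUTF. */
    public static int modifiedUtf8Size(String s) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(sink)) {
            out.writeUTF(s);          // 2-byte length prefix, then the encoded bytes
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.size() - 2;       // subtract the length prefix
    }

    public static void main(String[] args) {
        String[] samples = {
            "d",                      // ASCII letter
            "\u00E9",                 // é, a Latin-1 letter
            "\u81EA",                 // 自, a common Chinese character
            "\uD83D\uDE00"            // U+1F600, a supplementary character (two chars)
        };
        for (String s : samples) {
            System.out.println(s.length() + " char(s): "
                    + modifiedUtf8Size(s) + " byte(s) in modified UTF-8, "
                    + s.getBytes(StandardCharsets.UTF_8).length + " in standard UTF-8");
        }
    }
}
```

The first three samples take 1, 2 and 3 bytes in both encodings. The last one takes 4 bytes in standard UTF-8 but 6 in the modified form, because each of its two surrogate chars is encoded separately.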

// 65534 letters: compiles
String s1 = "dd..d";

// 21845 copies of the Chinese character "自": compiles
String s2 = "自自...自";

// one English letter d plus 21845 copies of "自": compilation fails
String s3 = "d自自...自";

For s1, the letter d takes one byte in UTF-8, so 65534 letters take 65534 bytes, and the length is 65534. Neither the length nor the storage exceeds the limit, so it compiles.

For s2, each Chinese character takes 3 bytes, so 21845 of them take exactly 65535 bytes, and the string length is 21845. Neither the length nor the storage exceeds the limit, so it compiles.

For s3, one English letter d plus 21845 copies of "自" take 65536 bytes, which exceeds the storage limit, so compilation fails.
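The two conditions can be checked by hand for the three strings above. The following is only a sketch of the rules as described in this article, not javac's own code; LiteralLimitDemo, modifiedUtf8Length, fitsInLiteral and repeat are names chosen for the example.

```java
import java.util.Arrays;

public class LiteralLimitDemo {
    static final int LIMIT = 65535;

    /** Bytes needed to store s in the class file's modified UTF-8. */
    public static int modifiedUtf8Length(String s) {
        int bytes = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x0001 && c <= 0x007F) {
                bytes += 1;           // plain ASCII
            } else if (c <= 0x07FF) {
                bytes += 2;           // U+0000 and U+0080..U+07FF
            } else {
                bytes += 3;           // everything else, including surrogates
            }
        }
        return bytes;
    }

    /** True if s satisfies both the length check and the byte-size check. */
    public static boolean fitsInLiteral(String s) {
        return s.length() < LIMIT && modifiedUtf8Length(s) <= LIMIT;
    }

    /** Builds a string of n copies of c at run time. */
    public static String repeat(char c, int n) {
        char[] chars = new char[n];
        Arrays.fill(chars, c);
        return new String(chars);
    }

    public static void main(String[] args) {
        String s1 = repeat('d', 65534);
        String s2 = repeat('\u81EA', 21845);   // 自
        String s3 = "d" + s2;
        System.out.println("s1: " + modifiedUtf8Length(s1) + " bytes, fits = " + fitsInLiteral(s1));
        System.out.println("s2: " + modifiedUtf8Length(s2) + " bytes, fits = " + fitsInLiteral(s2));
        System.out.println("s3: " + modifiedUtf8Length(s3) + " bytes, fits = " + fitsInLiteral(s3));
    }
}
```

The strings are built at run time here, so the program itself compiles fine; only the check tells us which of them could have been written as a literal.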

Run-time limit

At run time, the limit shows up mainly in the constructors of String. Here is one of them:

public String(char value[], int offset, int count) {
    ...
}

The count parameter above is the length of the resulting string. It is an int, and the maximum value of an int in Java is 2^31 - 1. So at run time, the maximum length of a string is 2^31 - 1.
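A quick way to convince yourself that the 65535 limit applies only to literals is to build a longer string at run time with this very constructor. A small sketch, with the class name RuntimeLengthDemo and the helper build chosen for the example:

```java
import java.util.Arrays;

public class RuntimeLengthDemo {
    /** Builds a string of n letters using the (char[], offset, count) constructor. */
    public static String build(int n) {
        char[] value = new char[n];
        Arrays.fill(value, 'd');
        return new String(value, 0, n);   // count = n
    }

    public static void main(String[] args) {
        String s = build(100_000);        // well past the literal limit of 65534
        System.out.println(s.length());
        System.out.println("theoretical maximum: " + Integer.MAX_VALUE);
    }
}
```

The string has 100000 characters and causes no error, because it never has to fit into a constant pool entry.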

But this too is only a theoretical length. The actual limit depends on how much memory your JVM has. Let's see how much memory the largest possible string would occupy, at 16 bits per char:

(2^31-1)*16/8/1024/1024/1024 ≈ 4GB

So in the worst case, a maximum-length string takes about 4GB of memory. If your virtual machine cannot allocate that much, it throws an OutOfMemoryError.

Since JDK 9, String storage has been optimized: the underlying storage is no longer a char array but a byte array. A string that contains only Latin-1 characters uses one byte per character, which halves its memory footprint.
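The memory figures can be reproduced with a few lines of arithmetic. This sketch counts only the character data, ignoring object headers, and simply compares 2 bytes per character (char array storage) with 1 byte per character (Latin-1 byte array storage); MaxStringMemory and its methods are names chosen for the example.

```java
public class MaxStringMemory {
    /** Bytes of character data for a string of the given length. */
    public static long bytesNeeded(long chars, int bytesPerChar) {
        return chars * bytesPerChar;
    }

    /** Converts a byte count to GiB (1024^3 bytes). */
    public static double toGiB(long bytes) {
        return bytes / (1024.0 * 1024 * 1024);
    }

    public static void main(String[] args) {
        long maxLength = Integer.MAX_VALUE;             // 2^31 - 1
        long asChars = bytesNeeded(maxLength, 2);       // 16 bits per char
        long asLatin1 = bytesNeeded(maxLength, 1);      // 8 bits per Latin-1 character
        System.out.printf("char[] storage:  %d bytes (%.2f GiB)%n", asChars, toGiB(asChars));
        System.out.printf("Latin-1 storage: %d bytes (%.2f GiB)%n", asLatin1, toGiB(asLatin1));
    }
}
```

Note the use of long: the byte count for 2 bytes per character no longer fits in an int.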

Simple summary

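The length of a String is limited. At compile time, a string literal must be shorter than 65535 characters and its encoded form must not exceed 65535 bytes. At run time, the limit is 2^31 - 1 characters in theory, and in practice it depends on the memory available to the JVM.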
