What is the maximum length of a string literal in Java?

Internally, a Java String object stores its characters in an array. In theory, the maximum length of a char[] is the maximum value of int, but in practice a string literal is limited to a much smaller length.

Idea:

String literals are maintained by the String class and are determined at compile time (see the string constant pool for details). So if string literals have a maximum length (let's assume so for the moment) and a literal exceeds it, the compiler can report an error during compilation. We can therefore use an IO stream to generate a Java source file whose content declares a String and assigns it a literal, compile that file dynamically, and adjust the length of the literal according to the compilation result. Repeating this eventually yields the maximum length of a literal.

The conclusion is drawn from the following code (taken from the book "In-Depth Analysis of Java: 36 Topics Analyzing the Essence of Java"):

import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.io.OutputStream;

import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

public class LiteralLength {

    public static void main(String[] args) throws Exception {
        String fileName = "D:/Literal.java";
        StringBuilder prefix = new StringBuilder();
        prefix.append("public class Literal{ String s = \"");
        // Binary search bounds for the literal length
        int low = 0;
        int high = 1_000_000;
        int mid = (low + high) / 2;
        StringBuilder literal = new StringBuilder(high);

        int result;

        String ch = "A";
        // Needs a JDK: getSystemJavaCompiler() returns null on a plain JRE
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        // Custom error stream that discards compiler output, used in place of System.err
        OutputStream err = new OutputStream() {
            @Override
            public void write(int b) throws IOException {
            }
        };

        int max = 0;
        for (int i = 0; i < mid; i++) {
            literal.append(ch);
        }
        while (low <= high) {
            StringBuilder fileContent
                    = new StringBuilder(literal.length() + prefix.length() * 2);
            fileContent.append(prefix);
            fileContent.append(literal);
            fileContent.append("\";}");
            // Generate the Java source file
            try (BufferedWriter bw = new BufferedWriter(new FileWriter(fileName))) {
                bw.write(fileContent.toString());
            }
            result = compiler.run(null, null, err, fileName);

            // Number of code points in the literal
            int codePointCount = literal.codePointCount(0, literal.length());
            if (result == 0) { // 0 means no compilation error
                low = mid + 1;
                mid = (low + high) / 2;
                max = codePointCount;
                for (int i = codePointCount; i < mid; i++) {
                    literal.append(ch);
                }
                System.out.println("Length " + max
                        + " compiled successfully, increasing length to " + mid);
            } else {
                // Compilation error: the literal is too long
                high = mid - 1;
                mid = (low + high) / 2;
                System.err.println("Length " + codePointCount
                        + " failed to compile, reducing length to " + mid);
                // A supplementary character occupies two chars in the StringBuilder
                int start = ch.length() == 1 ? mid : mid * 2;
                literal.delete(start, literal.length());
            }
        }
        err.close();
        System.out.println("Maximum literal length: " + max);
    }
}

Output with ch = "A":

Maximum literal length: 65534

But if you modify the code to

String ch = "α";

the result is:

Maximum literal length: 32767

And with String ch = "字";

Maximum literal length: 21845

The class file uses the CONSTANT_Utf8_info table to store all kinds of constant strings, including string literals, fully qualified names of classes and interfaces, and names and descriptors of methods and variables. The structure of the CONSTANT_Utf8_info table is shown in Table 3-1: a 1-byte tag, a 2-byte length, and a bytes array holding the encoded string.

As Table 3-1 shows, the CONSTANT_Utf8_info table uses 2 bytes to represent the length of the string, so the maximum length of the bytes array is 2^16 − 1, that is, 65535 bytes. But why do the results differ between characters (A, α, 字, and a supplementary character)? The reason is that in the CONSTANT_Utf8_info table, characters from "\u0001" to "\u007f" are represented by 1 byte; the null character ("\u0000") and characters from "\u0080" to "\u07ff" are represented by 2 bytes; characters from "\u0800" to "\uffff" are represented by 3 bytes; and supplementary characters, that is, characters with code points from U+10000 to U+10FFFF, are represented by 6 bytes. A supplementary character can also be regarded as a surrogate pair. Surrogate values range from "\ud800" to "\udfff", which lies within "\u0800" to "\uffff", so each surrogate is represented by 3 bytes, 6 bytes in total.

This encoding is how the class file stores strings and should not be confused with how characters are represented in a Java program. In a Java program, characters such as "A", "á" and "字" are each represented by one char variable, that is, 2 bytes, while a supplementary character is represented by two char variables, that is, 4 bytes.
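
The byte counts described above can be checked with a few lines of Java. This is only a minimal sketch of the rules as stated in this article, not code from the book or from javac; the class name ModifiedUtf8Length and its methods are made up for illustration, and U+1F600 is just an arbitrary supplementary character.

```java
public class ModifiedUtf8Length {

    // Bytes used in the class-file encoding by a single char (UTF-16 code unit).
    static int unitLength(char c) {
        if (c >= 0x0001 && c <= 0x007F) {
            return 1; // \u0001 to \u007f
        } else if (c <= 0x07FF) {
            return 2; // \u0000 and \u0080 to \u07ff
        } else {
            return 3; // \u0800 to \uffff, which includes each surrogate
        }
    }

    // Total bytes for a whole string: the sum over its chars.
    static int encodedLength(String s) {
        int total = 0;
        for (int i = 0; i < s.length(); i++) {
            total += unitLength(s.charAt(i));
        }
        return total;
    }

    public static void main(String[] args) {
        // An arbitrary supplementary character, stored as a surrogate pair.
        String supplementary = new String(Character.toChars(0x1F600));

        System.out.println(encodedLength("A"));           // 1
        System.out.println(encodedLength("\u03b1"));      // 2 (α)
        System.out.println(encodedLength("\u5b57"));      // 3 (字)
        System.out.println(encodedLength(supplementary)); // 6
        System.out.println(encodedLength("\u0000"));      // 2
    }
}
```

The supplementary character needs no special case: it is two surrogate chars, each of which falls in the 3-byte range.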

The maximum length of a string literal is different from the maximum length of a String in memory. The latter is bounded by the maximum value of int, i.e. 2147483647. The former depends on the characters used (their Unicode values) and is at most 65534 (by manually editing the class file, the result can be made 65535).
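
The three results reported earlier follow from simple arithmetic on these limits. The sketch below assumes two caps taken from this article: 65535 bytes for the bytes array and 65534 characters for the literal itself. The class LiteralLimit and the method maxRepeats are hypothetical names used only for illustration.

```java
public class LiteralLimit {

    // Largest value of the 2-byte length field in CONSTANT_Utf8_info.
    static final int MAX_UTF8_BYTES = 65535;

    // Assumption: the character cap of 65534 reported in this article.
    static final int MAX_CHARS = 65534;

    // How many times a char needing bytesPerChar bytes can be repeated.
    static int maxRepeats(int bytesPerChar) {
        return Math.min(MAX_CHARS, MAX_UTF8_BYTES / bytesPerChar);
    }

    public static void main(String[] args) {
        System.out.println(maxRepeats(1)); // 65534, e.g. "A"
        System.out.println(maxRepeats(2)); // 32767, e.g. "α"
        System.out.println(maxRepeats(3)); // 21845, e.g. "字"
    }
}
```

With 2-byte characters, 32768 repetitions would need 65536 bytes, one more than the length field can express, which is why the result drops to 32767.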

The maximum length of a string literal is determined by the CONSTANT_Utf8_info table, and the length is fixed at compile time. If the literal exceeds the upper limit that the bytes array of the CONSTANT_Utf8_info table can represent, a compilation error is generated.
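
A similar 2-byte length limit can be observed at run time without invoking the compiler. DataOutputStream.writeUTF writes a string in modified UTF-8 preceded by a 2-byte length, and throws UTFDataFormatException when the encoded form is longer than 65535 bytes. The sketch below is only an analogy to the class-file limit, not what javac itself does; the class WriteUtfLimit and the method fits are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UTFDataFormatException;
import java.io.UncheckedIOException;
import java.util.Arrays;

public class WriteUtfLimit {

    // Returns true if a string of `length` copies of 'A' can be written by writeUTF.
    static boolean fits(int length) {
        char[] chars = new char[length];
        Arrays.fill(chars, 'A'); // 'A' takes 1 byte in modified UTF-8
        String s = new String(chars);
        try (DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream())) {
            out.writeUTF(s);
            return true;
        } catch (UTFDataFormatException e) {
            return false; // encoded form does not fit in the 2-byte length field
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(fits(65535)); // true
        System.out.println(fits(65536)); // false
    }
}
```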
