Java – Why does a String created from bytes with UTF-8 contain more bytes?

2020-08-21 • Java

// random (e.g. a java.util.Random) and log are fields of the asker's class, not shown here
byte[] bytes = new byte[16];
random.nextBytes(bytes);
try {
   return new String(bytes, "UTF-8");
} catch (UnsupportedEncodingException e) {
   log.warn("Hash generation Failed", e);
}

When I generate a string with the method above and then call string.getBytes().length, I get varying values, up to 32. Why does a 16-byte array end up producing a string whose byte array has a different size?

But string.length() returns 16.
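A minimal reproduction, offered as a sketch: the class name Reproduce and the helper measure are hypothetical, a fixed-seed java.util.Random stands in for the asker's random field, and getBytes is called with an explicit StandardCharsets.UTF_8 instead of the platform default (an assumption about the asker's setup).

```java
import java.nio.charset.StandardCharsets;
import java.util.Random;

public class Reproduce {
    // Hypothetical helper: decode the bytes as UTF-8 (as the question's code does)
    // and return {string.length(), string.getBytes(UTF_8).length}
    static int[] measure(byte[] bytes) {
        String s = new String(bytes, StandardCharsets.UTF_8);
        return new int[] { s.length(), s.getBytes(StandardCharsets.UTF_8).length };
    }

    public static void main(String[] args) {
        Random random = new Random(42); // fixed seed so runs are repeatable
        for (int i = 0; i < 5; i++) {
            byte[] bytes = new byte[16];
            random.nextBytes(bytes);
            int[] m = measure(bytes);
            System.out.println("length() = " + m[0] + ", getBytes().length = " + m[1]);
        }
    }
}
```

With random input, length() is often 16 but can be smaller when some bytes happen to form a valid multi-byte sequence, while getBytes().length varies between 16 and 48.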

Solution

This is because your bytes are first converted to unicode strings, and it attempts to create a UTF - 8 character sequence from these bytes If a byte cannot be treated as an ASCII character or captured by the next byte to form a legal Unicode character, it will be replaced with "" When calling string#getbytes(), such char is converted to 3 bytes, which adds 2 additional bytes to the result output
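To see the replacement in isolation, here is a small sketch (ReplacementDemo and roundTrip are hypothetical names for illustration) that decodes a single stray byte 0xBE and re-encodes the result with an explicit UTF-8 charset:

```java
import java.nio.charset.StandardCharsets;

public class ReplacementDemo {
    // Decode bytes as UTF-8, then encode the resulting String back to UTF-8
    static byte[] roundTrip(byte[] input) {
        String decoded = new String(input, StandardCharsets.UTF_8);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // 0xBE is a continuation byte; without a lead byte it is malformed UTF-8
        byte[] input = { (byte) 0xBE };
        String decoded = new String(input, StandardCharsets.UTF_8);
        System.out.println(decoded.length());              // 1
        System.out.println(decoded.charAt(0) == '\uFFFD'); // true

        // U+FFFD takes three bytes in UTF-8: EF BF BD
        for (byte x : roundTrip(input)) {
            System.out.printf("%02X ", x & 0xFF);
        }
        System.out.println();                              // prints EF BF BD
    }
}
```

One input byte becomes one char in the String but three bytes after re-encoding, which is exactly the 2-byte growth described above.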

If you are lucky and the bytes happen to form only ASCII characters (or other valid UTF-8 sequences), String#getBytes() returns a 16-byte array; otherwise the resulting array is longer. For example, the following snippet:

byte[] b = new byte[16];
Arrays.fill(b, (byte) 190); // 0xBE: a UTF-8 continuation byte with no lead byte
b = new String(b, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8);

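returns an array that is 48 (!) bytes long: every one of the 16 bytes is invalid on its own, so each becomes a U+FFFD that re-encodes to 3 bytes.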
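The same effect can be checked side by side. This sketch (RoundTripSizes and roundTripLength are hypothetical names) compares 16 ASCII bytes, 16 stray 0xBE bytes, and 16 bytes of valid two-byte sequences (eight "é", encoded as C3 A9), always re-encoding with an explicit UTF-8 charset:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class RoundTripSizes {
    // Decode as UTF-8, then re-encode explicitly as UTF-8 (not the platform default)
    static int roundTripLength(byte[] input) {
        return new String(input, StandardCharsets.UTF_8)
                .getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        byte[] ascii = new byte[16];
        Arrays.fill(ascii, (byte) 'a');   // 0x61, plain ASCII
        byte[] invalid = new byte[16];
        Arrays.fill(invalid, (byte) 190); // 0xBE, stray continuation byte
        // eight U+00E9 characters, each encoded as the valid pair C3 A9
        byte[] twoByte = "\u00e9\u00e9\u00e9\u00e9\u00e9\u00e9\u00e9\u00e9"
                .getBytes(StandardCharsets.UTF_8);

        System.out.println(roundTripLength(ascii));   // 16
        System.out.println(roundTripLength(invalid)); // 48 (16 x 3 bytes for U+FFFD)
        System.out.println(roundTripLength(twoByte)); // 16: valid sequences keep their size
    }
}
```

Only the malformed bytes grow; valid UTF-8, whether ASCII or multi-byte, round-trips to the same number of bytes, which is why the length for random input lands somewhere between 16 and 48.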
