Java – why does a new string with UTF-8 contain more bytes
byte bytes[] = new byte[16];
random.nextBytes(bytes);
try {
    return new String(bytes, "UTF-8");
} catch (UnsupportedEncodingException e) {
    log.warn("Hash generation Failed", e);
}
When I generate a string using the method above and then call string.getBytes().length, I get a different value; the largest I have seen is 32. Why does a 16-byte array end up producing a byte array of a different size?
However, string.length() returns 16.
Solution
This is because your bytes are first converted to unicode strings, and it attempts to create a UTF - 8 character sequence from these bytes If a byte cannot be treated as an ASCII character or captured by the next byte to form a legal Unicode character, it will be replaced with "" When calling string#getbytes(), such char is converted to 3 bytes, which adds 2 additional bytes to the result output
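To make the replacement step concrete, here is a minimal sketch that decodes a single byte that is invalid on its own and then re-encodes it. The class name ReplacementCharDemo and the helper decode are hypothetical names chosen for illustration, and StandardCharsets.UTF_8 is used instead of the "UTF-8" string only to avoid the checked exception.

```java
import java.nio.charset.StandardCharsets;

public class ReplacementCharDemo {
    // Hypothetical helper: decodes raw bytes as UTF-8.
    // Malformed input is replaced with U+FFFD rather than throwing.
    static String decode(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // 0xBE is a UTF-8 continuation byte; on its own it is not a legal sequence.
        byte[] raw = { (byte) 0xBE };
        String s = decode(raw);
        byte[] encoded = s.getBytes(StandardCharsets.UTF_8);

        System.out.println(s.length());              // 1
        System.out.println(s.charAt(0) == '\uFFFD'); // true
        System.out.println(encoded.length);          // 3 (bytes EF BF BD)
    }
}
```

One input byte becomes one char, but that char takes three bytes when encoded again, which is why the string length and the byte length disagree.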
If you are lucky to generate only ASCII characters, string#getbytes () will return a 16 byte array, otherwise the result array may be longer For example, the following code snippet:
byte[] b = new byte[16];
Arrays.fill(b, (byte) 190);
b = new String(b, "UTF-8").getBytes();
returns an array that is 48 (!) bytes long (assuming the platform default charset used by getBytes() is UTF-8).
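The three cases described above (pure ASCII, bytes that cannot be decoded, and bytes that combine into one legal character) can be compared side by side. This is only a sketch: RoundTripLengthDemo and reencodedLength are hypothetical names, and the charset is passed explicitly to getBytes so the result does not depend on the platform default.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class RoundTripLengthDemo {
    // Hypothetical helper: decode as UTF-8, re-encode as UTF-8,
    // and report how many bytes come back.
    static int reencodedLength(byte[] raw) {
        String s = new String(raw, StandardCharsets.UTF_8);
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        byte[] ascii = new byte[16];
        Arrays.fill(ascii, (byte) 'a');      // 16 valid single-byte characters

        byte[] invalid = new byte[16];
        Arrays.fill(invalid, (byte) 190);    // 16 lone continuation bytes

        // 0xC3 0xA9 is the legal two-byte UTF-8 sequence for U+00E9.
        byte[] twoByteChar = { (byte) 0xC3, (byte) 0xA9 };

        System.out.println(reencodedLength(ascii));       // 16
        System.out.println(reencodedLength(invalid));     // 48
        System.out.println(reencodedLength(twoByteChar)); // 2
    }
}
```

Only input that is already valid UTF-8 survives the round trip with its length unchanged; every undecodable byte grows from 1 byte to 3.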