UTF-8 and UTF-16 in Java

2020-03-06 • Java

I expected the two byte arrays below to differ, but they print the same values. According to Wikipedia (http://en.wikipedia.org/wiki/UTF-8#Examples), the UTF-8 and UTF-16 encodings of this character are different byte sequences, so why does Java print the same bytes for both?

    String a = "€";
    // No charset argument: the asker assumes this yields UTF-16 bytes.
    byte[] utf16 = a.getBytes();
    byte[] utf8 = null;

    try {
        utf8 = a.getBytes("UTF-8");
    } catch (UnsupportedEncodingException e) {
        throw new RuntimeException(e);
    }

    for (int i = 0; i < utf16.length; i++) {
        System.out.println("utf16 = " + utf16[i]);
    }

    for (int i = 0; i < utf8.length; i++) {
        System.out.println("utf8 = " + utf8[i]);
    }

Solution

Although Java stores characters internally as UTF-16, String.getBytes() with no argument converts each character using the platform's default encoding, which may be something like windows-1252. My result is:

utf16 = -30
utf16 = -126
utf16 = -84
utf8 = -30
utf8 = -126
utf8 = -84

This means that the default encoding on my system is UTF-8, so both calls produced the same UTF-8 bytes.
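To get bytes that actually differ, name each charset explicitly. The following is a minimal sketch; the class name EuroBytes and the helper encodeEuro are made up for illustration, and the euro sign is written as a Unicode escape so the example does not depend on how the source file is saved.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class EuroBytes {
    // Hypothetical helper: encodes the euro sign (U+20AC) with the given charset.
    static byte[] encodeEuro(Charset charset) {
        return "\u20AC".getBytes(charset);
    }

    public static void main(String[] args) {
        // The charset that getBytes() with no argument would use on this machine.
        System.out.println("default charset = " + Charset.defaultCharset());

        // UTF-8 needs three bytes: 0xE2 0x82 0xAC
        System.out.println("utf8    = " + Arrays.toString(encodeEuro(StandardCharsets.UTF_8)));
        // UTF-16BE needs two bytes: 0x20 0xAC
        System.out.println("utf16be = " + Arrays.toString(encodeEuro(StandardCharsets.UTF_16BE)));
        // Java's "UTF-16" charset writes a big-endian byte-order mark (0xFE 0xFF) first
        System.out.println("utf16   = " + Arrays.toString(encodeEuro(StandardCharsets.UTF_16)));
    }
}
```

The first line of output depends on the machine; the other three do not, which is the point of passing the charset explicitly.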

Also note that the documentation for String.getBytes() carries the following comment: the behavior of this method when this string cannot be encoded in the default charset is unspecified.
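When an unencodable character should be an error rather than unspecified or silently replaced, a CharsetEncoder can be told to report it. This is only a sketch: StrictEncode and encodeStrict are hypothetical names, and ISO-8859-1 is used as the example because it has no euro sign and is always available.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictEncode {
    // Hypothetical helper: encodes text and throws if any character
    // cannot be represented in the target charset.
    static byte[] encodeStrict(String text, Charset charset) throws CharacterCodingException {
        CharsetEncoder encoder = charset.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer buffer = encoder.encode(CharBuffer.wrap(text));
        byte[] bytes = new byte[buffer.remaining()];
        buffer.get(bytes);
        return bytes;
    }

    public static void main(String[] args) {
        try {
            // ISO-8859-1 has no euro sign, so the encoder reports an error.
            encodeStrict("\u20AC", StandardCharsets.ISO_8859_1);
        } catch (CharacterCodingException e) {
            System.out.println("cannot encode: " + e);
        }
        // getBytes(Charset) instead substitutes the charset's replacement byte.
        byte[] replaced = "\u20AC".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println("replaced with byte " + replaced[0]);
    }
}
```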

In general, though, you will avoid confusion if you always specify an encoding explicitly, as in a.getBytes("UTF-8").
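One way to do this without the checked UnsupportedEncodingException is to pass a Charset constant from java.nio.charset.StandardCharsets (available since Java 7) instead of a charset name. The class and method names below are illustrative only.

```java
import java.nio.charset.StandardCharsets;

public class ExplicitCharset {
    // Encode with a named charset; no checked exception to handle.
    static byte[] toUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Decode with the same charset so the round trip is lossless.
    static String fromUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u20AC";               // the euro sign
        byte[] encoded = toUtf8(original);
        String decoded = fromUtf8(encoded);
        System.out.println(encoded.length);        // 3
        System.out.println(original.equals(decoded)); // true
    }
}
```

Decoding has the same pitfall as encoding: new String(bytes) without a charset also falls back to the platform default, so both directions should name the charset.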

Another thing that can cause confusion is including Unicode characters directly in the source file, as in String a = "€";. The euro symbol has to be stored in the file as one or more bytes. When Java compiles your program, it reads those bytes and decodes them back into what you hope is the euro symbol. You must make sure that the software that saves the euro symbol to the file (Notepad, Eclipse, etc.) encodes it the same way Java expects. UTF-8 is becoming more and more popular, but it is not universal, and many editors will not write files in UTF-8.
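Two common ways around this are telling the compiler which encoding the file uses (javac -encoding UTF-8) or keeping the source pure ASCII with a Unicode escape. The sketch below shows the second option; the names EuroEscape, EURO and describe are invented for the example.

```java
public class EuroEscape {
    // Written as a Unicode escape, so the source file contains only ASCII
    // and the editor's encoding no longer matters.
    static final String EURO = "\u20AC";

    // Formats the first code point of s in U+XXXX notation.
    static String describe(String s) {
        return String.format("U+%04X", s.codePointAt(0));
    }

    public static void main(String[] args) {
        System.out.println(EURO.length());  // 1
        System.out.println(describe(EURO)); // U+20AC
    }
}
```

If describe printed anything other than U+20AC for a literal "€" in your own source, that would suggest the compiler decoded the file with a different encoding than the editor used to save it.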

