Implementation of Java character encoding and decoding

Character set basics:

Character set: a collection of characters, that is, symbols with particular semantic meaning. The letter "a" is a character, and so is "%". A character has no intrinsic numeric value and no direct connection to ASCII, Unicode, or even computers; symbols existed long before computers were invented.

Coded character set: a set of numeric values assigned to characters. Codes are assigned so that each character can be represented numerically within a specific coded character set, and other coded character sets may assign different values to the same character. These mappings are usually defined by standards bodies, for example US-ASCII, ISO 8859-1, Unicode (ISO 10646-1), and JIS X0201.

Character encoding scheme: a mapping from the members of a coded character set to octets (8-bit bytes). The encoding scheme defines how a sequence of character codes is expressed as a sequence of bytes. The encoded byte values need not equal the character code values, and the relationship need not be one-to-one, or even one-to-many. Roughly speaking, encoding and decoding characters is analogous to serializing and deserializing objects.
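To make the distinction between a character's code value and its encoded bytes concrete, here is a minimal sketch using only standard JDK classes. The class name `CodeVsBytes` and the `hex` helper are hypothetical, chosen for illustration. It takes U+00E9 ("é") and "serializes" it with two encoding schemes: ISO-8859-1 yields one byte equal to the code value, while UTF-8 yields two bytes, neither of which equals it.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: contrasts a character's code value with its encoded bytes.
class CodeVsBytes {
    // Hypothetical helper (not a JDK API): formats bytes as space-separated hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "\u00E9";              // U+00E9, LATIN SMALL LETTER E WITH ACUTE
        int code = s.codePointAt(0);      // its value in the Unicode coded character set

        // "Serialize" the same character with two different encoding schemes.
        byte[] latin1 = s.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);

        System.out.printf("code value: U+%04X%n", code);    // U+00E9
        System.out.println("ISO-8859-1: " + hex(latin1));   // E9 (one byte, equal to the code)
        System.out.println("UTF-8:      " + hex(utf8));     // C3 A9 (two bytes, neither is 0xE9)

        // "Deserialize": decoding with the matching scheme restores the character.
        System.out.println(new String(utf8, StandardCharsets.UTF_8).equals(s)); // true
    }
}
```

Decoding the UTF-8 bytes with ISO-8859-1 instead would produce two unrelated characters, which is why the decoder must use the same scheme as the encoder.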

Character data is usually encoded for network transmission or file storage. An encoding scheme is not a character set; it is a mapping. However, because the two are so closely related, most encoding schemes are associated with one specific character set. For example, UTF-8 is used only to encode the Unicode character set. Nevertheless, a single encoding scheme can handle several character sets: EUC, for example, can encode characters from several Asian languages. Figure 6-1 illustrates encoding a sequence of Unicode characters into a sequence of bytes with the UTF-8 encoding scheme. UTF-8 encodes character code values below 0x80 as a single byte (standard ASCII). All other Unicode characters are encoded as multibyte sequences of 2 to 6 bytes (http://www.ietf.org/rfc/rfc2279.txt); the later RFC 3629 restricts UTF-8 to at most 4 bytes per character.
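The UTF-8 length rule can be checked directly against the JDK encoder. The sketch below is an illustration, not a reference implementation; `Utf8Lengths`, `expectedLength`, and `actualLength` are hypothetical names. It compares the byte counts predicted by the RFC 3629 ranges with what `String.getBytes(StandardCharsets.UTF_8)` actually produces for one sample from each range.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: UTF-8 byte length per code point.
class Utf8Lengths {
    // Hypothetical helper: byte count predicted by the RFC 3629 ranges.
    // Valid only for Unicode scalar values (lone surrogates U+D800..U+DFFF are not encodable).
    static int expectedLength(int codePoint) {
        if (codePoint < 0x80) return 1;      // U+0000..U+007F: one byte, identical to US-ASCII
        if (codePoint < 0x800) return 2;     // U+0080..U+07FF
        if (codePoint < 0x10000) return 3;   // U+0800..U+FFFF
        return 4;                            // U+10000..U+10FFFF
    }

    // Hypothetical helper: byte count actually produced by the JDK's UTF-8 encoder.
    static int actualLength(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // 'A', U+00E9 (e acute), U+20AC (euro sign), U+1F600 (an emoji outside the BMP)
        int[] samples = {'A', 0x00E9, 0x20AC, 0x1F600};
        for (int cp : samples) {
            System.out.printf("U+%04X -> %d byte(s), expected %d%n",
                    cp, actualLength(cp), expectedLength(cp));
        }
    }
}
```

The four samples need 1, 2, 3, and 4 bytes respectively; no valid code point needs the 5- and 6-byte forms that RFC 2279 still permitted.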

Charset: the term charset is defined in RFC 2278 (http://ietf.org/rfc/rfc2278.txt). A charset is the combination of a coded character set and a character encoding scheme. The central class of the java.nio.charset package is Charset, which encapsulates the charset abstraction.

Unicode was originally designed as a 16-bit character encoding (its code space has since grown to U+10FFFF, so Java represents characters beyond U+FFFF as pairs of 16-bit char values). It aims to unify the character sets of all the world's languages into a single, comprehensive mapping. It has gained wide adoption, but many other character encodings remain in widespread use. Most operating systems are still byte-oriented in their I/O and file storage, so whatever the encoding, Unicode or otherwise, data must still be converted between byte sequences and character set encodings. The classes of the java.nio.charset package address this need. This is not the first time the Java platform has dealt with character set encoding, but it is the most systematic, comprehensive, and flexible solution. The java.nio.charset.spi package provides a Service Provider Interface (SPI) so that additional encoders and decoders can be plugged in as needed.
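As a sketch of the byte/character conversion that the java.nio.charset classes perform, the example below obtains a `Charset`, creates a `CharsetEncoder` and `CharsetDecoder` from it, and round-trips text through a `ByteBuffer`. The class and method names (`EncoderDecoderDemo`, `encode`, `decode`) are hypothetical. The error actions are set to `REPORT` explicitly, which is also the default for a freshly created encoder or decoder, so unmappable input raises an exception rather than being silently replaced.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;

// Hypothetical demo class: explicit encode/decode with CharsetEncoder and CharsetDecoder.
class EncoderDecoderDemo {
    // Hypothetical helper: characters -> bytes; throws on unmappable or malformed input.
    static ByteBuffer encode(String text, Charset cs) throws CharacterCodingException {
        CharsetEncoder enc = cs.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return enc.encode(CharBuffer.wrap(text));
    }

    // Hypothetical helper: bytes -> characters, equally strict.
    static String decode(ByteBuffer bytes, Charset cs) throws CharacterCodingException {
        CharsetDecoder dec = cs.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return dec.decode(bytes).toString();
    }

    public static void main(String[] args) throws CharacterCodingException {
        Charset utf8 = Charset.forName("UTF-8");
        ByteBuffer bb = encode("caf\u00E9", utf8);
        System.out.println("encoded " + bb.remaining() + " bytes");   // 5: three ASCII + two for U+00E9
        System.out.println("decoded: " + decode(bb, utf8));

        // US-ASCII has no mapping for U+00E9, so a strict encoder refuses it.
        Charset ascii = Charset.forName("US-ASCII");
        System.out.println("canEncode: " + ascii.newEncoder().canEncode("caf\u00E9")); // false
        try {
            encode("caf\u00E9", ascii);
        } catch (CharacterCodingException e) {
            System.out.println("US-ASCII rejected input: " + e);
        }
    }
}
```

By contrast, the convenience methods `Charset.encode` and `Charset.decode` always substitute the charset's replacement value for bad input, which is convenient but can hide data loss.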

Default charset: the default charset is determined at JVM startup and depends on the underlying operating system environment, locale, and/or JVM configuration (since JDK 18, the default is UTF-8 unless explicitly overridden; see JEP 400). If you need a specific charset, the safest approach is to name it explicitly. Do not assume that the default in your deployment environment is the same as in your development environment. Charset names are case-insensitive: when names are compared, uppercase and lowercase letters are treated as equal. The Internet Assigned Numbers Authority (IANA) maintains the registry of all officially registered charset names.
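The points about the default charset and charset names can be observed with a few `Charset` calls. In this sketch, `CharsetNames` and `sameCharset` are hypothetical names. It prints the JVM's default (which varies by environment, so no fixed output is claimed) and shows that `Charset.forName` ignores case and accepts aliases such as "UTF8", all resolving to the canonical name "UTF-8".

```java
import java.nio.charset.Charset;

// Hypothetical demo class: default charset and case-insensitive, alias-aware lookup.
class CharsetNames {
    // Hypothetical helper: true if both names resolve to the same Charset.
    static boolean sameCharset(String name1, String name2) {
        return Charset.forName(name1).equals(Charset.forName(name2));
    }

    public static void main(String[] args) {
        // Varies with OS, locale and JVM options, so do not rely on it.
        System.out.println("default: " + Charset.defaultCharset().name());

        System.out.println(sameCharset("utf-8", "UTF-8"));   // true: names ignore case
        System.out.println(sameCharset("UTF8", "UTF-8"));    // true: "UTF8" is a JDK alias

        Charset utf8 = Charset.forName("UTF8");
        System.out.println("canonical: " + utf8.name());      // UTF-8
        System.out.println("IANA registered: " + utf8.isRegistered());
        System.out.println("aliases: " + utf8.aliases());
    }
}
```

Passing a `Charset` object (for example `StandardCharsets.UTF_8`) to APIs such as `String.getBytes` or `new String(byte[], Charset)` is the explicit naming the paragraph recommends; it also avoids the checked `UnsupportedEncodingException` of the `String`-name overloads.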

Example 6-1 demonstrates how to translate characters into byte sequences using several different charset implementations.

Example 6-1. Using the standard charset encodings
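The listing for Example 6-1 is not reproduced in this copy. The following is only a sketch in the same spirit, not the original listing: it encodes one string with each of the charsets the JDK guarantees (`StandardCharsets`) and prints the resulting bytes in hex. The class name, the `encode`/`hex` helpers, and the sample string "¿Mañana?" are illustrative choices. `Charset.encode` replaces unmappable characters with the charset's replacement byte, which for US-ASCII is '?' (0x3F).

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class (not the original Example 6-1 listing).
class EncodeWithStandardCharsets {
    // Hypothetical helper: Charset.encode substitutes the replacement byte for unmappable chars.
    static byte[] encode(String input, Charset cs) {
        ByteBuffer bb = cs.encode(input);
        byte[] out = new byte[bb.remaining()];
        bb.get(out);
        return out;
    }

    // Hypothetical helper: formats bytes as space-separated hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String input = "\u00BFMa\u00F1ana?";   // inverted question mark, "Ma", n with tilde, "ana?"
        Charset[] charsets = {
            StandardCharsets.US_ASCII, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8,
            StandardCharsets.UTF_16BE, StandardCharsets.UTF_16LE, StandardCharsets.UTF_16
        };
        for (Charset cs : charsets) {
            System.out.printf("%-10s %s%n", cs.name(), hex(encode(input, cs)));
        }
    }
}
```

Comparing the lines shows each scheme's trade-off: US-ASCII loses the two non-ASCII characters, ISO-8859-1 stores them in one byte each, UTF-8 spends two bytes on each, and the UTF-16 variants use two bytes per character. When encoding, plain "UTF-16" writes big-endian order preceded by the byte-order mark FE FF.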

This article was collected from the web and is shared as a learning reference; copyright belongs to the original author.