A detailed look at how to detect a file's encoding in Java

2019-08-11 • Java

1: Simply check whether the file is UTF-8. Since a file that is not UTF-8 is generally GBK, GBK is used as the default.

When a file is saved in a given character set, encoding information may be stored in the first three bytes of the file (a byte order mark, or BOM). The basic principle is therefore to read the first three bytes of the file and inspect their values to determine the encoding. Files saved without this marker cannot be identified this way. In practice, if the project runs on a Chinese operating system and the text files are generated within the project, so that developers control the text encoding, it is enough to distinguish two common encodings: GBK and UTF-8. Since the default encoding on Chinese Windows is GBK, it is generally only necessary to test for UTF-8.

For a text file in UTF-8 format saved with a BOM, the first three bytes are -17, -69, -65 when read as signed Java bytes (0xEF, 0xBB, 0xBF). Checking for UTF-8 therefore comes down to comparing the first three bytes of the file against these values.
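The original code fragment did not survive in this copy of the article. A minimal sketch of the check described above might look like the following. The class and method names (BomSniffer, hasUtf8Bom, guessEncoding) are illustrative rather than taken from the original, and the use of InputStream.readNBytes assumes Java 11 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not the article's original code.
final class BomSniffer {

    /** Returns true if the first three bytes are the UTF-8 BOM (EF BB BF). */
    static boolean hasUtf8Bom(byte[] head) {
        return head.length >= 3
                && head[0] == (byte) 0xEF   // -17 as a signed Java byte
                && head[1] == (byte) 0xBB   // -69
                && head[2] == (byte) 0xBF;  // -65
    }

    /** Guesses "UTF-8" when a BOM is present, otherwise falls back to "GBK". */
    static String guessEncoding(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            // Java 11+; returns fewer than 3 bytes if the file is shorter.
            byte[] head = in.readNBytes(3);
            return hasUtf8Bom(head) ? "UTF-8" : "GBK";
        }
    }
}
```

Calling BomSniffer.guessEncoding(Path.of("some.txt")) returns the name of the encoding to pass to a reader. A UTF-8 file saved without a BOM will be reported as GBK by this check, which is the limitation of the simple approach.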

2: For more sophisticated encoding detection, you can use the open source project cpdetector, available at http://cpdetector.sourceforge.net/. Its class library is small, only about 500 KB. cpdetector works on statistical principles, so its result is not guaranteed to be correct.
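The cpdetector example from the original article is not preserved here, and its API is not reproduced below. As a dependency-free stand-in for the general idea of testing content rather than only the first three bytes, the following sketch uses the JDK's strict CharsetDecoder to check whether the bytes are valid UTF-8, and falls back to GBK otherwise. The class and method names (CharsetTrial, decodesCleanly, guess) are hypothetical, and this is a simple validity test, not cpdetector's statistical method.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper using only the JDK; not part of cpdetector.
final class CharsetTrial {

    /** Returns true if the bytes decode under the charset with no malformed or unmappable input. */
    static boolean decodesCleanly(byte[] data, Charset charset) {
        try {
            charset.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** Tries strict UTF-8 first, then falls back to GBK as in the simple approach. */
    static String guess(byte[] data) {
        return decodesCleanly(data, StandardCharsets.UTF_8) ? "UTF-8" : "GBK";
    }
}
```

Unlike the BOM check, this also recognizes UTF-8 files saved without a BOM. Pure ASCII content is reported as UTF-8, which is harmless because ASCII decodes identically under both encodings. Like any heuristic, it can be wrong: a short GBK byte sequence may occasionally happen to be valid UTF-8.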

The content of this article comes from the network collection of netizens. It is used as a learning reference. The copyright belongs to the original author.
