A detailed explanation of how Java determines a file's encoding
1: The simple approach: only check whether the file is UTF-8. Since files that are not UTF-8 are generally GBK, GBK is used as the default.
When a file is saved in a given character set, the encoding information may be recorded in the first three bytes of the file (a byte order mark, or BOM). The basic principle is therefore to read the first three bytes of the file and inspect their values to learn the encoding. In practice, if the project runs on a Chinese operating system and the text files are generated by the project itself, so that developers control the text encoding, it is enough to distinguish two common encodings: GBK and UTF-8. Since the default encoding of Chinese Windows is GBK, it is generally only necessary to test for UTF-8.
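For a UTF-8 text file that starts with a BOM, the first three bytes are 0xEF, 0xBB, 0xBF, which Java reads as the signed byte values -17, -69, -65. Checking these three values is therefore enough to decide whether the file is in UTF-8 format. Note that a UTF-8 file saved without a BOM will not be recognized by this check and will be treated as GBK.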
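The three-byte check described above can be sketched as follows. This is a minimal illustration, not the original listing: the class name `BomSniffer` and its method names are hypothetical, and the fallback to GBK simply follows the assumption made in this article that any file without a UTF-8 BOM is GBK.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration; the names are not from the original article.
final class BomSniffer {
    // UTF-8 BOM (0xEF 0xBB 0xBF) written as signed Java bytes.
    private static final byte[] UTF8_BOM = {-17, -69, -65};

    /** Returns true when the given bytes start with the UTF-8 BOM. */
    static boolean hasUtf8Bom(byte[] head) {
        if (head == null || head.length < UTF8_BOM.length) {
            return false;
        }
        for (int i = 0; i < UTF8_BOM.length; i++) {
            if (head[i] != UTF8_BOM[i]) {
                return false;
            }
        }
        return true;
    }

    /** Reads at most three bytes from the file and guesses "UTF-8" or "GBK". */
    static String guessCharset(Path file) throws IOException {
        byte[] head = new byte[3];
        int filled = 0;
        try (InputStream in = Files.newInputStream(file)) {
            // A single read() may return fewer bytes than requested, so loop.
            while (filled < head.length) {
                int n = in.read(head, filled, head.length - filled);
                if (n < 0) {
                    break; // file is shorter than three bytes
                }
                filled += n;
            }
        }
        // Assumption from the article: anything without a UTF-8 BOM is GBK.
        return (filled == head.length && hasUtf8Bom(head)) ? "UTF-8" : "GBK";
    }
}
```

A call such as `BomSniffer.guessCharset(Paths.get("data.txt"))` (the file name is only an example) returns the charset name, which can then be passed to an `InputStreamReader`. When decoding a file that does carry a BOM, remember that Java's UTF-8 decoder does not strip it, so the first character read will be U+FEFF unless it is skipped.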
2: For more sophisticated encoding detection, you can use the open-source project cpdetector, whose website is http://cpdetector.sourceforge.net/. Its class library is very small, only about 500 KB. cpdetector is based on statistical principles, so its result is not guaranteed to be completely correct. The library can be used to judge the encoding of a text file.
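Where adding a library is not an option, a rough content-based check can be sketched with the JDK alone. This is not cpdetector's API or algorithm. It is a simpler stand-in that strictly decodes the bytes as UTF-8 and falls back to GBK on failure, under the same two-encoding assumption as in point 1. The class name `Utf8Validator` and its methods are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of cpdetector.
final class Utf8Validator {

    /** Returns true when the bytes form a well-formed UTF-8 sequence. */
    static boolean isValidUtf8(byte[] content) {
        // REPORT makes the decoder throw instead of silently substituting U+FFFD.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(content));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** Guesses "UTF-8" when the bytes decode cleanly, otherwise "GBK". */
    static String guessCharset(byte[] content) {
        // Assumption from the article: the file is either UTF-8 or GBK.
        return isValidUtf8(content) ? "UTF-8" : "GBK";
    }
}
```

Like any content-based guess, this can be wrong: pure ASCII text is valid in both encodings and is reported as UTF-8 here, and a short GBK byte sequence can occasionally happen to be well-formed UTF-8. It does, however, handle UTF-8 files that have no BOM, which the three-byte check cannot.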