Get the file character set with jchardet

Some time ago, while learning Lucene, I ran into encoding errors when reading TXT documents. I learned several solutions. Most of them look at the file in hexadecimal (you can use Ctrl + H in UE to view it) and read the first four marker bytes to determine the encoding. However, some text files cannot be recognized this way (I ran into this with some UTF-8 encoded files). Later I found jchardet. jchardet is a Java implementation of the encoding detection algorithm from Mozilla (the Firefox people). I won't say more here; this is the official website, so see for yourself.
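The "read the first bytes" approach mentioned above amounts to looking for a byte order mark (BOM). A minimal sketch of it follows; the class and method names are made up for illustration. It also shows why the approach misses some UTF-8 files: UTF-8 does not require a BOM, so a file saved without one has nothing to recognize.

```java
final class BomSniffer {
    /**
     * Returns the charset name implied by a leading BOM,
     * or null when the data does not start with a known BOM.
     */
    static String charsetFromBom(byte[] head) {
        // Copy up to four leading bytes as unsigned values; -1 marks "not present".
        int[] b = new int[4];
        for (int i = 0; i < 4; i++) {
            b[i] = i < head.length ? head[i] & 0xFF : -1;
        }
        // UTF-32 must be tested before UTF-16, because the UTF-32LE mark
        // (FF FE 00 00) begins with the UTF-16LE mark (FF FE).
        if (b[0] == 0x00 && b[1] == 0x00 && b[2] == 0xFE && b[3] == 0xFF) return "UTF-32BE";
        if (b[0] == 0xFF && b[1] == 0xFE && b[2] == 0x00 && b[3] == 0x00) return "UTF-32LE";
        if (b[0] == 0xEF && b[1] == 0xBB && b[2] == 0xBF) return "UTF-8";
        if (b[0] == 0xFE && b[1] == 0xFF) return "UTF-16BE";
        if (b[0] == 0xFF && b[1] == 0xFE) return "UTF-16LE";
        // No BOM: the file may still be UTF-8, GBK, or anything else.
        return null;
    }
}
```

A null result only means "no BOM found", not "not Unicode", which is exactly the gap a statistical detector such as jchardet tries to fill.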

Here is the code:
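What follows is not the original listing but a minimal sketch of how jchardet is commonly driven, modeled on the sample that ships with the library. It assumes the jchardet jar is on the classpath; the class name `JchardetSketch` and the `guess` helper are made up for illustration, and the fallback order at the end is my own choice.

```java
import org.mozilla.intl.chardet.nsDetector;
import org.mozilla.intl.chardet.nsICharsetDetectionObserver;
import org.mozilla.intl.chardet.nsPSMDetector;

final class JchardetSketch {
    /** Returns jchardet's best guess for the given bytes (hypothetical helper). */
    static String guess(byte[] data) {
        // ALL means no language hint; the other nsPSMDetector constants narrow the search.
        nsDetector detector = new nsDetector(nsPSMDetector.ALL);

        // jchardet reports a confident result through this callback.
        final String[] reported = new String[1];
        detector.Init(new nsICharsetDetectionObserver() {
            @Override
            public void Notify(String charset) {
                reported[0] = charset;
            }
        });

        // Pure 7-bit input needs no statistical detection.
        boolean isAscii = detector.isAscii(data, data.length);
        if (!isAscii) {
            detector.DoIt(data, data.length, false);
        }
        // Signal end of input so the detector can settle on a final answer.
        detector.DataEnd();

        if (isAscii) {
            return "ASCII";
        }
        if (reported[0] != null) {
            return reported[0];
        }
        // No confident answer: fall back to the first remaining candidate.
        String[] probable = detector.getProbableCharsets();
        return probable.length > 0 ? probable[0] : null;
    }
}
```

To use it on a file, read the bytes first, for example `JchardetSketch.guess(Files.readAllBytes(Paths.get("some.txt")))`, where `some.txt` is a placeholder path. For large files you would feed the detector in chunks instead of loading everything at once.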

There is one problem with this approach: when the file is Unicode encoded, the detected charset comes back as windows-1252, and using windows-1252 as the encoding then produces an error.
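One possible workaround, sketched under an assumption: I take "Unicode encoded" here to mean UTF-16 with a byte order mark, which is what Windows Notepad writes when you pick "Unicode". If that holds, checking the first two bytes before trusting the detector avoids the wrong guess. The `CharsetResolver` class, the `resolve` method, and the UTF-8 default are all hypothetical choices, not part of jchardet.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CharsetResolver {
    /**
     * Prefers a UTF-16 BOM over the detector's guess.
     * head: the first bytes of the file; detectedName: whatever the detector returned.
     */
    static Charset resolve(byte[] head, String detectedName) {
        if (head.length >= 2) {
            int b0 = head[0] & 0xFF;
            int b1 = head[1] & 0xFF;
            // A BOM is an explicit signal, so it wins over a statistical guess.
            // (FF FE 00 00 could also be UTF-32LE; this sketch ignores that case.)
            if (b0 == 0xFF && b1 == 0xFE) return StandardCharsets.UTF_16LE;
            if (b0 == 0xFE && b1 == 0xFF) return StandardCharsets.UTF_16BE;
        }
        try {
            // Only accept names this JVM actually knows.
            if (detectedName != null && Charset.isSupported(detectedName)) {
                return Charset.forName(detectedName);
            }
        } catch (IllegalArgumentException e) {
            // The detector returned a syntactically illegal charset name; fall through.
        }
        // Assumed default when nothing else is usable.
        return StandardCharsets.UTF_8;
    }
}
```

Note that Java's UTF-16LE and UTF-16BE decoders keep the BOM as a leading U+FEFF character, so either skip the first two bytes or strip that character after decoding.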

By the way, here is another address for downloading the jar package, since the official website is sometimes unreachable.

Download address: http://download.csdn.net/detail/tianxiexingyun/8286849
