Get the file character set with jchardet

2021-09-11 • Java

Some time ago, while learning Lucene, I ran into encoding errors when reading TXT documents. I looked into several solutions. Most of them view the file as hexadecimal (you can use Ctrl + H in UE to do this) and read the first few marker bytes to work out the encoding. However, some text files cannot be recognized this way (I ran into this with some UTF-8 encoded files). Later I found jchardet. jchardet is a Java implementation of the character set detection algorithm from Mozilla (that is, Firefox). I won't go into detail here; see the official website for more.
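The marker-byte approach mentioned above can be sketched with the standard library alone. This is only a minimal illustration, not the code from the solutions I originally looked at; the class name `BomSniffer` and the method `detectByBom` are made up for this example. It returns an empty result when the file has no marker bytes, which is likely why UTF-8 files saved without them go unrecognized.

```java
import java.util.Optional;

// Hypothetical helper: guesses an encoding from the leading marker bytes (BOM) only.
final class BomSniffer {

    /** Returns the charset name implied by a leading BOM, or empty if there is none. */
    static Optional<String> detectByBom(byte[] head) {
        // The 4-byte UTF-32 marks must be checked before UTF-16,
        // because FF FE 00 00 also starts with the UTF-16LE mark FF FE.
        if (startsWith(head, 0x00, 0x00, 0xFE, 0xFF)) return Optional.of("UTF-32BE");
        if (startsWith(head, 0xFF, 0xFE, 0x00, 0x00)) return Optional.of("UTF-32LE");
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) return Optional.of("UTF-8");
        if (startsWith(head, 0xFE, 0xFF)) return Optional.of("UTF-16BE");
        if (startsWith(head, 0xFF, 0xFE)) return Optional.of("UTF-16LE");
        return Optional.empty(); // no BOM: the encoding cannot be told from the header
    }

    // Compares the first bytes of data with the given unsigned byte values.
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }
}
```

Passing the first four bytes of a file to `detectByBom` is enough. When it returns empty, a content-based detector such as jchardet is needed.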

Here is the code:

There is one problem with this approach: when a Unicode-encoded file is detected, it returns windows-1252. When I then use windows-1252 as the encoding, I get an error.
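One possible way to guard against a wrong guess like this is to check whether the bytes actually decode under the reported charset before using it, and to fall back to other candidates if they do not. The sketch below is an assumption of mine and not part of jchardet; `CharsetCheck`, `decodesCleanly` and `firstUsable` are hypothetical names, and only standard `java.nio.charset` calls are used.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.util.List;
import java.util.Optional;

// Hypothetical helper: verifies a detected charset by decoding strictly.
final class CharsetCheck {

    /** True if the charset is supported and decodes the bytes without any error. */
    static boolean decodesCleanly(byte[] data, String charsetName) {
        try {
            if (!Charset.isSupported(charsetName)) return false;
            Charset.forName(charsetName).newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)       // fail instead of replacing
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return true;
        } catch (IllegalArgumentException | CharacterCodingException e) {
            // Illegal charset name, or bytes that are not valid in this charset.
            return false;
        }
    }

    /** Returns the first candidate that decodes the bytes cleanly, in the given order. */
    static Optional<String> firstUsable(byte[] data, List<String> candidates) {
        for (String name : candidates) {
            if (decodesCleanly(data, name)) return Optional.of(name);
        }
        return Optional.empty();
    }
}
```

Single-byte charsets such as windows-1252 accept almost any byte sequence, so stricter candidates like UTF-8 should come first in the list. A clean decode only shows that the bytes are valid in that charset, not that the guess is right.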

By the way, here is another address for downloading the jar package. Sometimes the official website acts up and cannot be reached.

Download address: http://download.csdn.net/detail/tianxiexingyun/8286849

The content of this article was collected from the web and is provided as a learning reference. Copyright belongs to the original author.
