Java: how to remove bad characters that are not valid for MySQL's utf8 encoding?

I have dirty data that sometimes contains characters like this. I use this data in a query like:

WHERE a.address IN ('mydatahere')

For this character, I get:

How can I filter out such characters? I am using Java.

Thank you.

Solution

When I run into this problem, I use a Perl script like the following to convert the data to valid UTF-8:

use Encode;
binmode(STDOUT,":utf8");
while (<>) {
    print Encode::decode('UTF-8',$_);
}

The script reads (possibly corrupted) UTF-8 on stdin and prints valid UTF-8 to stdout. Invalid characters are replaced with U+FFFD (�), the Unicode replacement character.

If you run this script on good UTF-8 input, the output should be identical to the input.
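To check that property from Java, here is a minimal sketch that applies the same replacement policy (malformed bytes become U+FFFD) and compares the re-encoded result with the original bytes. The class name `Utf8RoundTrip` and the helper `toValidUtf8` are hypothetical; the decoder calls are the standard `java.nio.charset` API.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8RoundTrip {
    // Hypothetical helper: decodes bytes as UTF-8, replacing each malformed
    // sequence with U+FFFD (the same policy as the Perl script above).
    static String toValidUtf8(byte[] input) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        try {
            return decoder.decode(ByteBuffer.wrap(input)).toString();
        } catch (CharacterCodingException e) {
            // Not expected with REPLACE actions, but decode() declares it.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Accented Latin and Cyrillic sample text, written with Unicode escapes
        // so the source file encoding does not matter.
        byte[] good = "caf\u00e9 \u041f\u0440\u0438\u0432\u0435\u0442".getBytes(StandardCharsets.UTF_8);
        byte[] roundTrip = toValidUtf8(good).getBytes(StandardCharsets.UTF_8);
        // Valid input should come back byte-for-byte identical.
        System.out.println("unchanged: " + Arrays.equals(good, roundTrip)); // prints "unchanged: true"
    }
}
```

For a one-shot conversion, `new String(bytes, StandardCharsets.UTF_8)` behaves the same way, since that constructor always substitutes the charset's default replacement for malformed input. An explicit `CharsetDecoder` is mainly useful if you switch to `CodingErrorAction.REPORT` to detect bad data instead of silently repairing it.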

If your data is already in a database, it makes sense to use DBI to scan your tables and scrub all the data with this approach, so that everything is valid UTF-8.

Here is a one-line Perl version of the same script:

perl -MEncode -e "binmode STDOUT,':utf8';while(<>){print Encode::decode 'UTF-8',\$_}" < bad.txt > good.txt
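If Perl is not available, a rough Java counterpart of the same stdin-to-stdout filter could look like the following. It is a sketch rather than the author's code: `Utf8Filter` and `filter` are hypothetical names, and it assumes the whole input is meant to be UTF-8.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8Filter {
    // Copies `in` to `out`, decoding as UTF-8 with malformed bytes replaced
    // by U+FFFD, then re-encoding the result as (now valid) UTF-8.
    static void filter(InputStream in, OutputStream out) throws IOException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        Reader reader = new InputStreamReader(in, decoder);
        Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        char[] buf = new char[8192];
        int n;
        while ((n = reader.read(buf)) != -1) {
            writer.write(buf, 0, n);
        }
        writer.flush(); // flush but do not close, so System.out stays usable
    }

    public static void main(String[] args) throws IOException {
        filter(System.in, System.out);
    }
}
```

Once compiled, it would be used like the one-liner: `java Utf8Filter < bad.txt > good.txt`. Reading through an `InputStreamReader` lets the decoder handle multi-byte sequences that straddle buffer boundaries.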

Edit: added a Java-only solution

Here is an example of how to do this in Java:

import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

public class UtfFix {
    public static void main(String[] args) throws CharacterCodingException {
        // Replace malformed or unmappable input with U+FFFD instead of throwing.
        CharsetDecoder decoder = Charset.forName("UTF-8").newDecoder();
        decoder.onMalformedInput(CodingErrorAction.REPLACE);
        decoder.onUnmappableCharacter(CodingErrorAction.REPLACE);
        ByteBuffer bb = ByteBuffer.wrap(new byte[] {
            (byte) 0xD0,(byte) 0x9F,// 'П'
            (byte) 0xD1,(byte) 0x80,// 'р'
            (byte) 0xD0,// corrupted UTF-8,was 'и'
            (byte) 0xD0,(byte) 0xB2,// 'в'
            (byte) 0xD0,(byte) 0xB5,// 'е'
            (byte) 0xD1,(byte) 0x82  // 'т'
        });
        CharBuffer parsed = decoder.decode(bb);
        System.out.println(parsed);
        // prints Пр�вет (U+FFFD in place of the broken 'и');
        // a console that is not set to UTF-8 may show '?' instead
    }
}
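Replacing malformed bytes yields valid UTF-8, but that may not be enough on its own if the column uses MySQL's `utf8` character set, an alias for `utf8mb3`, which stores at most three bytes per character and therefore cannot hold supplementary characters above U+FFFF, such as most emoji. Assuming that is the cause here (the question does not show the offending characters), a sketch that replaces such code points before building the query might look like this. `Utf8mb3Safe` and `replaceNonBmp` are hypothetical names.

```java
public class Utf8mb3Safe {
    // Hypothetical helper: replaces every code point that MySQL's 3-byte
    // utf8 (utf8mb3) cannot store, i.e. supplementary characters above
    // U+FFFF, with the given replacement. Unpaired surrogates are replaced
    // too, because they are not valid Unicode text and cannot be encoded as UTF-8.
    static String replaceNonBmp(String s, String replacement) {
        StringBuilder sb = new StringBuilder(s.length());
        s.codePoints().forEach(cp -> {
            boolean loneSurrogate = cp >= Character.MIN_SURROGATE && cp <= Character.MAX_SURROGATE;
            if (Character.isSupplementaryCodePoint(cp) || loneSurrogate) {
                sb.append(replacement);
            } else {
                sb.appendCodePoint(cp);
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        String input = "ok \uD83D\uDE00 done"; // contains U+1F600, a 4-byte emoji
        System.out.println(replaceNonBmp(input, ""));
    }
}
```

Whether to drop such characters or substitute something visible is an application choice; if you control the schema, converting the column and the connection to `utf8mb4` avoids the need to filter at all.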