Java UTF-8 to ASCII conversion with replacements

We receive input containing various national characters as UTF-8 strings, and we need to convert it into ASCII strings for some legacy use (we don't accept Chinese or Japanese characters, only European languages).

We have a small utility that gets rid of all diacritics:

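import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

// Strips diacritics character by character: NFD splits a precomposed letter
// into base letter + combining marks (e.g. "é" -> "e" + U+0301), and the first
// UTF-8 byte of the result is the ASCII base letter.
// Letters with no decomposition (ß, Đ, đ, ...) keep a non-ASCII lead byte,
// so they come out as garbage; that is the problem described below.
public static final String toBaseCharacters(final String sText) {
    if (sText == null || sText.isEmpty())
        return sText;

    final char[] chars = sText.toCharArray();
    final int iSize = chars.length;
    final StringBuilder sb = new StringBuilder(iSize);

    for (int i = 0; i < iSize; i++) {
        // NFD, not NFC: NFC would keep "é" composed and never strip the accent
        final String sLetter = Normalizer.normalize(String.valueOf(chars[i]), Normalizer.Form.NFD);

        // StandardCharsets.UTF_8 avoids the checked UnsupportedEncodingException
        final byte[] bLetter = sLetter.getBytes(StandardCharsets.UTF_8);
        sb.append((char) bLetter[0]);
    }
    return sb.toString();
}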
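The same diacritic stripping can also be done on the whole string at once. This is a minimal sketch, assuming Java 8+ and that removing every Unicode combining mark (`\p{M}`) after NFD decomposition is acceptable; the class name `StripDiacritics` is hypothetical. It also shows why ß, Đ and đ are the problem: they have no canonical decomposition, so NFD leaves them untouched.

```java
import java.text.Normalizer;

public class StripDiacritics {

    // Decompose precomposed letters (e.g. S-caron -> S + combining caron),
    // then delete every combining mark. Letters without a decomposition
    // (sharp s, D with stroke, L with stroke, ...) pass through unchanged.
    public static String stripDiacritics(String text) {
        if (text == null || text.isEmpty()) {
            return text;
        }
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // "Šćepan Čolić" -> prints "Scepan Colic"
        System.out.println(stripDiacritics("\u0160\u0107epan \u010Coli\u0107"));
        // "Blaß Đonardan" -> printed unchanged: NFD cannot decompose these letters
        System.out.println(stripDiacritics("Bla\u00DF \u0110onardan"));
    }
}
```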

The question is how to replace the German sharp s (ß), Đ, đ and other characters that get through the above normalization method with their ASCII replacements (in the case of ß the replacement would probably be "ss", and in the case of Đ it would be either "D" or "Dj").

Is there a simple way to do this without a million replaceAll() calls?

So for example: Đonardan = Djonardan, Blaß = Blass, and so on.

We could replace all "problematic" characters with spaces, but we would like to avoid that and keep the output as close to the input as possible.

Thank you for your answer,

Bozo

Solution

If you only need to support European (Latin-script) languages, around 100 replacements should be enough, which is absolutely doable: grab the Unicode charts for Latin-1 Supplement and Latin Extended-A and start the String.replace() party.
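The lookup-table idea can be applied in a single pass over the input instead of one replace() call per character. This is a minimal sketch, assuming Java 9+ (`Map.ofEntries`); the class name `AsciiFolder`, the handful of table entries (a real table would be filled in from the Latin-1 Supplement and Latin Extended-A charts), and the use of `?` for anything still non-ASCII are all hypothetical choices. Characters that are not in the table fall back to NFD diacritic stripping, so the table only needs letters that do not decompose.

```java
import java.text.Normalizer;
import java.util.Map;

public class AsciiFolder {

    // Hypothetical, deliberately incomplete table: only letters that NFD
    // cannot decompose need an entry; accented letters use the fallback.
    private static final Map<Character, String> REPLACEMENTS = Map.ofEntries(
            Map.entry('\u00DF', "ss"),  // sharp s
            Map.entry('\u0110', "Dj"),  // D with stroke ("D" would also work)
            Map.entry('\u0111', "dj"),  // d with stroke
            Map.entry('\u0141', "L"),   // L with stroke
            Map.entry('\u0142', "l"),   // l with stroke
            Map.entry('\u00D8', "O"),   // O with stroke
            Map.entry('\u00F8', "o"),   // o with stroke
            Map.entry('\u00C6', "AE"),  // AE ligature
            Map.entry('\u00E6', "ae"),
            Map.entry('\u0152', "OE"),  // OE ligature
            Map.entry('\u0153', "oe"),
            Map.entry('\u00DE', "Th"),  // thorn
            Map.entry('\u00FE', "th"));

    public static String toAscii(String text) {
        if (text == null || text.isEmpty()) {
            return text;
        }
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x80) {                 // already ASCII: copy as-is
                sb.append(c);
                continue;
            }
            String replacement = REPLACEMENTS.get(c);
            if (replacement != null) {      // explicit table entry wins
                sb.append(replacement);
                continue;
            }
            // Fallback: decompose (NFD) and keep only the ASCII part, which
            // drops combining accents (e.g. e-acute -> e, n-tilde -> n).
            String decomposed = Normalizer.normalize(String.valueOf(c), Normalizer.Form.NFD);
            boolean appended = false;
            for (int j = 0; j < decomposed.length(); j++) {
                char d = decomposed.charAt(j);
                if (d < 0x80) {
                    sb.append(d);
                    appended = true;
                }
            }
            if (!appended) {
                sb.append('?');             // placeholder for anything unmapped
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toAscii("\u0110onardan"));  // Djonardan
        System.out.println(toAscii("Bla\u00DF"));      // Blass
        System.out.println(toAscii("\u0160\u0107epan")); // Scepan
    }
}
```

Keeping the replacements in a Map makes the ~100 entries data rather than code, and the input is scanned once regardless of how large the table grows.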
