Java: UTF-8 to ASCII conversion with substitutions
We accept input containing all sorts of national characters as UTF-8 strings, and we need to convert it to ASCII strings on output for some legacy use. (We don't accept Chinese or Japanese characters, only European languages.)
We have a small utility to strip all diacritical marks:
    import java.io.UnsupportedEncodingException;
    import java.text.Normalizer;

    public static final String toBaseCharacters(final String sText) {
        if (sText == null || sText.length() == 0)
            return sText;

        final char[] chars = sText.toCharArray();
        final int iSize = chars.length;
        final StringBuilder sb = new StringBuilder(iSize);
        for (int i = 0; i < iSize; i++) {
            String sLetter = new String(new char[] { chars[i] });
            // Decompose the character so that its base letter comes first.
            sLetter = Normalizer.normalize(sLetter, Normalizer.Form.NFD);
            try {
                // Keep only the first UTF-8 byte, i.e. the base letter.
                byte[] bLetter = sLetter.getBytes("UTF-8");
                sb.append((char) bLetter[0]);
            } catch (UnsupportedEncodingException e) {
                // UTF-8 is always supported, so this cannot happen.
            }
        }
        return sb.toString();
    }
The question is how to replace the German sharp s (ß), Đ, đ and other characters that get through the normalization method above with their substitutes (for ß the substitute would probably be "ss"; for Đ it would be either "D" or "Dj").
Is there a simple way to do this without a million replaceAll() calls?
So for example: Đonardan = Djonardan, Blaß = Blass, and so on.
We could replace all "problematic" characters with spaces, but we would like to avoid that and keep the output as similar to the input as possible.
Thank you for your answer,
Bozo
Solution
If you only support European, Latin-based languages, around 100 replacements should be enough, which is definitely doable: grab the Unicode charts for Latin-1 Supplement and Latin Extended-A and get the String.replace party started.
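One way this could look in practice is a single lookup table instead of a chain of replace calls. The following is only a minimal sketch under some assumptions: the class name AsciiFolder, the method toAscii, the handful of table entries, and the "?" placeholder for unmapped characters are all made up for illustration and are not from the question. A real table would be filled in from the two Unicode charts, and the choice between "D" and "Dj" for Đ is yours. Accented letters that do decompose are handled by NFD normalization followed by dropping the combining marks.

```java
import java.text.Normalizer;
import java.util.HashMap;
import java.util.Map;

final class AsciiFolder {

    // Hypothetical substitution table for letters that have no Unicode
    // decomposition. Extend it from the Latin-1 Supplement and
    // Latin Extended-A charts as needed.
    private static final Map<Character, String> SUBSTITUTES = new HashMap<>();
    static {
        SUBSTITUTES.put('\u00DF', "ss"); // sharp s
        SUBSTITUTES.put('\u0110', "Dj"); // capital D with stroke
        SUBSTITUTES.put('\u0111', "dj"); // small d with stroke
        SUBSTITUTES.put('\u00C6', "AE"); // capital ligature AE
        SUBSTITUTES.put('\u00E6', "ae"); // small ligature ae
        SUBSTITUTES.put('\u00D8', "O");  // capital O with stroke
        SUBSTITUTES.put('\u00F8', "o");  // small o with stroke
        SUBSTITUTES.put('\u0141', "L");  // capital L with stroke
        SUBSTITUTES.put('\u0142', "l");  // small l with stroke
    }

    private AsciiFolder() {
    }

    static String toAscii(String text) {
        if (text == null || text.isEmpty()) {
            return text;
        }
        // NFD splits an accented letter into base letter + combining marks.
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        StringBuilder sb = new StringBuilder(decomposed.length());
        for (int i = 0; i < decomposed.length(); i++) {
            char c = decomposed.charAt(i);
            if (c < 128) {
                sb.append(c); // already ASCII
                continue;
            }
            if (Character.getType(c) == Character.NON_SPACING_MARK) {
                continue; // drop the combining accent
            }
            // Assumption: unmapped characters become "?" rather than vanish.
            String substitute = SUBSTITUTES.get(c);
            sb.append(substitute != null ? substitute : "?");
        }
        return sb.toString();
    }
}
```

With this table, AsciiFolder.toAscii("Blaß") gives "Blass" and AsciiFolder.toAscii("Đonardan") gives "Djonardan". Doing one pass with a map lookup also avoids rescanning the whole string once per replacement, which is what a long chain of replace calls would do.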