Android: comparing a UTF-8 string with the UTF-8 input string from an EditText
In my Android application, I want to compare a UTF-8 string, such as "bãi", with the string the user typed into an EditText. However, if I type "bãi" into the EditText and read the input with EditText.getText().toString(), the string it returns is not equal to "bãi".
I also tried:
String input = new String(input.getBytes("UTF-8"), "UTF-8");
But it doesn't work: input.equals("bãi") still returns false.
Does anyone know how to solve this problem? Thank you for your help.
Solution:
In Unicode, some characters can be represented in more than one way. For example, in the word "bãi", the middle character can be represented in two ways:
> a single code point, U+00E3 (LATIN SMALL LETTER A WITH TILDE)
> two code points, U+0061 (LATIN SMALL LETTER A) followed by U+0303 (COMBINING TILDE)
For display, the two should look the same.
For string comparison, however, this causes problems, because equals() compares the underlying code units rather than what is displayed. The solution is to first normalize both strings as described in Unicode Standard Annex #15, Unicode Normalization Forms.
Java (including Android) supports normalization through the java.text.Normalizer class.
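To see the difference that the display hides, it can help to print the code points of each string. The following is only a small sketch; the class name CodePointDump and the helper dump are made up for this illustration and are not part of any library.

```java
public class CodePointDump {
    // Hypothetical helper: lists the code points of s as "U+XXXX" tokens.
    static String dump(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
            // Advance by one or two chars, depending on the code point.
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Precomposed form: three code points.
        System.out.println(dump("b\u00e3i"));
        // Decomposed form: four code points.
        System.out.println(dump("ba\u0303i"));
    }
}
```

Logging the EditText input this way should show which of the two forms the keyboard actually produced.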
The following code shows the results:
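import java.text.Normalizer;
import java.text.Normalizer.Form;

// s1 uses the precomposed code point, s2 uses "a" plus a combining tilde
String s1 = "b\u00e3i";
String s2 = "ba\u0303i";
System.out.println(String.format("Before normalization: %s == %s => %b", s1, s2, s1.equals(s2)));
// Bring both strings to the same normalization form (NFD) before comparing
String n1 = Normalizer.normalize(s1, Form.NFD);
String n2 = Normalizer.normalize(s2, Form.NFD);
System.out.println(String.format("After normalization: %s == %s => %b", n1, n2, n1.equals(n2)));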
It outputs:
Before normalization: bãi == bãi => false
After normalization: bãi == bãi => true
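BTW: Form.NFD decomposes the string, that is, it produces the longer representation with two code points for "ã". Form.NFC would instead produce the shorter, precomposed form. Either works for comparison, as long as both strings are normalized to the same form.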
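For the comparison in the question, the normalization step could be wrapped in a small helper. This is a minimal sketch, not the only way to do it: the class NormalizedCompare and the method sameText are hypothetical names, and NFC is chosen here only because it gives the shorter strings.

```java
import java.text.Normalizer;

public class NormalizedCompare {
    // Hypothetical helper: true if a and b are equal after NFC normalization.
    static boolean sameText(String a, String b) {
        String na = Normalizer.normalize(a, Normalizer.Form.NFC);
        String nb = Normalizer.normalize(b, Normalizer.Form.NFC);
        return na.equals(nb);
    }

    public static void main(String[] args) {
        String expected = "b\u00e3i";   // precomposed form
        String typed = "ba\u0303i";     // decomposed form, as a keyboard might send it
        System.out.println(typed.equals(expected));
        System.out.println(sameText(typed, expected));
    }
}
```

In the Android code, the call would then look something like sameText(editText.getText().toString(), "bãi"), where editText stands for whatever EditText field the app uses.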