Android: comparing a UTF-8 string with UTF-8 input from an EditText

In my Android application, I want to compare a UTF-8 string, such as "bãi", with the string the user typed into an EditText. However, if I type "bãi" into the EditText and read the input with EditText.getText().toString(), the string it returns is not equal to "bãi".

I also tried this:

input = new String(input.getBytes("UTF-8"), "UTF-8");

But it doesn't work: input.equals("bãi") still returns false.

Does anyone know how to solve this problem? Thank you for your help.

Solution:

In Unicode, some characters can be represented in more than one way. For example, in the word "bãi", the middle character can be represented in two ways:

- a single code point: U+00E3 (LATIN SMALL LETTER A WITH TILDE)
- two code points: U+0061 (LATIN SMALL LETTER A) followed by U+0303 (COMBINING TILDE)

When displayed, the two should look the same.
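
To make the difference visible, the code points of each form can be printed. This is only a minimal sketch in plain Java; the class name CodePointDump and the method describe are made up for illustration. It also shows why re-encoding through UTF-8, as tried in the question, changes nothing.

```java
import java.nio.charset.StandardCharsets;
import java.util.stream.Collectors;

class CodePointDump {
    // Lists the code points of a string as "U+XXXX" values separated by spaces.
    static String describe(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        String precomposed = "b\u00e3i"; // ã as one code point
        String decomposed = "ba\u0303i"; // a followed by a combining tilde

        System.out.println(describe(precomposed)); // U+0062 U+00E3 U+0069
        System.out.println(describe(decomposed));  // U+0062 U+0061 U+0303 U+0069

        // A UTF-8 encode/decode round trip preserves the code points exactly,
        // so it cannot turn one representation into the other.
        String roundTrip = new String(
                decomposed.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
        System.out.println(roundTrip.equals(decomposed));  // true
        System.out.println(roundTrip.equals(precomposed)); // false
    }
}
```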

For string comparison, this can cause problems. The solution is to first normalize both strings according to Unicode Standard Annex #15, Unicode Normalization Forms.

Java (including Android) supports normalization through the java.text.Normalizer class (for Android, see Normalizer).

The following code shows the results:

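import java.text.Normalizer;
import java.text.Normalizer.Form;

// Same word, two encodings: precomposed ã vs. a + combining tilde
String s1 = "b\u00e3i";
String s2 = "ba\u0303i";
System.out.println(String.format("Before normalization: %s == %s => %b", s1, s2, s1.equals(s2)));

// Normalize both strings to the same form before comparing
String n1 = Normalizer.normalize(s1, Form.NFD);
String n2 = Normalizer.normalize(s2, Form.NFD);
System.out.println(String.format("After normalization:  %s == %s => %b", n1, n2, n1.equals(n2)));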

It outputs:

Before normalization: bãi == bãi => false
After normalization:  bãi == bãi => true
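
Applied to the original question, the comparison could be wrapped in a small helper. This is a sketch under assumptions, not a definitive implementation: the class InputMatcher and the method sameText are hypothetical names, and a plain string stands in for the text typed by the user.

```java
import java.text.Normalizer;

class InputMatcher {
    // Returns true when both sequences are equal after normalizing
    // each of them to the same form (NFC is chosen here).
    static boolean sameText(CharSequence typed, CharSequence expected) {
        String a = Normalizer.normalize(typed, Normalizer.Form.NFC);
        String b = Normalizer.normalize(expected, Normalizer.Form.NFC);
        return a.equals(b);
    }

    public static void main(String[] args) {
        String expected = "b\u00e3i";
        // Stand-in for keyboard input that arrives in decomposed form.
        CharSequence typed = "ba\u0303i";

        System.out.println(sameText(typed, expected)); // true
        System.out.println(sameText("bai", expected)); // false
    }
}
```

Since Normalizer.normalize accepts any CharSequence, on Android the value returned by an EditText's getText() could be passed as the first argument without calling toString() first.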

By the way, Form.NFD decomposes the string, that is, it produces the longer representation with two code points. Form.NFC composes it into the shorter form.
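
The effect of the two forms on string length can be checked directly. Again only a sketch; FormLengths and lengthIn are illustrative names that are not part of any library.

```java
import java.text.Normalizer;
import java.text.Normalizer.Form;

class FormLengths {
    // Length in UTF-16 chars of s after normalizing it to the given form.
    static int lengthIn(String s, Form form) {
        return Normalizer.normalize(s, form).length();
    }

    public static void main(String[] args) {
        String s = "ba\u0303i"; // decomposed input: b, a, combining tilde, i

        System.out.println(lengthIn(s, Form.NFD)); // 4
        System.out.println(lengthIn(s, Form.NFC)); // 3

        // NFC yields the precomposed spelling of the word.
        System.out.println(Normalizer.normalize(s, Form.NFC).equals("b\u00e3i")); // true

        // isNormalized reports whether a string is already in a given form.
        System.out.println(Normalizer.isNormalized(s, Form.NFC)); // false
        System.out.println(Normalizer.isNormalized(s, Form.NFD)); // true
    }
}
```

Either form works for comparison as long as both strings are normalized to the same one.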
