Java – HTML ASCII case insensitive ICU collator
I need to create a Collator corresponding to https://www.w3.org/2005/xpath-functions/collation/html-ascii-case-insensitive/, that is, one that ignores case differences between the ASCII characters A-Z and a-z during comparison.
I tried this with the following ICU4J RuleBasedCollator:
final RuleBasedCollator collator = new RuleBasedCollator("&a=A,b=B,c=C,d=D,e=E,f=F,g=G,h=H," + "i=I,j=J,k=K,l=L,m=M,n=N,o=O,p=P,q=Q,r=R,s=S,t=T," + "u=U,v=V,u=U,w=W,x=X,y=Y,z=Z").freeze();
However, the following search fails. I expected it to succeed (i.e. return true):
final SearchIterator searchIterator = new StringSearch("pu", new StringCharacterIterator("iNPut"), collator);
return searchIterator.first() >= 0;
What is missing from my rules?
Solution
> This W3C "collation" doesn't look like a collator in the usual sense. It is a non-sorting, ASCII-case-insensitive matcher, and I suspect that it is usually implemented in low-level code. It matches ASCII letters case-insensitively, while all other characters must match exactly. See https://www.w3.org/TR/xpath-functions-31/#html-ascii-case-insensitive-collation
> http://userguide.icu-project.org/collation/customization
> http://demo.icu-project.org/icu-bin/collation.html
> http://www.unicode.org/reports/tr35/tr35-collation.html#Rules
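The matching behaviour described in the answer (ASCII letters compared case-insensitively, every other character compared exactly) can be sketched in plain Java without ICU. This is only an illustration under that reading of the W3C definition, not the answerer's code and not an ICU API; the class name `HtmlAsciiCaseInsensitive` and all of its methods are hypothetical names chosen for this example.

```java
// Minimal sketch of an HTML ASCII case-insensitive matcher.
// Assumption: only A-Z / a-z are folded; all other chars must match exactly.
// All names here are hypothetical and not part of ICU4J or the JDK.
final class HtmlAsciiCaseInsensitive {

    private HtmlAsciiCaseInsensitive() {}

    // Fold ASCII A-Z to a-z; return every other char unchanged.
    static char foldAscii(char c) {
        return (c >= 'A' && c <= 'Z') ? (char) (c + ('a' - 'A')) : c;
    }

    // True if pattern occurs in text starting exactly at offset.
    static boolean regionMatches(CharSequence text, int offset, CharSequence pattern) {
        if (offset < 0 || offset + pattern.length() > text.length()) {
            return false;
        }
        for (int i = 0; i < pattern.length(); i++) {
            if (foldAscii(text.charAt(offset + i)) != foldAscii(pattern.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    // Whole-string equality under ASCII case folding.
    static boolean equalsIgnoreAsciiCase(CharSequence a, CharSequence b) {
        return a.length() == b.length() && regionMatches(a, 0, b);
    }

    // Index of the first match of pattern in text, or -1 if there is none.
    static int indexOf(CharSequence text, CharSequence pattern) {
        for (int start = 0; start + pattern.length() <= text.length(); start++) {
            if (regionMatches(text, start, pattern)) {
                return start;
            }
        }
        return -1;
    }

    static boolean contains(CharSequence text, CharSequence pattern) {
        return indexOf(text, pattern) >= 0;
    }

    public static void main(String[] args) {
        // The search from the question: "pu" inside "iNPut".
        System.out.println(contains("iNPut", "pu"));
        // Non-ASCII letters are not folded: É vs é.
        System.out.println(equalsIgnoreAsciiCase("\u00C9", "\u00E9"));
    }
}
```

Working on UTF-16 code units is sufficient here, because the fold only touches the range A-Z and surrogate code units never fall in that range. A naive scan like `indexOf` is quadratic in the worst case, which is usually acceptable for short patterns.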