Java – fastest way to check string size

I have the following code inside a loop:

if (sb.toString().getBytes("UTF-8").length >= 5242880) {
    // Do something
}

This works well, but it's slow (in terms of checking size). What's the fastest way?

Solution

You can calculate the UTF-8 length quickly using

public static int utf8Length(CharSequence cs) {
    return cs.codePoints()
        .map(cp -> cp<=0x7ff? cp<=0x7f? 1: 2: cp<=0xffff? 3: 4)
        .sum();
}

If ASCII characters dominate the content, the following variant may be slightly faster:

public static int utf8Length(CharSequence cs) {
    return cs.length()
         + cs.codePoints().filter(cp -> cp>0x7f).map(cp -> cp<=0x7ff? 1: 2).sum();
}
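
As a sanity check, both variants can be compared against the byte count of an actual encoding. The following is a minimal sketch: the class name, the method name utf8LengthAsciiBiased, and the sample strings are made up for illustration, and it only covers well-formed strings (no unpaired surrogates).

```java
import java.nio.charset.StandardCharsets;

final class Utf8LengthCheck {
    // General variant: classify each code point by its UTF-8 size.
    static int utf8Length(CharSequence cs) {
        return cs.codePoints()
            .map(cp -> cp <= 0x7ff ? (cp <= 0x7f ? 1 : 2) : (cp <= 0xffff ? 3 : 4))
            .sum();
    }

    // ASCII-biased variant: start from the char count, then add the extra bytes
    // that non-ASCII code points need beyond their char count.
    static int utf8LengthAsciiBiased(CharSequence cs) {
        return cs.length()
            + cs.codePoints().filter(cp -> cp > 0x7f).map(cp -> cp <= 0x7ff ? 1 : 2).sum();
    }

    // Reference value: actually encode the string and count the bytes.
    static int encodedLength(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // Hypothetical samples: empty, ASCII, 2-byte, 3-byte, surrogate pair, mixed.
        String[] samples = {
            "", "hello", "h\u00e9llo", "\u20ac", "\uD83D\uDE00", "a\u00e9\u20ac\uD83D\uDE00"
        };
        for (String s : samples) {
            // The three numbers on each line should be identical.
            System.out.println(utf8Length(s) + " " + utf8LengthAsciiBiased(s)
                + " " + encodedLength(s));
        }
    }
}
```

The ASCII-biased variant works because every char contributes at least one byte, so only the surplus has to be summed for code points above 0x7f.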

You may also consider the optimization potential of not recalculating the entire size, but only the size of each new fragment you append to the StringBuilder, something like this:

StringBuilder sb = new StringBuilder();
int length = 0;
for(…; …; …) {
    String s = … //calculateNextString();
    sb.append(s);
    length += utf8Length(s);
    if(length >= 5242880) {
        // Do something

        // in case you're flushing the data:
        sb.setLength(0);
        length = 0;
    }
}

This assumes that if you append fragments containing surrogate pairs, they are always complete and never split in half. This should always be the case for ordinary applications.
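
To illustrate why the surrogate-pair assumption matters, here is a small sketch (the class name and the sample character are chosen for illustration). It counts one supplementary character as a whole, then as two separately counted halves, as would happen if a fragment boundary fell in the middle of the pair.

```java
final class SplitSurrogateDemo {
    // Same counting logic as the utf8Length method in the answer.
    static int utf8Length(CharSequence cs) {
        return cs.codePoints()
            .map(cp -> cp <= 0x7ff ? (cp <= 0x7f ? 1 : 2) : (cp <= 0xffff ? 3 : 4))
            .sum();
    }

    public static void main(String[] args) {
        // U+1F600 is one code point stored as two chars (a surrogate pair).
        String emoji = "\uD83D\uDE00";

        // Counted as a whole: one 4-byte code point.
        int whole = utf8Length(emoji);

        // Counted as two fragments: codePoints() reports each unpaired
        // surrogate as its own value in the 3-byte range, giving 3 + 3.
        int split = utf8Length(emoji.substring(0, 1)) + utf8Length(emoji.substring(1));

        System.out.println("whole=" + whole + ", split=" + split);
    }
}
```

The running total would overestimate by two bytes per split pair, so the threshold check would fire slightly early rather than late.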

Another possibility, suggested by didier-l, is to postpone the calculation until the length of your StringBuilder reaches the threshold divided by 3. Before that point, the UTF-8 length cannot reach the threshold, because each char encodes to at most three bytes. However, this is only beneficial if some executions never reach threshold / 3.
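
A minimal sketch of this deferral idea, assuming the whole buffer is recounted on each check (the class and method names are hypothetical, not taken from the original answer):

```java
final class DeferredUtf8Check {
    // Same counting logic as the utf8Length method in the answer.
    static int utf8Length(CharSequence cs) {
        return cs.codePoints()
            .map(cp -> cp <= 0x7ff ? (cp <= 0x7f ? 1 : 2) : (cp <= 0xffff ? 3 : 4))
            .sum();
    }

    // Returns true once the buffer holds at least `threshold` UTF-8 bytes.
    static boolean reachedThreshold(StringBuilder sb, int threshold) {
        // A char never needs more than 3 UTF-8 bytes (a surrogate pair is
        // 2 chars for 4 bytes), so fewer than threshold / 3 chars cannot
        // reach the threshold and the counting pass can be skipped.
        if (sb.length() < threshold / 3) {
            return false;
        }
        return utf8Length(sb) >= threshold;
    }

    public static void main(String[] args) {
        int threshold = 5242880; // the limit used in the question
        StringBuilder sb = new StringBuilder("short content");
        // Far below threshold / 3 chars, so no counting pass is run here.
        System.out.println(reachedThreshold(sb, threshold));
    }
}
```

The cheap length comparison only helps while the buffer stays small; once sb.length() passes threshold / 3, every call pays for a full count unless it is combined with the incremental approach.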
