Java – fastest way to check string size
I have the following code inside a loop:
if (sb.toString().getBytes("UTF-8").length >= 5242880) {
    // Do something
}
This works well, but it's slow (in terms of checking size). What's the fastest way?
Solution
You can calculate the UTF-8 length quickly, without encoding the string into a byte array, using
public static int utf8Length(CharSequence cs) {
    return cs.codePoints()
             .map(cp -> cp <= 0x7ff ? (cp <= 0x7f ? 1 : 2) : (cp <= 0xffff ? 3 : 4))
             .sum();
}
If ASCII characters dominate the content, the following variant may be slightly faster:
public static int utf8Length(CharSequence cs) {
    return cs.length()
         + cs.codePoints().filter(cp -> cp > 0x7f).map(cp -> cp <= 0x7ff ? 1 : 2).sum();
}
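As a quick sanity check, both variants can be compared against the byte count that `getBytes` produces. This is only a sketch: the class name `Utf8LengthCheck`, the method name `utf8LengthAsciiBiased`, and the sample strings are made up for illustration and are not part of the original answer.

```java
import java.nio.charset.StandardCharsets;

class Utf8LengthCheck {
    // General variant: one to four bytes per code point.
    static int utf8Length(CharSequence cs) {
        return cs.codePoints()
                 .map(cp -> cp <= 0x7ff ? (cp <= 0x7f ? 1 : 2) : (cp <= 0xffff ? 3 : 4))
                 .sum();
    }

    // ASCII-leaning variant (hypothetical name): start from the char count,
    // then add the extra bytes needed by each non-ASCII code point.
    static int utf8LengthAsciiBiased(CharSequence cs) {
        return cs.length()
             + cs.codePoints().filter(cp -> cp > 0x7f).map(cp -> cp <= 0x7ff ? 1 : 2).sum();
    }

    public static void main(String[] args) {
        // Illustrative samples: empty, ASCII, 2-byte, 3-byte, 4-byte, and mixed content.
        String[] samples = {
            "", "hello", "h\u00e9llo", "\u20ac", "\uD83D\uDE00", "a\u00e9\u20ac\uD83D\uDE00"
        };
        for (String s : samples) {
            int encoded = s.getBytes(StandardCharsets.UTF_8).length;
            // The three numbers on each line should be identical.
            System.out.println(encoded + " " + utf8Length(s) + " " + utf8LengthAsciiBiased(s));
        }
    }
}
```

The three numbers only agree for well-formed strings. An unpaired surrogate is encoded by `getBytes` as a single replacement byte, while both methods above count it as three.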
You might also consider a further optimization: instead of recalculating the size of the whole buffer on every iteration, measure only the new fragment you append to the StringBuilder and keep a running total, something like this:
StringBuilder sb = new StringBuilder();
int length = 0;
for(…; …; …) {
    String s = … //calculateNextString();
    sb.append(s);
    length += utf8Length(s);
    if(length >= 5242880) {
        // Do something

        // in case you're flushing the data:
        sb.setLength(0);
        length = 0;
    }
}
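A runnable version of the same idea might look like the sketch below. The loop source, the tiny threshold, and the names `IncrementalUtf8Counter` and `flushSizes` are assumptions chosen to keep the example small; recording the flushed size stands in for whatever "Do something" is in your code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class IncrementalUtf8Counter {
    static int utf8Length(CharSequence cs) {
        return cs.codePoints()
                 .map(cp -> cp <= 0x7ff ? (cp <= 0x7f ? 1 : 2) : (cp <= 0xffff ? 3 : 4))
                 .sum();
    }

    // Appends each fragment, tracks the running UTF-8 size, and returns the
    // size of every batch that was flushed (hypothetical stand-in for real work).
    static List<Integer> flushSizes(Iterable<String> fragments, int thresholdBytes) {
        List<Integer> flushed = new ArrayList<>();
        StringBuilder sb = new StringBuilder();
        int length = 0;
        for (String s : fragments) {
            sb.append(s);
            length += utf8Length(s);      // only the new fragment is scanned
            if (length >= thresholdBytes) {
                flushed.add(length);      // placeholder for "Do something"
                sb.setLength(0);          // flush the buffer
                length = 0;               // and reset the running total
            }
        }
        return flushed;
    }

    public static void main(String[] args) {
        // Illustrative input: 3 bytes, then 6 bytes (two euro signs), then 2, then 1.
        List<String> fragments = Arrays.asList("abc", "\u20ac\u20ac", "xy", "z");
        System.out.println(flushSizes(fragments, 8));
    }
}
```

With this input the running total reaches 9 after the second fragment, so one batch is flushed and the last two fragments stay in the buffer. Each character is scanned once in total, rather than once per iteration.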
This assumes that any fragment you append containing surrogate pairs contains them whole, never split in half. That should always be the case for ordinary applications.
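To see why the assumption matters, here is a small sketch of what happens when a surrogate pair is counted in two halves. The class name `SplitSurrogateDemo` and the sample character are only for illustration.

```java
class SplitSurrogateDemo {
    static int utf8Length(CharSequence cs) {
        return cs.codePoints()
                 .map(cp -> cp <= 0x7ff ? (cp <= 0x7f ? 1 : 2) : (cp <= 0xffff ? 3 : 4))
                 .sum();
    }

    public static void main(String[] args) {
        // U+1F600 is one code point stored as two chars (a surrogate pair).
        String emoji = "\uD83D\uDE00";

        // Counted as a whole fragment, the pair is one 4-byte code point.
        int whole = utf8Length(emoji);

        // Counted as two fragments, each lone surrogate is treated as a
        // separate code point in the 3-byte range.
        int split = utf8Length(emoji.substring(0, 1)) + utf8Length(emoji.substring(1));

        System.out.println("whole=" + whole + " split=" + split);
    }
}
```

The whole pair counts as 4 bytes, while the two halves count as 3 bytes each, so a running total built from split fragments would overestimate by 2 bytes per split pair.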
Another possibility, suggested by didier-l, is to postpone the calculation until the length of your StringBuilder reaches the threshold divided by 3. Before that point the UTF-8 length cannot reach the threshold, because each char contributes at most three bytes (a surrogate pair is two chars encoding to four bytes). However, this only pays off if some executions never reach threshold / 3.
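A minimal sketch of that deferred check is below, assuming the exact count is only needed once the cheap bound allows it. The names `DeferredThresholdCheck` and `reachesThreshold` are hypothetical.

```java
class DeferredThresholdCheck {
    static int utf8Length(CharSequence cs) {
        return cs.codePoints()
                 .map(cp -> cp <= 0x7ff ? (cp <= 0x7f ? 1 : 2) : (cp <= 0xffff ? 3 : 4))
                 .sum();
    }

    // Returns true if sb encodes to at least thresholdBytes bytes of UTF-8.
    static boolean reachesThreshold(StringBuilder sb, int thresholdBytes) {
        // Cheap pre-check: each char yields at most 3 bytes, so a buffer with
        // fewer than thresholdBytes / 3 chars cannot reach the threshold.
        if ((long) sb.length() * 3 < thresholdBytes) {
            return false;
        }
        // Only now pay for the exact count.
        return utf8Length(sb) >= thresholdBytes;
    }

    public static void main(String[] args) {
        int threshold = 5242880; // 5 MiB, as in the question
        StringBuilder sb = new StringBuilder("short buffer");
        // Far below threshold / 3 chars, so the exact count is skipped.
        System.out.println(reachesThreshold(sb, threshold));
    }
}
```

This variant rescans the whole buffer once it grows past threshold / 3 chars, so it suits cases where most buffers stay small. For buffers that regularly grow large, the running total per fragment avoids the repeated scans.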