Java: what is the optimal initial capacity of a StringBuffer for inputs of widely varying length?
Good afternoon, everyone. I am using a java.lang.StringBuilder to store some characters. I don't know in advance how many characters I will need to store, except that:
- 60% of the time, it is exactly 7 characters
- 39% of the time, it is roughly 3500 characters
- 1% of the time, it is roughly 20K characters
How do we calculate the optimal initial buffer length that should be used?
I am currently using new java.lang.StringBuilder(4000), but only because I was too lazy to think about it before.
Solution
There are two factors here: time and memory consumption. Time is mostly determined by the number of calls to java.lang.AbstractStringBuilder.expandCapacity(). Of course, the cost of each call is linear in the current size of the buffer, but I am simplifying here and only counting the calls:
Number of expandCapacity() calls (time)
Default configuration (16 character capacity)
- In 60% of cases, the StringBuilder will expand 0 times
- In 39% of cases, the StringBuilder will expand 8 times
- In 1% of cases, the StringBuilder will expand 11 times
The expected number of expandCapacity() calls is 3.23.
Initial capacity of 4096 characters
- In 99% of cases, the StringBuilder will expand 0 times
- In 1% of cases, the StringBuilder will expand 3 times
The expected number of expandCapacity() calls is 0.03.
As you can see, the second configuration seems to be much faster, because it rarely needs to expand the StringBuilder (three times per 100 inputs). Note, however, that the first few expansions matter less, since they copy only a small amount of memory. Also, if you append strings to the builder in large chunks, it will expand more eagerly and reach the required size in fewer iterations.
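The expansion counts above can be reproduced with a small simulation. This is only a sketch under stated assumptions: the buffer grows to 2 * capacity + 2 on each expansion (the rule OpenJDK's AbstractStringBuilder applies when the requested size is small), and the text is appended in small pieces, so every expansion step is actually taken. The class and method names are made up for illustration.

```java
import java.util.Locale;

public class ExpansionCount {
    // Counts how often the buffer must grow to hold finalLength characters.
    // Assumption: each expansion sets capacity to 2 * capacity + 2.
    public static int expansions(int initialCapacity, int finalLength) {
        int capacity = initialCapacity;
        int count = 0;
        while (capacity < finalLength) {
            capacity = capacity * 2 + 2;
            count++;
        }
        return count;
    }

    // Probability-weighted average of the expansion count over all input lengths.
    public static double expectedExpansions(int initialCapacity, int[] lengths, double[] probabilities) {
        double expected = 0.0;
        for (int i = 0; i < lengths.length; i++) {
            expected += probabilities[i] * expansions(initialCapacity, lengths[i]);
        }
        return expected;
    }

    public static void main(String[] args) {
        // Length distribution taken from the question.
        int[] lengths = {7, 3500, 20_000};
        double[] probabilities = {0.60, 0.39, 0.01};
        for (int initial : new int[] {16, 4096}) {
            System.out.printf(Locale.ROOT, "initial capacity %d: %.2f expected expansions%n",
                    initial, expectedExpansions(initial, lengths, probabilities));
        }
    }
}
```

Under these assumptions the sketch gives 8 and 11 expansions for the default capacity, 3 for the 4096 case, and expected values of 3.23 and 0.03. Appending in large chunks would lower the counts, as noted above.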
On the other hand, memory consumption increases:
Memory consumption
Default configuration (16 character capacity)
- In 60% of cases, the StringBuilder will occupy 16 characters
- In 39% of cases, the StringBuilder will occupy 4K characters
- In 1% of cases, the StringBuilder will occupy 32K characters
The expected average memory consumption is 1935 characters.
Initial capacity of 4096 characters
- In 99% of cases, the StringBuilder will occupy 4096 characters
- In 1% of cases, the StringBuilder will occupy 32K characters
The expected average memory consumption is 4383 characters.
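The memory figures are plain weighted averages, which a short sketch can check. It first uses the rounded capacities from the answer (4K = 4096, 32K = 32768) and then, as an assumption that is not part of the original answer, the capacities that a 2 * capacity + 2 growth rule would actually reach. The class and method names are hypothetical.

```java
public class ExpectedMemory {
    // Capacity the buffer ends up with after growing to hold finalLength characters.
    // Assumption: each expansion sets capacity to 2 * capacity + 2.
    public static int finalCapacity(int initialCapacity, int finalLength) {
        int capacity = initialCapacity;
        while (capacity < finalLength) {
            capacity = capacity * 2 + 2;
        }
        return capacity;
    }

    // Probability-weighted average of the given capacities.
    public static double expected(double[] probabilities, int[] capacities) {
        double sum = 0.0;
        for (int i = 0; i < capacities.length; i++) {
            sum += probabilities[i] * capacities[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] p = {0.60, 0.39, 0.01};
        int[] lengths = {7, 3500, 20_000};

        // Rounded figures used in the answer.
        System.out.println("rounded, default: " + expected(p, new int[] {16, 4096, 32_768}));
        System.out.println("rounded, 4096:    " + expected(p, new int[] {4096, 4096, 32_768}));

        // Capacities simulated under the assumed growth rule.
        for (int initial : new int[] {16, 4096}) {
            int[] capacities = new int[lengths.length];
            for (int i = 0; i < lengths.length; i++) {
                capacities[i] = finalCapacity(initial, lengths[i]);
            }
            System.out.println("simulated, initial " + initial + ": " + expected(p, capacities));
        }
    }
}
```

With the rounded figures this reproduces roughly 1935 and 4383 characters. The simulated capacities are somewhat higher for the default case (the buffer ends at 4606 and 36862 characters rather than 4096 and 32768), but the ratio between the two configurations stays at about two.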
TL;DR
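This leads me to believe that enlarging the initial buffer to 4K will more than double the memory consumption (from about 1935 to about 4383 characters on average) and speed up the program considerably: the expected number of expansions drops by two orders of magnitude (from 3.23 to 0.03).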
The bottom line is: try it! It is not difficult to write a benchmark that processes millions of strings of different lengths with different initial capacities. But I believe that a larger buffer may be a good choice.
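A rough benchmark along these lines might look like the sketch below. It is a naive System.nanoTime() timing loop, not a rigorous measurement: JIT compilation and garbage collection can distort the numbers, and a harness such as JMH would be more trustworthy. The class name, iteration counts, and the choice to append one character at a time are all assumptions made for illustration.

```java
import java.util.Random;

public class CapacityBenchmark {
    // Written to after each run so the JIT cannot discard the work as unused.
    static volatile long sink;

    // Draws a length following the question's distribution: 60% / 39% / 1%.
    public static int sampleLength(Random random) {
        double r = random.nextDouble();
        if (r < 0.60) return 7;
        if (r < 0.99) return 3500;
        return 20_000;
    }

    // Builds `iterations` strings character by character; returns elapsed nanoseconds.
    public static long run(int initialCapacity, int iterations, long seed) {
        Random random = new Random(seed);
        long checksum = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            int length = sampleLength(random);
            StringBuilder sb = new StringBuilder(initialCapacity);
            for (int j = 0; j < length; j++) {
                sb.append('x');
            }
            checksum += sb.length();
        }
        long elapsed = System.nanoTime() - start;
        sink = checksum;
        return elapsed;
    }

    public static void main(String[] args) {
        int[] capacities = {16, 4000, 4096};
        // Warm-up pass so the measured pass runs mostly compiled code.
        for (int capacity : capacities) {
            run(capacity, 20_000, 1L);
        }
        for (int capacity : capacities) {
            long nanos = run(capacity, 100_000, 42L);
            System.out.println("initial capacity " + capacity + ": " + nanos / 1_000_000 + " ms");
        }
    }
}
```

Using the same seed for every capacity keeps the sequence of lengths identical across runs, so the only thing that varies is the initial capacity. Results will depend on the JVM and hardware, so treat them as a hint rather than a verdict.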