Java file I/O throughput drops
I have a program in which each thread reads multiple lines from a file at once, processes them, and writes the results to a different file. Four threads split the list of files to be processed. I encountered strange performance problems in two cases:
> Case 1: four files, 50,000 lines each
> Throughput starts at 700 lines/s and decreases to ~100 lines/s
> Case 2: 30,000 files, 12 lines each
> Throughput starts at 800 lines/s and remains stable
This is internal software I'm working on, so unfortunately I can't share any source code, but the main steps of the program are:
> 1. Split the file list among four worker threads.
> 2. Start all threads.
> 3. Each thread reads up to 100 lines at a time and stores them in a String[] array.
> 4. The thread applies the transformation to every line in the array.
> 5. The thread writes the lines to a file (different from the input file).

Each thread repeats steps 3-5 until all of its files are completely processed.
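The steps above can be sketched roughly as follows. This is not the asker's code (which isn't shown): the transformation (`transform`, here just upper-casing), the output naming (`outputFor`, appending `.out`), and the round-robin split of files across threads are all hypothetical placeholders chosen only to make the sketch runnable.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class BatchPipeline {
    static final int BATCH_SIZE = 100; // step 3: read at most 100 lines per batch

    // Hypothetical transformation; the real one is not shown in the question.
    static String transform(String line) {
        return line.toUpperCase(Locale.ROOT);
    }

    // Hypothetical naming scheme: the output for "x.txt" goes to "x.txt.out".
    static Path outputFor(Path input) {
        return input.resolveSibling(input.getFileName() + ".out");
    }

    // Steps 3-5 for one file, repeated until the input is exhausted.
    static void processFile(Path input) throws IOException {
        try (BufferedReader in = Files.newBufferedReader(input, StandardCharsets.UTF_8);
             BufferedWriter out = Files.newBufferedWriter(outputFor(input), StandardCharsets.UTF_8)) {
            String[] batch = new String[BATCH_SIZE];
            while (true) {
                int n = 0;
                String line;
                // Step 3: fill the array with up to BATCH_SIZE lines.
                while (n < BATCH_SIZE && (line = in.readLine()) != null) {
                    batch[n++] = line;
                }
                if (n == 0) {
                    break; // end of file
                }
                // Step 4: transform every line in the batch.
                for (int i = 0; i < n; i++) {
                    batch[i] = transform(batch[i]);
                }
                // Step 5: write the batch to the output file.
                for (int i = 0; i < n; i++) {
                    out.write(batch[i]);
                    out.newLine();
                }
            }
        }
    }

    // Steps 1-2: split the file list round-robin among worker threads and start them.
    static void runAll(List<Path> files, int threads) throws InterruptedException {
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final int id = t;
            Thread worker = new Thread(() -> {
                for (int i = id; i < files.size(); i += threads) {
                    try {
                        processFile(files.get(i));
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        List<Path> files = new ArrayList<>();
        for (String arg : args) {
            files.add(Paths.get(arg));
        }
        runAll(files, 4);
    }
}
```

Note that with this structure each thread keeps its own open file and read position, so four threads working on four large files issue interleaved requests to four different places on the disk for the whole run.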
What I don't understand is why 30K files with 12 lines each give higher throughput than a few files with many lines. I thought the cost of opening and closing files would be greater than that of reading a single file. In addition, the performance decline in the first case is exponential.
I have set the maximum heap size to 1024 MB and the program uses at most 100 MB, so GC overhead should not be the problem. Do you have any other suggestions?
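Rather than inferring from heap usage alone that GC is not the bottleneck, collection time can be measured directly with the standard `java.lang.management` API. A minimal sketch; the class name `GcCheck` and the workload placeholder in `main` are hypothetical:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcCheck {
    // Sums the accumulated collection time (ms) over all collectors.
    // getCollectionTime() returns -1 when the value is undefined, so those are skipped.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        long gcBefore = totalGcMillis();
        // ... run the file-processing workload here (hypothetical placeholder) ...
        long gcAfter = totalGcMillis();
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.printf("GC: %d ms of %d ms wall time%n", gcAfter - gcBefore, elapsedMs);
    }
}
```

If GC time stays a small fraction of wall time while throughput falls, that supports looking at I/O instead.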
Solution
Based on your figures, I agree that GC is probably not the problem. I suspect this is normal behavior for a disk accessed by many concurrent threads. When the files are large, the disk has to move back and forth between the positions each thread is reading or writing, so the seek time adds up and the overhead becomes significant. Small files can probably be read as a single block without additional seeks, so the threads don't interfere with each other much.
With a single standard hard disk, serial I/O usually performs better than parallel I/O.
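One way to apply this while keeping the transformation parallel is to let only one thread touch the disk at a time and to read and write each file in one large sequential operation. A minimal sketch, assuming each input file fits comfortably in memory (50,000 lines should, with a 1024 MB heap); the shared `DISK` lock, `transform`, and `outputFor` are hypothetical. The lock only serializes the calls into the OS; the OS may still defer the physical writes through its page cache.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.locks.ReentrantLock;

public class SerialIoPipeline {
    // One lock shared by all workers: only one thread reads or writes at a time.
    private static final ReentrantLock DISK = new ReentrantLock();

    // Hypothetical transformation standing in for the real one.
    static String transform(String line) {
        return line.toUpperCase(Locale.ROOT);
    }

    // Hypothetical naming scheme: the output for "x.txt" goes to "x.txt.out".
    static Path outputFor(Path input) {
        return input.resolveSibling(input.getFileName() + ".out");
    }

    static void processFile(Path input) throws IOException {
        List<String> lines;
        DISK.lock();
        try {
            lines = Files.readAllLines(input, StandardCharsets.UTF_8); // one sequential read
        } finally {
            DISK.unlock();
        }
        // CPU-bound work runs outside the lock, so threads still overlap here.
        List<String> result = new ArrayList<>(lines.size());
        for (String line : lines) {
            result.add(transform(line));
        }
        DISK.lock();
        try {
            Files.write(outputFor(input), result, StandardCharsets.UTF_8); // one sequential write
        } finally {
            DISK.unlock();
        }
    }

    static void runAll(List<Path> files, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (Path f : files) {
                futures.add(pool.submit(() -> {
                    processFile(f);
                    return null;
                }));
            }
            for (Future<?> future : futures) {
                future.get(); // rethrows any IOException wrapped in ExecutionException
            }
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Path> files = new ArrayList<>();
        for (String arg : args) {
            files.add(Paths.get(arg));
        }
        runAll(files, 4);
    }
}
```

Whether this beats the original depends on how much time the transformation takes relative to I/O; timing both variants on the real data is the reliable test.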