How to quickly search for strings in large files in Java?

2020-04-16 • Java

I'm trying to search a large text file (400MB) for a specific string using the following:

File file = new File("fileName.txt");
// try-with-resources closes the Scanner (and the underlying file) when done
try (Scanner scanner = new Scanner(file)) {
    int count = 0;
    while (scanner.hasNextLine()) {
        if (scanner.nextLine().contains("particularString")) {
            count++;
            System.out.println("Number of instances of String: " + count);
        }
    }
} catch (FileNotFoundException e) {
    System.out.println(e);
}

This works fine for small files, but it takes too long (> 10 minutes) for this particular file and for other large files.

What is the fastest and most effective way to do this?

I have now changed to the following, and it completes in a few seconds:

// try-with-resources closes the reader even if an exception is thrown
try (BufferedReader reader = new BufferedReader(new FileReader(file))) {
    int count = 0;
    String line;
    while ((line = reader.readLine()) != null) {
        if (line.contains("particularString")) {
            count++;
            System.out.println("Number of instances of String " + count);
        }
    }
} catch (IOException e) {
    System.out.println(e);
}

Solution

First, find out how long it takes just to read the entire file content, and how long it takes to scan it for the pattern.
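One way to take that measurement is sketched below. This is only an illustration, not a proper benchmark: the class name `ReadVsScanTiming` and its helper methods are made up for this example, the file is assumed to be valid UTF-8, and the default file name and search string are simply the ones from the question. Note that the second pass will usually benefit from the operating system's file cache, so it is worth running the program a few times and comparing.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ReadVsScanTiming {

    // Pass 1: read every line and discard it; returns the number of lines read.
    static long readOnly(Path path) throws IOException {
        long lines = 0;
        try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            while (reader.readLine() != null) {
                lines++;
            }
        }
        return lines;
    }

    // Pass 2: read every line and count the lines that contain the needle.
    static long readAndScan(Path path, String needle) throws IOException {
        long matches = 0;
        try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.contains(needle)) {
                    matches++;
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) throws IOException {
        // File name and search string default to the ones used in the question.
        Path path = Paths.get(args.length > 0 ? args[0] : "fileName.txt");
        String needle = args.length > 1 ? args[1] : "particularString";
        if (!Files.exists(path)) {
            System.err.println("File not found: " + path);
            return;
        }

        long t0 = System.nanoTime();
        long lines = readOnly(path);
        long t1 = System.nanoTime();
        long matches = readAndScan(path, needle);
        long t2 = System.nanoTime();

        System.out.println("Lines read: " + lines);
        System.out.println("Matching lines: " + matches);
        System.out.println("Read only:   " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("Read + scan: " + (t2 - t1) / 1_000_000 + " ms");
    }
}
```

If the two timings are close, the run is dominated by reading; if the second is much larger, the scan itself is the bottleneck.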

If your results are dominated by reading time (and assuming you are reading the file properly, i.e. with channels or at least buffered readers), there is not much left to do.
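To get a rough lower bound for the reading time alone, the file can be pulled through a channel without decoding it into strings. The sketch below is one possible way to do that; the class name `ChannelReadBaseline` and the 1 MiB buffer size are arbitrary choices for this example, not something the answer prescribes.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ChannelReadBaseline {

    // Reads the whole file through a FileChannel and returns the number of bytes read.
    // No decoding or searching is done, so the elapsed time approximates raw read cost.
    static long countBytes(Path path) throws IOException {
        long total = 0;
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            // 1 MiB direct buffer; the size is an arbitrary assumption.
            ByteBuffer buffer = ByteBuffer.allocateDirect(1 << 20);
            int n;
            while ((n = channel.read(buffer)) != -1) {
                total += n;
                buffer.clear(); // discard the bytes and reuse the buffer
            }
        }
        return total;
    }
}
```

Timing a call to `countBytes` with `System.nanoTime()` and comparing it with the buffered-reader loop from the question gives an idea of how much of the run time is pure I/O.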

If scanning time dominates instead, you can read all the lines and send small batches of lines to a work queue, where multiple threads pick up the batches and search them.
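The batching idea could look roughly like the following. This is a minimal sketch under several assumptions: the class name `BatchedLineSearch`, the batch size, and the thread count are all hypothetical, the file is assumed to be UTF-8, and the executor's internal queue stands in for the work queue. That queue is unbounded here, so if reading is much faster than searching, pending batches accumulate in memory; a bounded queue would be needed to limit that.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BatchedLineSearch {

    // Counts the lines containing needle. One thread reads the file and hands
    // batches of lines to a fixed-size thread pool, which searches them in parallel.
    static long countMatchingLines(Path path, String needle, int batchSize, int threads)
            throws IOException, InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<Long>> results = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            List<String> batch = new ArrayList<>(batchSize);
            String line;
            while ((line = reader.readLine()) != null) {
                batch.add(line);
                if (batch.size() == batchSize) {
                    // Search starts while the reader keeps going.
                    results.add(submitBatch(pool, batch, needle));
                    batch = new ArrayList<>(batchSize);
                }
            }
            if (!batch.isEmpty()) {
                results.add(submitBatch(pool, batch, needle)); // final partial batch
            }
            long total = 0;
            for (Future<Long> result : results) {
                total += result.get(); // wait for each batch and add up its count
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    // Submits one batch; the task returns how many of its lines contain needle.
    private static Future<Long> submitBatch(ExecutorService pool, List<String> batch, String needle) {
        return pool.submit(() -> {
            long matches = 0;
            for (String s : batch) {
                if (s.contains(needle)) {
                    matches++;
                }
            }
            return matches;
        });
    }
}
```

A call such as `countMatchingLines(Paths.get("fileName.txt"), "particularString", 10_000, 4)` would search batches of 10,000 lines on four threads; both numbers are starting points to tune, not recommendations.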

Ballpark figures

- Assuming a hard disk read speed of 50 MB/s (slow by modern standards), you should be able to read the entire 400 MB file into memory in under 10 seconds.
- MD5 hashing speed benchmarks show that the hashing rate is at least as fast as the disk read speed (often faster). String searching, in turn, is faster, simpler, and parallelizes better than hashing.

Given these two estimates, I think a proper implementation can easily get you a running time on the order of 10 seconds (if you start kicking off search jobs while you are still reading line batches), and that time will be largely dominated by your disk read time.
