Handling large string lists in Java

I have a task where I need to go through several billion strings and check whether each one is unique. The lines cannot all fit in the PC's RAM. In addition, the number of lines may be greater than Integer.MAX_VALUE.

I assume that the best way to handle this amount of data is to put the hash code of each string into some kind of hash table.

So this is my question:

> What should I use instead of String.hashCode()? (It returns an int, but I will probably need a long.)
> What is the fastest approach or framework for working with a list of this size? What I need most is to be able to quickly check whether the list already contains an element.

Solution

You are overthinking this problem. It can all be done very simply with a single MySQL table, which stores the data on disk instead of holding everything in memory. That much data was never meant to be handled efficiently by a standalone application.

CREATE TABLE TONS_OF_STRINGS
(
  unique_string varchar(255) NOT NULL,
  UNIQUE (unique_string)
);

Just loop through the values (assuming a comma-separated list here) and try to insert each token. Each insert that fails is a duplicate.

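import java.io.FileReader;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLIntegrityConstraintViolationException;
import java.util.Scanner;

public class FindDuplicates {
  public static void main(String[] args) throws Exception {
    // try-with-resources closes the connection, statement, and file even on failure
    try (Connection con = DriverManager.getConnection("jdbc:mysql://localhost/database", "username", "password");
         // Prepare the statement once and reuse it for every token
         PreparedStatement ps = con.prepareStatement("INSERT INTO TONS_OF_STRINGS (unique_string) VALUES (?)");
         Scanner scan = new Scanner(new FileReader("SomeGiantFile.csv"))) {
      scan.useDelimiter(",");
      while (scan.hasNext()) {
        String token = scan.next();
        try {
          ps.setString(1, token);
          ps.executeUpdate();
        } catch (SQLIntegrityConstraintViolationException e) {
          // The UNIQUE constraint rejected the insert, so this token was already stored
          System.out.println("Found duplicate: " + token);
        }
      }
    }
    System.out.println("Well that was easy, I'm all done!");
  }
}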

Don't forget to clear the table when you are done. That is a lot of data.
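
On the first question (a 64-bit replacement for String.hashCode()): if you do end up hashing the strings yourself, something along these lines might work. This is only a rough sketch using 64-bit FNV-1a, which is my own pick for illustration and not something the MySQL approach needs. The class name Fnv1a64 and the method name hash are made up for this example. Keep in mind that two different strings can share the same 64-bit hash, so with billions of strings a matching hash only flags a candidate duplicate that still has to be checked against the actual string.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: a 64-bit FNV-1a hash over the UTF-8 bytes of a string.
final class Fnv1a64 {
  private static final long OFFSET_BASIS = 0xcbf29ce484222325L; // standard FNV-1a 64-bit offset basis
  private static final long PRIME = 0x100000001b3L;             // standard FNV 64-bit prime

  private Fnv1a64() {}

  static long hash(String s) {
    long h = OFFSET_BASIS;
    for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
      h ^= (b & 0xff); // mix in the next byte as an unsigned value
      h *= PRIME;      // multiplication wraps modulo 2^64, as FNV expects
    }
    return h;
  }

  public static void main(String[] args) {
    // Prints af63dc4c8601ec8c
    System.out.println(Long.toHexString(hash("a")));
  }
}
```

The result is a signed long, so negative values are normal. Only equality between hashes matters here.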
