Java – FileChannel, ByteBuffer and hashing files

I built a file hashing method in Java that accepts the String representation of a file path + filename and then calculates the hash of that file. The hash can be any of the natively supported Java hashing algorithms, such as MD2 through SHA-512.

I'm trying to squeeze out every last drop of performance because this method is an integral part of a project I'm working on. I was advised to try using FileChannel instead of a regular FileInputStream.
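
For context, a call to the method might look like the following; the class name and file path here are placeholders, not something from the original code:

    // Hypothetical call site - "FileHasher" and the path are placeholder names.
    String digest = new FileHasher().getHash("/data/archive.iso", "SHA-512");
    System.out.println(digest); // hex string, length depends on the algorithm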

My original method:

/**
     * Gets Hash of file.
     * 
     * @param file String path + filename of file to get hash.
     * @param hashAlgo Hash algorithm to use. <br/>
     *     Supported algorithms are: <br/>
     *     MD2,MD5 <br/>
     *     SHA-1 <br/>
     *     SHA-256,SHA-384,SHA-512
     * @return String value of hash. (Variable length dependent on hash algorithm used)
     * @throws IOException If file is invalid.
     * @throws HashTypeException If no supported or valid hash algorithm was found.
     */
    public String getHash(String file,String hashAlgo) throws IOException,HashTypeException {
        StringBuffer hexString = null;
        try {
            MessageDigest md = MessageDigest.getInstance(validateHashType(hashAlgo));
            FileInputStream fis = new FileInputStream(file);

            byte[] dataBytes = new byte[1024];

            int nread = 0;
            while ((nread = fis.read(dataBytes)) != -1) {
                md.update(dataBytes,0,nread);
            }
            fis.close();
            byte[] mdbytes = md.digest();

            hexString = new StringBuffer();
            for (int i = 0; i < mdbytes.length; i++) {
                hexString.append(Integer.toHexString((0xFF & mdbytes[i])));
            }

            return hexString.toString();

        } catch (NoSuchAlgorithmException | HashTypeException e) {
            throw new HashTypeException("Unsupported Hash Algorithm.",e);
        }
    }

Refactored method:

/**
     * Gets Hash of file.
     * 
     * @param file String path + filename of file to get hash.
     * @param hashAlgo Hash algorithm to use. <br/>
     *     Supported algorithms are: <br/>
     *     MD2,MD5 <br/>
     *     SHA-1 <br/>
     *     SHA-256,SHA-384,SHA-512
     * @return String value of hash. (Variable length dependent on hash algorithm used)
     * @throws IOException If file is invalid.
     * @throws HashTypeException If no supported or valid hash algorithm was found.
     */
    public String getHash(String fileStr,String hashAlgo) throws IOException,HasherException {

        File file = new File(fileStr);

        MessageDigest md = null;
        FileInputStream fis = null;
        FileChannel fc = null;
        ByteBuffer bbf = null;
        StringBuilder hexString = null;

        try {
            md = MessageDigest.getInstance(hashAlgo);
            fis = new FileInputStream(file);
            fc = fis.getChannel();
            bbf = ByteBuffer.allocate(1024); // allocation in bytes

            int bytes;

            while ((bytes = fc.read(bbf)) != -1) {
                md.update(bbf.array(),0,bytes);
            }

            fc.close();
            fis.close();

            byte[] mdbytes = md.digest();

            hexString = new StringBuilder();

            for (int i = 0; i < mdbytes.length; i++) {
                hexString.append(Integer.toHexString((0xFF & mdbytes[i])));
            }

            return hexString.toString();

        } catch (NoSuchAlgorithmException e) {
            throw new HasherException("Unsupported Hash Algorithm.",e);
        }
    }

Both return the correct hash value, but the refactored method only seems to work with small files. When I pass in a large file, it completely chokes, and I can't figure out why. I'm new to NIO, so please give me some advice.

Edit: Forgot to mention that I'm throwing SHA-512 through it for testing.

Update: Updated with my current method:

/**
     * Gets Hash of file.
     * 
     * @param file String path + filename of file to get hash.
     * @param hashAlgo Hash algorithm to use. <br/>
     *     Supported algorithms are: <br/>
     *     MD2,MD5 <br/>
     *     SHA-1 <br/>
     *     SHA-256,SHA-384,SHA-512
     * @return String value of hash. (Variable length dependent on hash algorithm used)
     * @throws IOException If file is invalid.
     * @throws HasherException If no supported or valid hash algorithm was found.
     */
    public String getHash(String fileStr,String hashAlgo) throws IOException,HasherException {

        File file = new File(fileStr);

        MessageDigest md = null;
        FileInputStream fis = null;
        FileChannel fc = null;
        ByteBuffer bbf = null;
        StringBuilder hexString = null;

        try {
            md = MessageDigest.getInstance(hashAlgo);
            fis = new FileInputStream(file);
            fc = fis.getChannel();
            bbf = ByteBuffer.allocateDirect(8192); // allocation in bytes - 1024,2048,4096,8192

            int b;

            b = fc.read(bbf);

            while ((b != -1) && (b != 0)) {
                bbf.flip();

                byte[] bytes = new byte[b];
                bbf.get(bytes);

                md.update(bytes,0,b);

                bbf.clear();
                b = fc.read(bbf);
            }

            fis.close();

            byte[] mdbytes = md.digest();

            hexString = new StringBuilder();

            for (int i = 0; i < mdbytes.length; i++) {
                hexString.append(Integer.toHexString((0xFF & mdbytes[i])));
            }

            return hexString.toString();

        } catch (NoSuchAlgorithmException e) {
            throw new HasherException("Unsupported Hash Algorithm.",e);
        }
    }

So I tried benchmarking an MD5 of a 2.92 GB file with my original example and my latest updated example. Of course, any benchmark is relative, since there is operating system and disk caching and other "magic" that will skew repeated reads of the same file... but here are some benchmarks anyway. I ran each method 5 times from a fresh compile, and the benchmark is taken from the last (5th) run, since it should be the "hottest" run of the algorithm as well as of any of that "magic" (in my theory anyway).

Here are the benchmarks so far:

    Original Method - 14.987909 (s) 
    Latest Method - 11.236802 (s)

The time spent digesting the same 2.92 GB file is reduced by 25.03%. Very nice.
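
For reference, a minimal sketch of the timing loop described above, under the assumption that it sits in the same class as getHash(); the file path and algorithm are placeholders:

    // Rough benchmark sketch: run the method five times and read off the
    // last ("hottest") run. Path and algorithm are placeholders.
    public void benchmark() throws Exception {
        for (int run = 1; run <= 5; run++) {
            long start = System.nanoTime();
            getHash("/data/bigfile.bin", "MD5");
            double seconds = (System.nanoTime() - start) / 1_000_000_000.0;
            System.out.printf("Run %d: %.6f s%n", run, seconds);
        }
    }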

Solution

Three suggestions:

1) Clear the buffer after each read. Once the buffer fills up, fc.read(bbf) returns 0 instead of -1, so the original loop never terminates on a file larger than the buffer:

int bytes;
while ((bytes = fc.read(bbf)) != -1) {
    md.update(bbf.array(),0,bytes);
    bbf.clear();
}

2) Do not close both fc and fis; that is redundant. Closing fis is enough. The FileInputStream.close API says:

If this stream has an associated channel then the channel is closed as well.
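
A minimal sketch of relying on that behaviour, assuming the digest loop from above; the path is a placeholder:

    // try-with-resources closes fis, which also closes its channel.
    try (FileInputStream fis = new FileInputStream("/data/bigfile.bin")) {
        FileChannel fc = fis.getChannel();
        // ... read from fc and update the MessageDigest here ...
    } // no separate fc.close() needed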

3) If you want to use FileChannel to improve performance, use a direct buffer instead of ByteBuffer.allocate(1024):

ByteBuffer.allocateDirect(1024);
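
One caveat (the loop below is a sketch, not the original poster's code): a direct buffer usually has no accessible backing array, so bbf.array() cannot be used; the bytes have to be copied out with get(), much like in the updated method above. Assuming md and fc are already set up:

    // Read loop with a direct buffer.
    ByteBuffer bbf = ByteBuffer.allocateDirect(8192);
    byte[] scratch = new byte[8192];
    int n;
    while ((n = fc.read(bbf)) != -1) {
        bbf.flip();              // switch to reading what was just filled
        bbf.get(scratch, 0, n);  // copy out, since a direct buffer has no array()
        md.update(scratch, 0, n);
        bbf.clear();             // make room for the next read
    }

Alternatively, MessageDigest has an update(ByteBuffer) overload that can consume the flipped buffer directly and avoid the intermediate copy altogether.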