java – Scanner.findInLine() has a large memory leak

I am using a simple Scanner to parse a string, but I have found that if I call it often enough I get an OutOfMemoryError. This code runs as part of the constructor of an object that is built repeatedly for an array of strings:

Edit: Here is the constructor for more information; not much else happens with the Scanner outside the try block.

public Header(String headerText) {
        char[] charArr;
        charArr = headerText.toCharArray();
        // Check that all characters are printable characters
        if (charArr.length > 0 && !commonMethods.isPrint(charArr)) {
            throw new IllegalArgumentException(headerText);
        }
        // Check for header suffix
        Scanner sc = new Scanner(headerText);
        MatchResult res;
        try {
            sc.findInLine("(\\D*[a-zA-Z]+)(\\d*)(\\D*)");
            res = sc.match();
        } finally {
            sc.close();
        }

        if (res.group(1) == null || res.group(1).isEmpty()) {
            throw new IllegalArgumentException("Missing header keyword found");     // Empty header to store
        } else {
            mnemonic = res.group(1).toLowerCase();                            // Store header
        }
        if (res.group(2) == null || res.group(2).isEmpty()) {
            suffix = -1;
        } else {
            try {
                suffix = Integer.parseInt(res.group(2));       // Store suffix if it exists
            }  catch (NumberFormatException e) {
                throw new NumberFormatException(headerText);
            }
        }
        if (res.group(3) == null || res.group(3).isEmpty()) {
            isQuery= false;
        } else {
            if (res.group(3).equals("?")) {
                isQuery = true;
            } else {
                throw new IllegalArgumentException(headerText);
            }
        }

        // If command was of the form *ABC,reject suffixes and prefixes
        if (mnemonic.contains("*") 
                && suffix != -1) {
            throw new IllegalArgumentException(headerText);
        }
    }

A profiler memory snapshot shows that the read(char) method called from Scanner.findInLine() allocates a lot of memory during the run, because I scan hundreds of thousands of strings; after a few seconds it has already allocated more than 38 MB.

I thought that calling close() on the Scanner after using it in the constructor would make the old object eligible for garbage collection, but somehow it stays around, and the read method accumulates billions of bytes of data before the heap fills up.

Can anyone point me in the right direction?

Solution

You haven't posted all of your code, but since you are scanning for the same regular expression over and over, it is more efficient to precompile a static Pattern and use it for the Scanner lookup:

static Pattern p = Pattern.compile("(\\D*[a-zA-Z]+)(\\d*)(\\D*)");

And in the constructor:

sc.findInLine(p);

This may or may not be the root cause of the OOM problem, but it will certainly make your parsing faster.

Related: java.util.regex – importance of Pattern.compile()?
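Put together, the two snippets above might look like the following minimal sketch. The class name HeaderParseSketch and the parse helper are made up for illustration and are not part of the original code. The null check is an addition: findInLine returns null when nothing matches, and calling match() after a failed search throws IllegalStateException.

```java
import java.util.Scanner;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;

public class HeaderParseSketch {
    // Compiled once when the class is loaded, not once per constructor call.
    private static final Pattern HEADER =
            Pattern.compile("(\\D*[a-zA-Z]+)(\\d*)(\\D*)");

    /** Returns {mnemonic, suffix digits, trailing text}, or null if nothing matches. */
    static String[] parse(String headerText) {
        // try-with-resources closes the Scanner, like the finally block in the question
        try (Scanner sc = new Scanner(headerText)) {
            if (sc.findInLine(HEADER) == null) {
                return null; // no match, so sc.match() must not be called
            }
            MatchResult res = sc.match();
            return new String[] { res.group(1), res.group(2), res.group(3) };
        }
    }

    public static void main(String[] args) {
        String[] parts = parse("MEAS12?");
        System.out.println(parts[0] + " " + parts[1] + " " + parts[2]); // prints "MEAS 12 ?"
    }
}
```

This still creates one Scanner per string; it only removes the repeated pattern compilation.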

Update: now that you have posted more code, I see some other problems. If you call this constructor repeatedly, that means you are probably tokenizing or splitting the input beforehand. Why create a new Scanner to parse each line? Scanners are expensive; if possible, you should use the same Scanner to parse the entire file. Using one Scanner with a precompiled Pattern will be much faster than what you are doing now, which creates a new Scanner and a new Pattern for every line you parse.
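As a rough sketch of the single-Scanner idea, assuming the headers arrive one per line in a single input (the question does not say how the strings are produced), one Scanner and one precompiled Pattern can walk the whole input. SingleScannerSketch and mnemonics are hypothetical names; only the lower-cased first group is collected, to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;

public class SingleScannerSketch {
    private static final Pattern HEADER =
            Pattern.compile("(\\D*[a-zA-Z]+)(\\d*)(\\D*)");

    /** Collects the lower-cased mnemonic of every line that contains a header. */
    static List<String> mnemonics(String input) {
        List<String> out = new ArrayList<>();
        try (Scanner sc = new Scanner(input)) {      // one Scanner for the whole input
            while (sc.hasNextLine()) {
                // findInLine only searches up to the next line separator
                if (sc.findInLine(HEADER) != null) {
                    MatchResult res = sc.match();
                    out.add(res.group(1).toLowerCase());
                }
                // Skip whatever is left of this line, including its terminator
                if (sc.hasNextLine()) {
                    sc.nextLine();
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(mnemonics("MEAS12?\nFREQ\n123\n*RST")); // prints [meas, freq, *rst]
    }
}
```

For a real file, the Scanner would be constructed from a File or an InputStream instead of a String; the loop stays the same.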
