Java – Using VTD-XML to speed up parsing of XML files

I am using VTD-XML to parse a large number of XML files. I'm not sure whether I am using the tool correctly. I think so, but parsing the files takes too long.

The XML files (DATEX II format) are stored zipped on disk. Unpacked, they are about 31 MB each and contain more than 850,000 lines of text. I only need to extract a few fields and store them in a database.

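import java.io.File;

import com.ximpleware.AutoPilot;
import com.ximpleware.NavException;
import com.ximpleware.VTDGen;
import com.ximpleware.VTDNav;
import com.ximpleware.XPathEvalException;
import com.ximpleware.XPathParseException;
import org.apache.commons.lang3.math.NumberUtils;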
...

private static void test(File zipFile) throws XPathEvalException,NavException,XPathParseException {
    // init timer
    long step1=System.currentTimeMillis();

    // parse the XML entry inside the ZIP archive (namespace-aware)
    VTDGen vg = new VTDGen();
    vg.parseZIPFile(zipFile.getAbsolutePath(),zipFile.getName().replace(".zip",".xml"),true);

    VTDNav vn = vg.getNav();

    AutoPilot apSites = new AutoPilot();
    apSites.declareXPathNameSpace("ns1","http://schemas.xmlsoap.org/soap/envelope/");
    apSites.selectXPath("/ns1:Envelope/ns1:Body/d2LogicalModel/payloadPublication/siteMeasurements");
    apSites.bind(vn);

    long step2=System.currentTimeMillis();
    System.out.println("Prep took "+(step2-step1)+"ms; ");

    // init variables
    String siteID,timeStr;
    boolean reliable;
    int index,flow,ctr=0;
    short speed;
    while(apSites.evalXPath()!=-1) {

        vn.toElement(VTDNav.FIRST_CHILD,"measurementSiteReference");
        siteID = vn.toString(vn.getText());

        // loop all measured values of this measurement site
        while(vn.toElement(VTDNav.NEXT_SIBLING,"measuredValue")) {
            ctr++;

            // extract index attribute
            index = NumberUtils.toInt(vn.toString(vn.getAttrVal("index")));

            // go one level deeper into basicDataValue
            vn.toElement(VTDNav.FIRST_CHILD,"basicDataValue");

            // we need either FIRST_CHILD or NEXT_SIBLING depending on whether we find something
            int next = VTDNav.FIRST_CHILD;
            if(vn.toElement(next,"time")) {
                timeStr = vn.toString(vn.getText());
                next = VTDNav.NEXT_SIBLING;
            }

            if(vn.toElement(next,"averageVehicleSpeed")) {
                speed = NumberUtils.toShort(vn.toString(vn.getText()));
                next = VTDNav.NEXT_SIBLING;
            }

            if(vn.toElement(next,"vehicleFlow")) {
                flow = NumberUtils.toInt(vn.toString(vn.getText()));
                next = VTDNav.NEXT_SIBLING;
            }

            if(vn.toElement(next,"fault")) { 
                reliable = vn.toString(vn.getText()).equals("0");
            }

            // insert into database here...

            if(next==VTDNav.NEXT_SIBLING) {
                vn.toElement(VTDNav.PARENT);
            }
            vn.toElement(VTDNav.PARENT);
        }

    }
    System.out.println("Loop took "+(System.currentTimeMillis()-step2)+"ms; ");
    System.out.println("Total number of measured values: "+ctr);
}

The output of the function above for my XML file is:

Prep took 25756ms; 
Loop took 26889ms; 
Total number of measured values: 112611

No data is actually inserted into the database yet. The problem is that I receive one such file every minute. Parsing alone now takes close to a minute, downloading the file takes about 10 seconds, and I still need to store the data in the database, so I can no longer keep up in real time.

Is there any way to speed this up? Things I have tried that did not help:

- Using AutoPilots for all fields, which actually slows the second step down by 30000 ms
- Decompressing the file myself and passing the byte array to VTD, which makes no difference
- Looping over the file myself with BufferedReader.readLine(), which is still not fast enough
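To see how the "Prep" time splits between decompression and VTD parsing, the manual decompression step can be timed on its own. The following is only a minimal sketch using java.util.zip. The class name ZipEntryBytes and the method read are made up for illustration, and the entry name is assumed to follow the same convention as in the code above (archive name with ".zip" replaced by ".xml").

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper, not part of VTD-XML.
final class ZipEntryBytes {

    /** Reads one entry of a ZIP archive fully into memory. */
    static byte[] read(File zipFile, String entryName) throws IOException {
        try (ZipFile zip = new ZipFile(zipFile)) {
            ZipEntry entry = zip.getEntry(entryName);
            if (entry == null) {
                throw new IOException("No entry '" + entryName + "' in " + zipFile);
            }
            // Pre-size the buffer when the uncompressed size is known (-1 if unknown).
            long size = entry.getSize();
            int initial = (size > 0 && size < Integer.MAX_VALUE) ? (int) size : 8192;
            ByteArrayOutputStream out = new ByteArrayOutputStream(initial);
            byte[] chunk = new byte[64 * 1024];
            try (InputStream in = zip.getInputStream(entry)) {
                int n;
                while ((n = in.read(chunk)) != -1) {
                    out.write(chunk, 0, n);
                }
            }
            return out.toByteArray();
        }
    }

    public static void main(String[] args) throws IOException {
        File zipFile = new File(args[0]);
        String entryName = zipFile.getName().replace(".zip", ".xml");

        long t0 = System.nanoTime();
        byte[] xml = read(zipFile, entryName);
        long t1 = System.nanoTime();
        System.out.println("Unzip took " + (t1 - t0) / 1_000_000 + "ms for " + xml.length + " bytes");

        // The bytes could then be handed to VTD-XML and timed separately:
        //   VTDGen vg = new VTDGen();
        //   vg.setDoc(xml);
        //   vg.parse(true);
    }
}
```

If the unzip time turns out to be small compared with the 25 seconds of preparation, the time is being spent in the parse itself rather than in decompression.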

Does anyone see a way to speed this up, or do I need to start thinking about a heavier machine or multithreading? Granted, 850,000 lines per minute (about 1.2 billion lines per day) is a lot, but I still think it shouldn't take a minute to parse 31 MB of data.
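To put those numbers in perspective, here is a small back-of-the-envelope calculation based only on the figures quoted in this question. The class and method names are made up for illustration.

```java
// Hypothetical helper that just redoes the arithmetic from the question.
final class ThroughputEstimate {

    /** Lines per day for a feed that delivers the given number of lines every minute. */
    static long linesPerDay(long linesPerMinute) {
        return linesPerMinute * 60 * 24;
    }

    /** Average throughput in MB/s for the given amount of data and elapsed time. */
    static double megabytesPerSecond(double megabytes, long millis) {
        return megabytes / (millis / 1000.0);
    }

    public static void main(String[] args) {
        long totalMillis = 25_756 + 26_889; // "Prep" + "Loop" from the output above
        System.out.println("Lines per day: " + linesPerDay(850_000));
        System.out.printf("Throughput: %.2f MB/s%n", megabytesPerSecond(31, totalMillis));
        System.out.printf("Values per second in the loop: %.0f%n", 112_611 / (26_889 / 1000.0));
    }
}
```

With the timings above this works out to well under 1 MB of XML per second overall.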

Solution

You could try unzipping the archive up front and collecting the extracted XML files in an array:

File[] files = new File("foldername").listFiles();

Then you can loop over the files. I'm not sure whether it will be faster, but it's worth trying.
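A minimal sketch of such a loop, assuming the extracted files sit directly in one folder and end in ".xml". The class name XmlFolderLoop and the method listXmlFiles are made up for illustration, and the actual parsing is left as a placeholder comment.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only.
final class XmlFolderLoop {

    /** Returns the .xml files directly inside the folder, sorted by path. */
    static List<File> listXmlFiles(File folder) {
        // listFiles returns null if the path is not a directory or cannot be read.
        File[] files = folder.listFiles((dir, name) -> name.toLowerCase().endsWith(".xml"));
        if (files == null) {
            return new ArrayList<>();
        }
        Arrays.sort(files);
        return new ArrayList<>(Arrays.asList(files));
    }

    public static void main(String[] args) {
        for (File xml : listXmlFiles(new File("foldername"))) {
            long start = System.currentTimeMillis();
            // Placeholder: parse the file here, e.g. with VTDGen.parseFile(path, true).
            System.out.println(xml.getName() + " handled in "
                    + (System.currentTimeMillis() - start) + "ms");
        }
    }
}
```

Timing each file separately, as in main above, at least shows whether reading already extracted files is any faster than parsing straight from the ZIP archive.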
