What methods can be used to return valid and invalid XML data from files in Java?

I have the following data, which should be XML:

<?xml version="1.0" encoding="UTF-8"?>
<Product>
    <id>1</id>
    <description>A new product</description>
    <price>123.45</price>
</Product>

<Product>
    <id>1</id>
    <description>A new product</description>
    <price>123.45</price>
</Product>

<ProductTTTTT>
    <id>1</id>
    <description>A new product</description>
    <price>123.45</price>
</Product>

<Product>
    <id>1</id>
    <description>A new product</description>
    <price>123.45</price>
</ProductAAAAAA>

So, basically, I have multiple root elements (products) in one file.

The key point is that I am trying to split this data into two XML documents: one for the valid nodes and one for the invalid nodes.

Valid nodes:

<Product>
   ...
</Product>

Invalid nodes: <ProductTTTTT>...</Product> and <Product>...</ProductAAAAAA>

I have been thinking about how to achieve this with plain Java (not web-based).

> If I'm not wrong, validating against an XSD will invalidate the entire file, so it's not an option.
> Using the default JAXB parser (unmarshaller) leads to the same result as the item above, because internally it creates an XSD from my entities.
> Using only XPath (as far as I know) will just return the whole file. I haven't found anything like "get !valid" (this is only to illustrate the idea).
> Using XQuery (is that possible?). By the way, how do you use XQuery with JAXB?
> XSL(T) runs into the same problem as XPath, because it uses XPath to select content.

So... What method can I use to achieve my goal? (if possible, please provide a link or code)

Solution

If the file contains lines whose start and end tags begin with "Product", you can:

> Use a file scanner to split the document into individual pieces whenever a line starts with <Product or </Product
> Try to parse each extracted piece as XML using an XML API
    > If parsing succeeds, add the object to a list of well-formed XML documents
        > Then perform any additional schema validation or validity checks
    > If parsing throws an error, catch it and add the text fragment to a list of "bad" items that need to be cleaned up or otherwise handled
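
The optional schema-validation step from the list above could be sketched roughly as follows. This is only a sketch under assumptions: the question does not include a schema, so the XSD in PRODUCT_XSD is a hypothetical one inferred from the sample data, and the names FragmentValidator, isValid and loadSchema are made up for illustration. It relies on the standard javax.xml.validation API.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class FragmentValidator {

    // Hypothetical schema, inferred from the sample data; the real one is not given.
    static final String PRODUCT_XSD =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
      + "<xs:element name=\"Product\"><xs:complexType><xs:sequence>"
      + "<xs:element name=\"id\" type=\"xs:int\"/>"
      + "<xs:element name=\"description\" type=\"xs:string\"/>"
      + "<xs:element name=\"price\" type=\"xs:decimal\"/>"
      + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    /** Compiles the (assumed) Product schema. */
    public static Schema loadSchema() throws SAXException {
        SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        return sf.newSchema(new StreamSource(new StringReader(PRODUCT_XSD)));
    }

    /** Returns true only if the fragment is well-formed AND valid against the schema. */
    public static boolean isValid(String fragment, Schema schema)
            throws ParserConfigurationException, IOException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // validating a DOM tree needs a namespace-aware parse
        try {
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(fragment)));
            // With no ErrorHandler set, validate() throws on the first schema error.
            schema.newValidator().validate(new DOMSource(doc));
            return true;
        } catch (SAXException ex) {
            return false; // not well-formed, or well-formed but schema-invalid
        }
    }
}
```

Because each fragment is checked on its own, one invalid product does not invalidate the others, which avoids the whole-file problem raised in the question.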

An example to get you started:

package com.stackoverflow.questions._52012383; // a package segment cannot start with a digit

import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.StringReader;

import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class FileSplitter {

    public static void parseFile(File file, String elementName)
      throws ParserConfigurationException, IOException {

        List<Document> good = new ArrayList<>();
        List<String> bad = new ArrayList<>();

        String startTag = "<" + elementName;
        String endTag = "</" + elementName;
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder;
        StringBuilder buffer = new StringBuilder();
        String line;
        boolean append = false;

        try (Scanner scanner = new Scanner(file)) {
            while (scanner.hasNextLine()) {
                line = scanner.nextLine();

                if (line.startsWith(startTag)) {
                    append = true; //start accumulating content
                } else if (line.startsWith(endTag)) {
                    append = false;
                    buffer.append(line);
                    // instead of the line above, you could hard-code the ending tag to compensate for bad data:
                    // buffer.append(endTag + ">");

                    try { // to parse as XML
                        builder = factory.newDocumentBuilder();
                        Document document = builder.parse(new InputSource(new StringReader(buffer.toString())));
                        good.add(document); // parsed successfully, add it to the good list
                    } catch (SAXException ex) {
                        bad.add(buffer.toString()); // something is wrong, not well-formed XML
                    } finally {
                        // reset the buffer either way, so a bad fragment does not leak into the next one
                        buffer.setLength(0);
                    }
                }

                if (append) { // accumulate content
                    buffer.append(line);
                }
            }
            System.out.println("Good items: " + good.size() + " Bad items: " + bad.size());
            //do stuff with the good/bad results...
        }
    }

    public static void main(String[] args)
      throws ParserConfigurationException, IOException {
        File file = new File("/tmp/test.xml");
        parseFile(file, "Product");
    }

}
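
Once the good and bad lists are filled, the two output documents asked for in the question could be assembled along these lines. This is a hedged sketch rather than part of the original answer: the class name ResultWriter and the wrapper element names Products, InvalidProducts and fragment are assumptions chosen for illustration. Because the bad fragments are not well-formed, they cannot be embedded as elements, so the sketch stores them as escaped text.

```java
import java.io.StringWriter;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class ResultWriter {

    /** Copies every well-formed product under a single (assumed) <Products> root. */
    public static Document mergeGood(List<Document> good) throws ParserConfigurationException {
        Document out = newDocument();
        Element root = out.createElement("Products");
        out.appendChild(root);
        for (Document doc : good) {
            // importNode makes a deep copy owned by the output document
            root.appendChild(out.importNode(doc.getDocumentElement(), true));
        }
        return out;
    }

    /** Stores each malformed fragment as text inside an (assumed) <InvalidProducts> root. */
    public static Document wrapBad(List<String> bad) throws ParserConfigurationException {
        Document out = newDocument();
        Element root = out.createElement("InvalidProducts");
        out.appendChild(root);
        for (String fragment : bad) {
            Element item = out.createElement("fragment");
            item.appendChild(out.createTextNode(fragment)); // markup is escaped when serialized
            root.appendChild(item);
        }
        return out;
    }

    /** Serializes a DOM document to a string, without the XML declaration. */
    public static String toXml(Document doc) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter writer = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(writer));
        return writer.toString();
    }

    private static Document newDocument() throws ParserConfigurationException {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
    }
}
```

In parseFile, the "do stuff with the good/bad results" comment is where calls such as ResultWriter.mergeGood(good) and ResultWriter.wrapBad(bad) would go; the resulting documents can then be written to files with the same Transformer approach.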