What methods can be used to return valid and invalid XML data from files in Java?
I have the following data, which should be XML:
<?xml version="1.0" encoding="UTF-8"?>
<Product>
    <id>1</id>
    <description>A new product</description>
    <price>123.45</price>
</Product>
<Product>
    <id>1</id>
    <description>A new product</description>
    <price>123.45</price>
</Product>
<ProductTTTTT>
    <id>1</id>
    <description>A new product</description>
    <price>123.45</price>
</Product>
<Product>
    <id>1</id>
    <description>A new product</description>
    <price>123.45</price>
</ProductAAAAAA>
So, basically, I have multiple root elements (products).
The point is that I am trying to convert this data into two XML documents: one for the valid nodes and one for the invalid nodes.
Valid nodes:
<Product> ... </Product>
Invalid nodes: <ProductTTTTT> ... </Product> and <Product> ... </ProductAAAAAA>
I have been thinking about how to achieve this with plain Java (not a web application):
- If I'm not mistaken, validating against an XSD will invalidate the entire file, so that is not an option.
- Using the default JAXB parser (unmarshaller) leads to the same problem, because internally it creates an XSD for my entity.
- Using only XPath will (as far as I know) just return the whole file. I haven't found an XPath expression along the lines of GET !VALID (this is just to illustrate the idea).
- Using XQuery (is that possible?). By the way, how do you use XQuery with JAXB?
- XSL(T) runs into the same limitation as XPath, because it uses XPath to select content.
So, what method can I use to achieve my goal? (If possible, please provide a link or code.)
Solution
If the start and end tags whose names begin with "Product" each sit at the beginning of a line in the file, you can:
- Use a file scanner to split the document into individual pieces, starting a piece at each line that begins with <Product and closing it at the line that begins with </Product.
- Try to parse each extracted piece of text as XML using an XML API.
  - If parsing succeeds, add the resulting object to a list of "good", well-formed XML documents.
    - You can then perform any additional schema validation or validity checks on it.
  - If parsing throws an error, catch it and add the text fragment to a list of "bad" items that need to be cleaned up or otherwise handled.
An example to get you started:
package com.stackoverflow.questions.q52012383;

import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import java.io.File;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class FileSplitter {

    public static void parseFile(File file, String elementName)
            throws ParserConfigurationException, IOException {
        List<Document> good = new ArrayList<>();
        List<String> bad = new ArrayList<>();

        // Prefixes only, so <ProductTTTTT> and </ProductAAAAAA> also match
        String startTag = "<" + elementName;
        String endTag = "</" + elementName;

        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder;
        StringBuilder buffer = new StringBuilder();
        String line;
        boolean append = false;

        try (Scanner scanner = new Scanner(file)) {
            while (scanner.hasNextLine()) {
                line = scanner.nextLine();

                if (line.startsWith(startTag)) {
                    append = true; // start accumulating content
                } else if (line.startsWith(endTag)) {
                    append = false;
                    buffer.append(line);
                    // instead of the line above, you could hard-code the ending tag to compensate for bad data:
                    // buffer.append(endTag + ">");

                    try { // to parse as XML
                        builder = factory.newDocumentBuilder();
                        Document document = builder.parse(new InputSource(new StringReader(buffer.toString())));
                        good.add(document); // parsed successfully, add it to the good list
                    } catch (SAXException ex) {
                        bad.add(buffer.toString()); // something is wrong, not well-formed XML
                    } finally {
                        // reset the buffer to start a new XML doc, even after a failure,
                        // so a bad fragment does not leak into the next one
                        buffer.setLength(0);
                    }
                }

                if (append) { // accumulate content
                    buffer.append(line);
                }
            }
            System.out.println("Good items: " + good.size() + " Bad items: " + bad.size());
            // do stuff with the good/bad results...
        }
    }

    public static void main(String[] args) throws ParserConfigurationException, IOException {
        File file = new File("/tmp/test.xml");
        parseFile(file, "Product");
    }
}
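The question asks for two XML documents, one for the valid nodes and one for the invalid ones, while the example above stops at the good and bad lists. The following is only a minimal sketch of one way that last step might look, not part of the original answer. The class name ResultWriter, its helper methods, and the wrapper element names Products, BadFragments and Fragment are all hypothetical. Because the bad fragments are not well-formed, the sketch stores them as escaped text rather than as elements.

```java
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class; all names here are illustrative assumptions.
public class ResultWriter {

    // Creates an empty DOM document; configuration problems are rethrown unchecked.
    public static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Copies the root element of every well-formed fragment under one wrapper root.
    public static Document mergeGood(List<Document> good, String wrapperName) {
        Document merged = newDocument();
        Element root = merged.createElement(wrapperName);
        merged.appendChild(root);
        for (Document doc : good) {
            // importNode makes a deep copy that is owned by the merged document
            root.appendChild(merged.importNode(doc.getDocumentElement(), true));
        }
        return merged;
    }

    // Stores each rejected fragment as text, because it cannot be parsed into elements.
    public static Document wrapBad(List<String> bad, String wrapperName, String itemName) {
        Document wrapped = newDocument();
        Element root = wrapped.createElement(wrapperName);
        wrapped.appendChild(root);
        for (String fragment : bad) {
            Element item = wrapped.createElement(itemName);
            // A text node keeps the raw fragment; markup characters are escaped on output
            item.appendChild(wrapped.createTextNode(fragment));
            root.appendChild(item);
        }
        return wrapped;
    }

    // Serializes a DOM document to a string without the XML declaration.
    public static String serialize(Document doc) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-ins for the good and bad lists built by parseFile in the example above
        Document product = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader("<Product><id>1</id></Product>")));
        List<Document> good = Arrays.asList(product);
        List<String> bad = Arrays.asList("<ProductTTTTT><id>1</id></Product>");

        System.out.println(serialize(mergeGood(good, "Products")));
        System.out.println(serialize(wrapBad(bad, "BadFragments", "Fragment")));
    }
}
```

To write files instead of strings, a StreamResult built from a File could be passed to the transformer. Whether escaped text is the right container for the bad fragments depends on how they will be cleaned up later; a plain text file with one fragment per entry may be just as practical.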