Java – parsing the contents of an XML file without knowing its structure
I have been using Java to learn some new techniques for parsing files, and I have worked through most of it. However, I am stuck on how to parse an XML file whose structure is unknown at the time of receipt. There are many examples for when you know the structure (getElementsByTagName seems to be the way to go), but none that handle it dynamically, at least not that I have found.
So the TL;DR version of this question: how can I parse an XML file when I can't rely on knowing its structure?
Solution
The parsing part is very simple. As holderdarocha said in the comments, the parser only needs valid (well-formed) XML; it doesn't care about the structure. You can use Java's standard DocumentBuilder to get a Document:
InputStream in = new FileInputStream(...);
Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
(If you are parsing multiple documents, you can keep reusing the same DocumentBuilder.)
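To illustrate the reuse mentioned above, here is a minimal sketch that parses several documents with one DocumentBuilder. The class name ReuseBuilder and the helper rootNames are hypothetical names made up for this illustration, and the sketch assumes the builder is only used from one thread at a time:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class ReuseBuilder {
    // Hypothetical helper: parses every string with one shared builder
    // and returns the tag name of each document's root element.
    static List<String> rootNames(String... xmlStrings) {
        List<String> names = new ArrayList<>();
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            for (String xml : xmlStrings) {
                Document doc = builder.parse(
                        new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
                names.add(doc.getDocumentElement().getTagName());
            }
        } catch (Exception e) {
            // Malformed XML or a misconfigured parser; rethrown unchecked
            // only to keep this sketch short.
            throw new IllegalStateException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        // prints [objects, shapes]
        System.out.println(rootNames("<objects/>", "<shapes><circle/></shapes>"));
    }
}
```

Note that neither document's structure is declared anywhere; the same builder accepts both.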
You can then start with the document's root element and use the familiar DOM methods from there:
Element root = doc.getDocumentElement();
// perform DOM operations starting here.
As for processing, it really depends on what you want to do, but you can use the Node methods getFirstChild() and getNextSibling() to iterate over the children and process each one as you see fit, based on structure, tags, and attributes.
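When nothing at all is known about the structure, the same getFirstChild() / getNextSibling() iteration can be applied recursively. The following is only a sketch of that idea, not the one true way to do it: the class name XmlWalker and the describe helpers are hypothetical, and it reports only elements, attributes, and non-blank text nodes.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;

public class XmlWalker {
    // Hypothetical helper: appends one indented line per element (tag name
    // plus attributes) and one per non-blank text node, recursing to any depth.
    static void describe(Node node, int depth, StringBuilder out) {
        StringBuilder indent = new StringBuilder();
        for (int i = 0; i < depth; i++) indent.append("  ");

        if (node.getNodeType() == Node.ELEMENT_NODE) {
            out.append(indent).append(node.getNodeName());
            NamedNodeMap attrs = node.getAttributes();
            for (int i = 0; i < attrs.getLength(); i++) {
                Node attr = attrs.item(i);
                out.append(' ').append(attr.getNodeName())
                   .append('=').append(attr.getNodeValue());
            }
            out.append('\n');
        } else if (node.getNodeType() == Node.TEXT_NODE
                && !node.getNodeValue().trim().isEmpty()) {
            out.append(indent).append('"')
               .append(node.getNodeValue().trim()).append("\"\n");
        }
        // Visit the children, whatever they turn out to be.
        for (Node child = node.getFirstChild(); child != null;
                child = child.getNextSibling()) {
            describe(child, depth + 1, out);
        }
    }

    // Parses a string and describes it from the root element down.
    static String describe(String xml) {
        try {
            Node root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            StringBuilder out = new StringBuilder();
            describe(root, 0, out);
            return out.toString();
        } catch (Exception e) {
            // Rethrown unchecked only to keep this sketch short.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(describe(
                "<objects><circle color='red'/><rectangle>hello</rectangle><glumble/></objects>"));
    }
}
```

For that input, the sketch prints:

objects
  circle color=red
  rectangle
    "hello"
  glumble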
Consider the following example:
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class XML {
    public static void main(String[] args) throws Exception {
        String xml = "<objects><circle color='red'/><circle color='green'/><rectangle>hello</rectangle><glumble/></objects>";

        // parse
        InputStream in = new ByteArrayInputStream(xml.getBytes("utf-8"));
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);

        // process: walk the root's children and branch on each element's tag name
        Node objects = doc.getDocumentElement();
        for (Node object = objects.getFirstChild(); object != null; object = object.getNextSibling()) {
            // skip non-element nodes such as text and comments
            if (object instanceof Element) {
                Element e = (Element) object;
                if (e.getTagName().equalsIgnoreCase("circle")) {
                    String color = e.getAttribute("color");
                    System.out.println("It's a " + color + " circle!");
                } else if (e.getTagName().equalsIgnoreCase("rectangle")) {
                    String text = e.getTextContent();
                    System.out.println("It's a rectangle that says \"" + text + "\".");
                } else {
                    // unknown tags are reported rather than causing a failure
                    System.out.println("I don't know what a " + e.getTagName() + " is for.");
                }
            }
        }
    }
}

The input XML document (hard-coded in this example) is:

<objects>
    <circle color='red'/>
    <circle color='green'/>
    <rectangle>hello</rectangle>
    <glumble/>
</objects>

The output is:

It's a red circle!
It's a green circle!
It's a rectangle that says "hello".
I don't know what a glumble is for.