Java: how to ignore whitespace when reading a file to generate an XML DOM
I'm trying to read a file to generate a DOM document, but the file contains whitespace and newlines between elements. I tried to have the parser ignore them, but it doesn't work:
DocumentBuilderFactory docfactory = DocumentBuilderFactory.newInstance();
docfactory.setIgnoringElementContentWhitespace(true);
I saw in the Javadoc that setIgnoringElementContentWhitespace only takes effect when the validating flag is enabled, but I don't have a DTD or XML schema for the document.
What can I do?
Update
I don't like the idea of introducing the declarations myself.
Solution
setIgnoringElementContentWhitespace is not about removing all pure-whitespace text nodes. It only removes whitespace nodes whose parent is described in the schema as having ELEMENT content, that is, a parent that contains only other elements and never text.
If you are not using a schema (DTD or XSD), element content defaults to MIXED, so this parameter never has any effect (unless the parser provides a non-standard DOM extension that treats all unknown elements as having ELEMENT content; as far as I know, none of the parsers available for Java do).
You could hack the document on its way into the parser so that it carries schema information, for example by adding an internal subset to the <!DOCTYPE ...> declaration containing <!ELEMENT ...> declarations, and then use setIgnoringElementContentWhitespace.
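As a rough sketch of the internal-subset approach, the example below parses a small document whose DOCTYPE declares that root has element content. The class name DtdWhitespaceDemo, the element names root and item, and the sample XML are all made up for illustration. Validation is switched on because, as the question notes, the Javadoc says the setting depends on it.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of the original question or answer.
final class DtdWhitespaceDemo {

    // Sample document (an assumption for illustration). The internal subset
    // says <root> holds only <item> children, i.e. it has element content.
    static final String XML =
          "<!DOCTYPE root [\n"
        + "  <!ELEMENT root (item*)>\n"
        + "  <!ELEMENT item (#PCDATA)>\n"
        + "]>\n"
        + "<root>\n"
        + "  <item>a</item>\n"
        + "  <item>b</item>\n"
        + "</root>";

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // The parser needs the content model, so validation is enabled.
        factory.setValidating(true);
        factory.setIgnoringElementContentWhitespace(true);
        return factory.newDocumentBuilder()
                      .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(XML);
        // Counts the direct children of <root> after parsing.
        System.out.println(doc.getDocumentElement().getChildNodes().getLength());
    }
}
```

With the parser bundled in the JDK, the whitespace between the item elements should no longer show up as text nodes under root. The cost is that every element in the document has to be declared, otherwise validation reports errors.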
Alternatively, and perhaps more easily, you can remove the whitespace-only nodes yourself, either in a post-processing step or as they come in, using an LSParserFilter.
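A minimal sketch of the post-processing option: parse normally, then walk the tree and drop every text node that contains nothing but whitespace. The class name WhitespaceStripper and its method names are hypothetical, and the definition of "blank" used here (String.trim() leaves an empty string) is an assumption that is slightly broader than XML's own definition of whitespace.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class; not part of the original question or answer.
final class WhitespaceStripper {

    /** Recursively removes text nodes that consist only of whitespace. */
    static void stripWhitespace(Node node) {
        Node child = node.getFirstChild();
        while (child != null) {
            // Remember the next sibling before the child is possibly removed.
            Node next = child.getNextSibling();
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().isEmpty()) {
                node.removeChild(child);
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                stripWhitespace(child);
            }
            child = next;
        }
    }

    /** Parses the XML string without any schema, then strips blank text nodes. */
    static Document parseAndStrip(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        stripWhitespace(doc);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parseAndStrip("<root>\n  <item>a</item>\n  <item>b</item>\n</root>");
        // Counts the direct children of <root> after stripping.
        System.out.println(doc.getDocumentElement().getChildNodes().getLength());
    }
}
```

Because no schema is involved, this removes blank text nodes everywhere, including inside mixed content where a single space between two inline elements may be significant. CDATA sections are left alone, since they are a different node type.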