Java – ignore the SAXException "Content is not allowed in trailing section"

I'm using Java's DocumentBuilder.parse(InputStream) to parse XML documents. Occasionally I get a malformed XML document that has extra garbage after the final closing tag, which causes a SAXException: Content is not allowed in trailing section. (In my case, the garbage is just one or more null bytes.)

I don't care what comes after the final closing tag. Is there a simple way to parse the entire XML document in Java and have it ignore any trailing garbage?

Note that by "ignore" I don't just mean catching and swallowing the exception: I mean ignoring the trailing garbage, throwing no exception, and returning the Document object, because the XML up to and including the final closing tag is valid.

Solution

Since your sender is providing you with invalid XML, you need to correct it before it reaches the parser if you want to avoid this exception. If you cannot fix the sender, some preprocessing step is required.

If the problem is just a few extra null bytes after the end tag, as you said in a comment on another answer, you can handle it fairly easily by wrapping the input stream in your own FilterInputStream implementation that skips null bytes.
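The filter described above might look something like the following. This is only a minimal sketch: the class names NullSkippingInputStream and NullSkippingDemo are made up for illustration, and it assumes the document uses an encoding such as UTF-8 in which a 0x00 byte never occurs inside legitimate content (it would corrupt UTF-16 input, where null bytes are part of ordinary characters).

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

/** Hypothetical helper: drops every 0x00 byte read from the wrapped stream. */
class NullSkippingInputStream extends FilterInputStream {
    NullSkippingInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int b;
        do {
            b = super.read();   // -1 at end of stream ends the loop as well
        } while (b == 0);
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        while (true) {
            int n = super.read(buf, off, len);
            if (n <= 0) {
                return n;       // end of stream (or nothing read)
            }
            // Compact the chunk in place, keeping only non-null bytes.
            int kept = 0;
            for (int i = 0; i < n; i++) {
                byte b = buf[off + i];
                if (b != 0) {
                    buf[off + kept++] = b;
                }
            }
            if (kept > 0) {
                return kept;
            }
            // The whole chunk was null bytes: read again instead of returning 0.
        }
    }
}

class NullSkippingDemo {
    /** Parses the stream with DocumentBuilder after filtering out null bytes. */
    static Document parse(InputStream raw) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new NullSkippingInputStream(raw));
    }
}
```

Calling NullSkippingDemo.parse(stream) in place of DocumentBuilder.parse(stream) should then return the Document even when null bytes follow the final closing tag. Note that this sketch strips null bytes anywhere in the stream, not only in the trailing section.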

If the garbage is more complicated than null bytes, you will of course need a more complicated filter, which may be hard to get right.

If you are using a ContentHandler, you can add a callback to it so that it notifies the calling code when the end tag of the root element has been processed. Based on that knowledge, the calling code can add logic to its exception handler: if the end of the root element has already been signaled, simply ignore the exception. By that point, everything the parser needed to do has probably been done! However, this solution does not seem to fit your case, since you are using DocumentBuilder and want the Document object back.
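For cases where SAX-based handling is acceptable, the ContentHandler idea could be sketched roughly as follows. The names RootTrackingHandler and TolerantSaxDemo are hypothetical, and the sketch assumes the parser reports the trailing garbage as a SAXParseException only after it has delivered endElement for the root element.

```java
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler: records element names and notes when the root element closes. */
class RootTrackingHandler extends DefaultHandler {
    private int depth = 0;
    private boolean rootClosed = false;
    final List<String> elements = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        depth++;
        elements.add(qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        depth--;
        if (depth == 0) {
            rootClosed = true;  // the end tag of the root element has been seen
        }
    }

    boolean isRootClosed() {
        return rootClosed;
    }
}

class TolerantSaxDemo {
    /** Parses with SAX and ignores a parse error only if the root element was already closed. */
    static RootTrackingHandler parseIgnoringTrailer(InputStream in) throws Exception {
        RootTrackingHandler handler = new RootTrackingHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
        } catch (SAXParseException e) {
            if (!handler.isRootClosed()) {
                throw e;        // a genuine error inside the document: propagate it
            }
            // Otherwise the error came from the trailing garbage: ignore it.
        }
        return handler;
    }
}
```

An error that occurs before the root element is closed is still rethrown, so only problems in the trailing section are ignored. As the answer notes, this does not hand back a Document object; building one from SAX events would need additional work.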
