Handling the special symbol & when parsing XML files in Java, and a fix for the resulting exception

By Zhu Jiqian

While developing Java code that parses XML files with SAX, I ran into the following exception message:

Error on line 60 of document : The reference to entity "XXX" must end with the ';' delimiter.

When I opened the XML file, I found that the text "XXX" was immediately preceded by an "&" character. I later learned that "&" is a special character in XML. If it appears literally in the file instead of being written as an escape sequence, SAX and other parsers treat it as the start of an entity reference, fail to find the closing ";", and throw an exception like the one above.
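The failure is easy to reproduce without any file. The following is a minimal sketch that uses the SAX parser bundled with the JDK rather than the author's setup; the class name `BareAmpersandDemo` and the sample XML strings are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: shows that a bare '&' makes a SAX parser reject the document.
final class BareAmpersandDemo {

    // Returns null if the XML parses cleanly, otherwise the parser's error message.
    static String tryParse(String xml) {
        try {
            SAXParserFactory.newInstance()
                    .newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return null;
        } catch (SAXParseException e) {
            // Thrown for well-formedness errors such as an unterminated entity reference
            return "line " + e.getLineNumber() + ": " + e.getMessage();
        } catch (Exception e) {
            // Parser configuration or I/O problems are not expected in this demo
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // '&' written as an escape sequence: parses, so null is printed
        System.out.println(tryParse("<shop>Tom &amp; Jerry</shop>"));
        // bare '&' directly in front of a name: the parser reports an error
        System.out.println(tryParse("<shop>Tom &Jerry</shop>"));
    }
}
```

The exact wording of the message depends on the parser implementation, so the sketch only distinguishes "parsed" from "rejected".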

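The special characters in XML are < > & ' and ". They should not appear literally in the parsed character data (PCDATA) of an XML file. Strictly speaking, < and & must always be escaped, while the other three only need it in certain contexts (for example, quotes inside attribute values), but escaping all five is always safe. To use them, replace them with the corresponding escape sequences: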

&lt;    <
&gt;    >
&amp;   &
&quot;  "
&apos;  '
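When the XML is generated by your own code, the table above can be applied to each value before it is written. Here is a minimal sketch of such a helper; `XmlEscaper` and `escape` are hypothetical names, not part of any library. It is meant for individual text or attribute values only, since running it over a whole document would also escape the markup itself.

```java
// Hypothetical helper: escapes a plain-text value before it is placed into XML.
final class XmlEscaper {

    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '&':  out.append("&amp;");  break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: a &lt; b &amp;&amp; c &gt; &quot;d&quot;
        System.out.println(escape("a < b && c > \"d\""));
    }
}
```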

So how do we apply these escape replacements so that the XML file data can be read normally?

At first I searched Baidu for a solution, but many of the posts were several years old and none of them gave a clear answer. Most mentioned that the parsing exception is caused by special symbols, but they were vague about how to filter those symbols out. So I had to work it out myself and come up with a more suitable way to filter the special characters.

The idea is actually very simple. Before handing the XML file to SAX for parsing, we first read it through a reader, line by line, and join the lines into a single String. We then use the string method replaceAll() to replace the special symbol. After the replacement, the XML string can be converted directly into a Document object for XML parsing:

  String xmlStr=s.replaceAll("&","&amp;");

The conversion method code is as follows:

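  import java.io.BufferedReader;
  import java.io.FileReader;
  import org.dom4j.Document;
  import org.dom4j.DocumentHelper;

  StringBuilder buffer = new StringBuilder();
  // try-with-resources closes the reader even if reading fails
  try (BufferedReader bf = new BufferedReader(new FileReader("D:\\测试.xml"))) {
      String s;
      while ((s = bf.readLine()) != null) {
          // keep the line break so that tags spanning several lines stay intact
          buffer.append(s).append('\n');
      }
  }

  String str = buffer.toString();
  // Replace the special character with its legal escape sequence at this step
  String xml = str.replaceAll("&", "&amp;");

  // The processed XML string can now be parsed
  Document document = DocumentHelper.parseText(xml);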
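One caveat with a blanket replacement of "&": if the file already contains correctly escaped sequences such as &amp;amp;, they are escaped a second time. A possible refinement, sketched here under the assumption that only the five predefined entities and numeric character references should be left alone, is a regular expression with a negative lookahead. `AmpersandFixer` and `escapeBareAmpersands` are hypothetical names used for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical helper: escapes only the '&' characters that do not already start an entity.
final class AmpersandFixer {

    // Matches '&' NOT followed by a predefined entity name or a numeric reference plus ';'
    private static final Pattern BARE_AMP =
            Pattern.compile("&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#x[0-9A-Fa-f]+);)");

    static String escapeBareAmpersands(String xml) {
        return BARE_AMP.matcher(xml).replaceAll("&amp;");
    }

    public static void main(String[] args) {
        // prints: <a>Tom &amp; Jerry &amp; co &#38; R&amp;D</a>
        System.out.println(escapeBareAmpersands("<a>Tom & Jerry &amp; co &#38; R&D</a>"));
    }
}
```

Like the plain replaceAll() call, this works on the raw text, so an "&" inside a CDATA section or a comment would be rewritten as well. Whether that matters depends on the files being processed.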

With that, the exception caused by the special symbol & when Java parses an XML file is resolved.
