Java – SAX parsing and encoding

One of my contacts ran into problems using SAX to parse RSS and Atom files. According to him, the text from an item element gets truncated at apostrophes, or sometimes at accented characters. There seems to be a problem with the encoding.

I have tried SAX myself and also see some truncation, but I haven't investigated it further. I would appreciate hearing from anyone who has solved this problem before.

This is the code used in the ContentHandler:

public void characters(char[] ch, int start, int end) throws SAXException {
    //
    link = new String(ch, start, end);

Edit: the encoding problem may be due to storing the information in a byte array, since I know Java works in Unicode.

Solution

The characters() method is not guaranteed to give you the complete text content of an element in a single call: the parser may split the text across several calls, for instance when it spans an internal buffer boundary. You need to buffer the characters yourself between the start-element and end-element events.

For example:

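private StringBuilder builder;

public void startElement(String uri, String localName, String qName, Attributes atts) {
    // Start a fresh buffer for each element.
    builder = new StringBuilder();
}

public void characters(char[] ch, int start, int length) {
    // May be called several times for one run of text; append each chunk.
    // The third argument is a length, not an end index.
    builder.append(ch, start, length);
}

public void endElement(String uri, String localName, String qName) {
    // Only at this point is the element's text complete.
    String theFullText = builder.toString();
}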
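
To try the buffering approach end to end, here is a minimal self-contained sketch. The class name TitleCollector, the helper parseTitles, and the choice to collect only title elements are assumptions made for illustration; they are not part of the original question or answer. The sample text uses an entity and character references, which are typical places where a parser may split the text into more than one characters() call.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the full text of every <title> element.
class TitleCollector extends DefaultHandler {
    private final List<String> titles = new ArrayList<>();
    // Non-null only while the parser is inside a <title> element.
    private StringBuilder builder;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // The default factory is not namespace-aware, so the name arrives in qName.
        if ("title".equals(qName)) {
            builder = new StringBuilder();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Append every chunk; one text node may arrive in several calls.
        if (builder != null) {
            builder.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // The buffered text is complete only when the element ends.
        if ("title".equals(qName) && builder != null) {
            titles.add(builder.toString());
            builder = null;
        }
    }

    List<String> getTitles() {
        return titles;
    }

    // Hypothetical convenience method: parse an XML string and return the titles.
    static List<String> parseTitles(String xml) throws Exception {
        TitleCollector handler = new TitleCollector();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.getTitles();
    }

    public static void main(String[] args) throws Exception {
        // &apos; and &#233; stand in for the apostrophes and accented characters from the question.
        String xml = "<rss><channel><item><title>L&apos;&#233;t&#233;</title></item></channel></rss>";
        System.out.println(parseTitles(xml));
    }
}
```

Because the handler reads the text only in endElement, the result does not depend on how many characters() calls the parser happens to make for a given element.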