Java – SAX parsing and encoding
One of my contacts ran into problems using SAX to parse RSS and Atom files. According to him, the text of an item element gets truncated at apostrophes, or sometimes at accented characters. It looks like an encoding problem.
I tried SAX myself and also saw some truncation, but I haven't investigated further. If anyone has solved this problem before, I would appreciate your help.
This is the code used in the ContentHandler:
```java
public void characters(char[] ch, int start, int length) throws SAXException {
    link = new String(ch, start, length); // keeps only the chunk passed to this call
}
```
Edit: I suspect the encoding problem may come from storing the data in a byte array, since as far as I know Java works internally in Unicode.
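If the feed really is held as a byte array, one way to rule out the edit's hypothesis is to hand the raw bytes to the parser as an InputStream instead of turning them into a String first: the parser then decodes them using the encoding named in the XML declaration. The sketch below is only an illustration under that assumption; the class `ByteArrayFeedDemo`, the method `titleText` and the sample ISO-8859-1 feed are hypothetical. Note that the truncation itself is a separate issue, covered by the buffering in the answer.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo: let the SAX parser decode raw bytes itself.
class ByteArrayFeedDemo {

    // Returns the concatenated text of all <title> elements in the raw XML bytes.
    static String titleText(byte[] xmlBytes)
            throws ParserConfigurationException, SAXException, IOException {
        StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            private boolean inTitle;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (qName.equals("title")) {
                    inTitle = true;
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (inTitle) {
                    text.append(ch, start, length); // accumulate every chunk
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (qName.equals("title")) {
                    inTitle = false;
                }
            }
        };
        // An InputStream (not a String) lets the parser honour the declared encoding.
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xmlBytes), handler);
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical feed encoded in ISO-8859-1, as its XML declaration states.
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"
                + "<rss><channel><item><title>caf\u00e9</title></item></channel></rss>";
        byte[] bytes = xml.getBytes(StandardCharsets.ISO_8859_1);

        // Decoding the bytes ourselves with the wrong charset garbles the accent.
        System.out.println(new String(bytes, StandardCharsets.UTF_8).contains("caf\u00e9")); // false
        // Letting the parser decode them gives the correct text.
        System.out.println(titleText(bytes).equals("caf\u00e9")); // true
    }
}
```

The design choice here is simply to keep bytes as bytes until they reach the parser, so no code outside the parser has to guess the charset.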
Solution
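The characters() method is not guaranteed to deliver the full text content of an element in a single call. The text may cross a buffer boundary, and parsers are allowed to split it into several chunks (in practice often around entity and character references such as &apos; or &#233;, which would explain truncation at apostrophes and accented characters). You need to buffer the characters yourself between the startElement and endElement events.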
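For example:
```java
private StringBuilder builder;

@Override
public void startElement(String uri, String localName, String qName, Attributes atts) {
    builder = new StringBuilder(); // start a fresh buffer for each element
}

@Override
public void characters(char[] ch, int start, int length) {
    builder.append(ch, start, length); // may run several times per element
}

@Override
public void endElement(String uri, String localName, String qName) {
    String theFullText = builder.toString(); // the complete text is available here
}
```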
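A complete, runnable version of the same buffering idea might look like the sketch below. It assumes RSS-style `<title>` elements, and the names `TitleHandler` and `parseTitles` are hypothetical. The sample title contains `&apos;` and `&#233;`, the kind of spots where text is likely to arrive in more than one characters() call; because every chunk is appended, the full title is available in endElement either way.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the full text of every <title> element,
// however many characters() calls the parser uses to deliver it.
class TitleHandler extends DefaultHandler {
    private final StringBuilder builder = new StringBuilder();
    private final List<String> titles = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        builder.setLength(0); // reuse one buffer, emptied at each start tag
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        builder.append(ch, start, length); // append the chunk; never overwrite
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("title")) {
            titles.add(builder.toString()); // text is complete only at the end tag
        }
        builder.setLength(0);
    }

    List<String> getTitles() {
        return titles;
    }

    static List<String> parseTitles(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        TitleHandler handler = new TitleHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.getTitles();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<rss><channel><item>"
                + "<title>Don&apos;t miss the caf&#233; review</title>"
                + "</item></channel></rss>";
        List<String> titles = parseTitles(xml);
        System.out.println(titles.get(0).equals("Don't miss the caf\u00e9 review")); // true
    }
}
```

Because the buffer is cleared at every start tag, this suits leaf elements such as `title`; elements with mixed content (text around child elements) would need a stack of buffers instead.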