Java – how to parse XML files containing BOMs?

I want to use JDOM to parse XML files from URLs, but when I try this:

SAXBuilder builder = new SAXBuilder();
builder.build(aUrl);

I get this exception:

Invalid byte 1 of 1-byte UTF-8 sequence.

I think this may be a BOM problem, so I looked at the source and saw a BOM at the beginning of the file. I tried reading from the URL with aUrl.openStream() and stripping the BOM with Commons IO's BOMInputStream, but to my surprise it didn't detect any BOM. I also tried reading from the stream, writing it to a local file and parsing the local file. I set the encoding of both the InputStreamReader and the OutputStreamWriter to UTF-8, but when I open the file it is full of garbage characters.
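For reference, stripping a UTF-8 BOM (the bytes EF BB BF) without Commons IO can be sketched as below. This is a minimal, hypothetical helper (skipUtf8Bom is not a library method, and it needs Java 9+ for readNBytes/readAllBytes) that, like BOMInputStream's single-argument constructor, only looks for the UTF-8 BOM. If the first three bytes are anything else, they are pushed back unchanged, so a stream that does not begin with EF BB BF passes through as-is.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: skips a leading UTF-8 BOM (EF BB BF) if present.
// Requires Java 9+ for InputStream.readNBytes/readAllBytes.
final class Utf8BomSkipper {
    private static final int[] UTF8_BOM = {0xEF, 0xBB, 0xBF};

    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        // Pushback buffer large enough to return all three peeked bytes.
        PushbackInputStream pin = new PushbackInputStream(in, UTF8_BOM.length);
        byte[] head = new byte[UTF8_BOM.length];
        int n = pin.readNBytes(head, 0, head.length);

        boolean isBom = n == UTF8_BOM.length;
        for (int i = 0; isBom && i < n; i++) {
            isBom = (head[i] & 0xFF) == UTF8_BOM[i];
        }
        if (!isBom && n > 0) {
            // Not a BOM: put the peeked bytes back so nothing is lost.
            pin.unread(head, 0, n);
        }
        return pin;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', 'a', '/', '>'};
        InputStream in = skipUtf8Bom(new ByteArrayInputStream(data));
        System.out.println(new String(in.readAllBytes(), StandardCharsets.UTF_8)); // prints <a/>
    }
}
```

Because the check only compares the first three bytes, it says nothing about what the rest of the stream contains.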

I think the problem is the encoding used at the source URL. But when I open the URL in a browser, save the XML to a file and read that file with the process above, everything works.

I'd be grateful for any ideas about what could be causing this.

Solution

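The HTTP server is sending the content gzip-compressed (Content-Encoding: gzip; if you don't know what this means, see http://en.wikipedia.org/wiki/HTTP_compression). The parser is therefore reading the compressed bytes, which begin with the gzip header 0x1F 0x8B and are not valid UTF-8 text; that is also why BOMInputStream found no BOM. Your browser decompresses the response automatically, so the file it saves is plain XML. You need to wrap aUrl.openStream() in a GZIPInputStream, which will decompress it for you. For example: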

builder.build(new GZIPInputStream(aUrl.openStream()));
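The fix can be tried without a network connection by compressing an XML document in memory and parsing it back. In this sketch the JDK's built-in DOM parser (javax.xml.parsers.DocumentBuilderFactory) stands in for JDOM's SAXBuilder, since JDOM is not part of the JDK; GzipXmlDemo, gzip and parseRoot are hypothetical names used only for illustration. The compressed body parses once it goes through a GZIPInputStream, mirroring builder.build(new GZIPInputStream(aUrl.openStream())).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

final class GzipXmlDemo {
    // Compresses bytes the way a server applying "Content-Encoding: gzip" would.
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        }
        return bos.toByteArray();
    }

    // Parses XML from a stream and returns the root element's name.
    // The JDK's DOM parser stands in for JDOM's SAXBuilder here.
    static String parseRoot(InputStream in) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(in);
        return doc.getDocumentElement().getTagName();
    }

    public static void main(String[] args) throws Exception {
        byte[] xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><feed/>"
                .getBytes(StandardCharsets.UTF_8);
        byte[] body = gzip(xml); // simulated compressed HTTP response body

        // Same shape as builder.build(new GZIPInputStream(aUrl.openStream())).
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(body))) {
            System.out.println(parseRoot(in)); // prints feed
        }
    }
}
```

Handing body to parseRoot without the GZIPInputStream wrapper makes the parser fail on the first byte, which is the situation described in the question.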

Edit, added in response to the comments below: if you don't know in advance whether the response from the URL will be gzipped, you can write something like this:

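import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.util.zip.GZIPInputStream;

// Opens the URL and transparently decompresses gzip-encoded responses.
private InputStream openStream(final URL url) throws IOException
{
    final URLConnection cxn = url.openConnection();
    // Value of the Content-Encoding response header, or null if absent.
    final String contentEncoding = cxn.getContentEncoding();
    if(contentEncoding == null || contentEncoding.equalsIgnoreCase("identity"))
        return cxn.getInputStream(); // body is not compressed
    else if(contentEncoding.equalsIgnoreCase("gzip")
               || contentEncoding.equalsIgnoreCase("x-gzip"))
        return new GZIPInputStream(cxn.getInputStream());
    else
        throw new IOException("Unexpected content-encoding: " + contentEncoding);
}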

(Warning: untested.) Then use:

builder.build(openStream(aUrl));

This is basically the same as the above: aUrl.openStream() is explicitly documented as shorthand for aUrl.openConnection().getInputStream(). The difference is that this version checks the Content-Encoding header before deciding whether to wrap the stream in a GZIPInputStream.
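If the Content-Encoding header cannot be trusted (missing, or wrong on a misconfigured server), a different technique is to sniff the content itself: every gzip stream starts with the magic bytes 0x1F 0x8B, and uncompressed XML should never start with 0x1F, since that is not a legal XML character. The sketch below peeks at the first two bytes with a PushbackInputStream and wraps the stream in a GZIPInputStream only when they match. GzipSniffer and maybeGunzip are hypothetical names, not part of any library, and the code assumes Java 9+.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class GzipSniffer {
    // Hypothetical helper: decompresses only if the stream starts with the
    // gzip magic number 1F 8B; otherwise returns the bytes unchanged.
    static InputStream maybeGunzip(InputStream in) throws IOException {
        PushbackInputStream pin = new PushbackInputStream(in, 2);
        byte[] magic = new byte[2];
        int n = pin.readNBytes(magic, 0, 2); // Java 9+
        if (n > 0) {
            pin.unread(magic, 0, n); // restore the peeked bytes either way
        }
        boolean gzipped = n == 2
                && (magic[0] & 0xFF) == 0x1F
                && (magic[1] & 0xFF) == 0x8B;
        // GZIPInputStream re-reads the header, including the restored magic bytes.
        return gzipped ? new GZIPInputStream(pin) : pin;
    }

    public static void main(String[] args) throws IOException {
        byte[] xml = "<a/>".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(xml);
        }
        // Both the compressed and the plain input come out as the same XML.
        for (byte[] input : new byte[][] {bos.toByteArray(), xml}) {
            InputStream in = maybeGunzip(new ByteArrayInputStream(input));
            System.out.println(new String(in.readAllBytes(), StandardCharsets.UTF_8)); // prints <a/>
        }
    }
}
```

With it, the call would become builder.build(GzipSniffer.maybeGunzip(aUrl.openStream()));. Unlike the header check, this only recognizes gzip and silently passes through any other encoding, such as deflate.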

See the documentation for java.net.URLConnection.
