Java – strange behavior when downloading HTML using HttpURLConnection
In my Android Wikipedia reader application, I use HttpURLConnection to download the HTML of an article. Some users report that they can't see the article and see some CSS instead, so it seems that their mobile operators preprocess the HTML in some way before it reaches the device. Other Wikipedia readers seem to work normally.
Sample URL: http://en.m.wikipedia.org/wiki/Black_Moon_(album)
My approach:
public static String downloadString(String url) throws Exception {
    StringBuilder downloadedHtml = new StringBuilder();
    HttpURLConnection urlConnection = null;
    String line = null;
    BufferedReader rd = null;
    try {
        URL targetUrl = new URL(url);
        urlConnection = (HttpURLConnection) targetUrl.openConnection();
        if (url.toLowerCase().contains("/special"))
            urlConnection.setInstanceFollowRedirects(true);
        else
            urlConnection.setInstanceFollowRedirects(false);
        // read the result from the server
        rd = new BufferedReader(new InputStreamReader(urlConnection.getInputStream()));
        while ((line = rd.readLine()) != null)
            downloadedHtml.append(line + '\n');
    } catch (Exception e) {
        AppLog.e("An exception occurred while downloading data.\r\n: " + e);
        e.printStackTrace();
    } finally {
        if (urlConnection != null) {
            AppLog.i("Disconnecting the http connection");
            urlConnection.disconnect();
        }
        if (rd != null)
            rd.close();
    }
    return downloadedHtml.toString();
}
I can't reproduce this problem, but there must be a way to solve it. I even disabled redirects by calling setInstanceFollowRedirects(false), but that didn't help.
Did I miss anything?
Examples of user reports:
http://pastebin.com/1E3Hn2yX
Solution
Use HTTPS to prevent mobile operators from rewriting the page in transit (I have no reference for this).
Other than that, I don't see anything you missed.
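As a rough illustration of the HTTPS suggestion, the download method could upgrade the address before opening the connection. This is only a sketch under some assumptions: the class name HttpsDownloader and the helper toHttps are hypothetical names introduced here, the server is assumed to serve the same page over HTTPS, and the response is assumed to be UTF-8 encoded.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical class name, used only for this sketch.
class HttpsDownloader {

    // Hypothetical helper: rewrites an http:// address to https://
    // and returns any other address unchanged.
    static String toHttps(String url) {
        String prefix = "http://";
        if (url.regionMatches(true, 0, prefix, 0, prefix.length())) {
            return "https://" + url.substring(prefix.length());
        }
        return url;
    }

    // Downloads the page over HTTPS so that intermediaries cannot
    // modify the HTML in transit. Assumes a UTF-8 response.
    static String downloadString(String url) throws IOException {
        HttpURLConnection connection =
                (HttpURLConnection) new URL(toHttps(url)).openConnection();
        StringBuilder html = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                connection.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                html.append(line).append('\n');
            }
        } finally {
            // Release the connection whether or not the read succeeded.
            connection.disconnect();
        }
        return html.toString();
    }
}
```

Calling HttpsDownloader.downloadString("http://en.m.wikipedia.org/wiki/Black_Moon_(album)") would then request the https:// version of the page. Unlike the method in the question, this sketch lets an IOException propagate to the caller instead of returning a partial or empty string.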