Step by step: parsing HTML on Android with jsoup
1. Introduction to jsoup
We often need to grab data from web pages. jsoup is a Java HTML parser that can parse HTML fetched directly from a URL or supplied as a string of HTML text. It provides a convenient API for extracting and manipulating data through DOM traversal, CSS selectors, and jQuery-like methods.
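Here is a minimal sketch of that workflow, assuming the jsoup library is on the classpath. It parses an HTML string (a made-up snippet, not a real page) and queries it with a CSS selector, much as jQuery's $() would. The class name JsoupIntro and the markup are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.util.ArrayList;
import java.util.List;

final class JsoupIntro {

    // Hypothetical HTML used only for illustration.
    static final String HTML =
            "<html><head><title>Menu</title></head><body>"
            + "<ul id=\"dishes\">"
            + "<li class=\"dish\">Spicy chicken</li>"
            + "<li class=\"dish\">Mapo tofu</li>"
            + "</ul></body></html>";

    // Parses the HTML text into a DOM and returns the text of every <li class="dish"> under #dishes.
    static List<String> dishNames(String html) {
        Document doc = Jsoup.parse(html);
        Elements dishes = doc.select("ul#dishes > li.dish"); // CSS selector, jQuery-style
        List<String> names = new ArrayList<>();
        for (Element dish : dishes) {
            names.add(dish.text());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(dishNames(HTML)); // [Spicy chicken, Mapo tofu]
    }
}
```

Jsoup.parse(String) covers the "HTML text" case; the "URL" case goes through Jsoup.connect(url).get(), which performs network I/O.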
Official jsoup documentation: https://jsoup.org/cookbook/
2. Usage scenario
The screenshot below shows a food website. Note that it is an ordinary HTML page, so when we want to capture the data in it, jsoup can help a lot.
Now let's go through it step by step.
First, and very importantly, download the jar package and put it in the project's libs folder.
Jar download page: http://jsoup.org/download
If you use Android Studio, you can add jsoup as a Gradle dependency instead of downloading the jar.
Then, pick the web page you want to grab data from.
Here we keep using the food web page: right-click and view the page source, or press F12 to open the developer tools. You will see a lot of tags:
Find the part you need, for example "food world" in the figure above. You can see that "food world" is the title of an <a title="food world"> element nested inside the node <div class="top bar" id="j_top_bar">. To obtain "food world", the code can be written as follows:
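The original code for this step was not preserved. Below is a minimal sketch of what it might look like, assuming the markup described above (reconstructed by hand, so the live page may differ): select the <a> carrying a title attribute inside the element with id j_top_bar, then read that attribute. The class name TopBarTitle is hypothetical, and the sketch parses a local string so it runs offline.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

final class TopBarTitle {

    // Markup reconstructed from the description above; the real page may differ.
    static final String HTML =
            "<div id=\"j_top_bar\">"
            + "<a href=\"/\" title=\"food world\">food world</a>"
            + "</div>";

    // Returns the title attribute of the first <a title=...> inside #j_top_bar, or null if absent.
    static String topBarTitle(Document doc) {
        Element link = doc.select("#j_top_bar a[title]").first();
        return link == null ? null : link.attr("title");
    }

    public static void main(String[] args) {
        // On the live page this would be: Document doc = Jsoup.connect(url).get();
        Document doc = Jsoup.parse(HTML);
        System.out.println(topBarTitle(doc)); // food world
    }
}
```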
Here is the printed result:
Jsoup.connect(String url) creates a connection to the URL, and calling get() on it fetches the page and parses it into a Document object. If an error occurs while fetching the HTML from this URL, get() throws an IOException, which should be handled appropriately.
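One way to handle that IOException, sketched below, is to catch it and return an empty Optional so the caller decides what to do next. The user agent string and the 10-second timeout are illustrative assumptions, not values jsoup requires; userAgent(...) and timeout(...) are optional Connection settings. The class name PageLoader is hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

import java.io.IOException;
import java.util.Optional;

final class PageLoader {

    // Fetches and parses a page; returns Optional.empty() instead of propagating IOException.
    static Optional<Document> load(String url) {
        try {
            Document doc = Jsoup.connect(url)
                    .userAgent("Mozilla/5.0") // illustrative value
                    .timeout(10_000)          // milliseconds, illustrative value
                    .get();                   // performs the HTTP GET and parses the response
            return Optional.of(doc);
        } catch (IOException e) {
            // Covers connection failures, HTTP error statuses, unsupported content types, etc.
            System.err.println("Failed to load " + url + ": " + e);
            return Optional.empty();
        }
    }
}
```

Remember that load(...) does network I/O, so on Android it must not be called from the main thread.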
Once you have a Document, you can use the methods of Document itself or of its parent classes Element and Node to get the data you need.
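To illustrate that hierarchy (Document extends Element, which extends Node), the sketch below calls methods from each level on a small hypothetical page. The class name HierarchyDemo and the markup are made up for the example.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;

final class HierarchyDemo {

    // Hypothetical page with no whitespace between tags, so child node 0 is the first <a>.
    static final String HTML =
            "<html><head><title>Recipes</title></head><body>"
            + "<div id=\"list\"><a href=\"/a\">First</a><a href=\"/b\">Second</a></div>"
            + "</body></html>";

    // Combines calls from each level of the hierarchy into one summary string.
    static String describe(Document doc) {
        String title = doc.title();                 // defined on Document
        Element list = doc.getElementById("list");  // defined on Element
        if (list == null) {
            return title + " | no #list element";
        }
        int linkCount = list.children().size();     // defined on Element
        Node first = list.childNode(0);             // defined on Node
        String firstHref = first.attr("href");      // defined on Node
        return title + " | " + linkCount + " links | first=" + firstHref;
    }

    public static void main(String[] args) {
        System.out.println(describe(Jsoup.parse(HTML))); // Recipes | 2 links | first=/a
    }
}
```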
Many articles explain a lot of principles and then give one simple example, like the single log line above, and when you actually try to use the library it turns out not to be that simple. So that you can use jsoup directly, without reading the documentation and without understanding lots of tags, here is another example (one that logs a bit more than the one above):
The red boxes in the figure below mark the data we want to obtain. Their corresponding nodes are the <div class="XXX"> elements circled in blue.
Enough talk; here is the code:
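The original code is not preserved, so here is a sketch under stated assumptions: each entry is taken to be a <div class="item"> (item is a hypothetical stand-in for the XXX class name) holding a link, an image, and a <p> with the dish name. The sketch collects each entry into a small Food object and uses jsoup's abs: attribute prefix so relative URLs are resolved against the base URI passed to Jsoup.parse. FoodListScraper, Food, and the markup are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

final class FoodListScraper {

    // One scraped entry; field names are illustrative.
    static final class Food {
        final String name;
        final String link;
        final String image;

        Food(String name, String link, String image) {
            this.name = name;
            this.link = link;
            this.image = image;
        }

        @Override
        public String toString() {
            return name + " -> " + link + " (" + image + ")";
        }
    }

    // Hypothetical markup: "item" stands in for the real XXX class name.
    static final String HTML =
            "<div class=\"item\"><a href=\"/dish/1\"><img src=\"/img/1.jpg\"></a><p>Spicy chicken</p></div>"
            + "<div class=\"item\"><a href=\"/dish/2\"><img src=\"/img/2.jpg\"></a><p>Mapo tofu</p></div>";

    static List<Food> parse(Document doc) {
        List<Food> foods = new ArrayList<>();
        for (Element item : doc.select("div.item")) {
            Element link = item.select("a[href]").first();
            Element img = item.select("img[src]").first();
            Element name = item.select("p").first();
            if (link == null || img == null || name == null) {
                continue; // skip entries whose markup does not match the expected shape
            }
            // "abs:" resolves relative URLs against the document's base URI
            foods.add(new Food(name.text(), link.attr("abs:href"), img.attr("abs:src")));
        }
        return foods;
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(HTML, "https://example.com/");
        for (Food food : parse(doc)) {
            System.out.println(food);
        }
    }
}
```

Against the live site you would replace Jsoup.parse(HTML, ...) with Jsoup.connect(url).get(), which uses the request URL as the base URI, and adjust the selectors to the real markup.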
That's it. Let's look at the log:
No problems! The lesson could end right here.
Be careful:
The network request made by Jsoup.connect(url).get() must not run on the main (UI) thread; otherwise Android throws a NetworkOnMainThreadException.
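Here is a minimal sketch of moving the fetch off the calling thread, written in plain Java so it runs outside Android. The blocking connect().get() call runs on a dedicated worker thread via CompletableFuture.supplyAsync, and the result arrives in a callback. On Android you would then switch back to the main thread (for example with Activity.runOnUiThread) before updating views; that step is only described in a comment here. The worker setup and the class name BackgroundFetch are assumptions, not part of jsoup.

```java
import org.jsoup.Jsoup;

import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class BackgroundFetch {

    // A single daemon worker thread for network calls; daemon so it never keeps the JVM alive.
    private static final ExecutorService IO = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "jsoup-io");
        t.setDaemon(true);
        return t;
    });

    // Runs the blocking jsoup fetch on the worker thread instead of the caller's thread.
    static CompletableFuture<String> fetchTitle(String url) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                return Jsoup.connect(url).get().title(); // network I/O happens here
            } catch (IOException e) {
                throw new UncheckedIOException(e); // surfaces as an exceptional completion
            }
        }, IO);
    }

    public static void main(String[] args) {
        fetchTitle("https://jsoup.org/")
                .handle((title, error) -> {
                    // On Android, re-post to the UI thread here before touching any views.
                    if (error != null) {
                        System.err.println("Fetch failed: " + error);
                    } else {
                        System.out.println("Title: " + title);
                    }
                    return null;
                })
                .join(); // blocking is fine in this demo's main(); never do this on Android's main thread
    }
}
```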
Finally, here is how the result looks when applied in the project:
Did you spot the familiar spicy chicken? Pretty cool, isn't it?
Summary
The whole process comes down to a few steps:
1. Download the jar package and put it in libs (or add jsoup as a Gradle dependency)
2. Pick the web page you want to grab data from
3. Use Jsoup.connect(url).get() to get the page's Document
4. Look at the page source, locate the element you want, and select it with Element.select(String cssQuery)
5. Use Node.attr(String key) or Element.text() to extract the data
6. That's all. It really is that simple!
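Steps 3 to 5 can be condensed into two small helpers, sketched below under the assumption that the caller already runs off the main thread. ScrapeSteps, fetch, and extract are hypothetical names; passing null as attrKey reads the element text instead of an attribute.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

final class ScrapeSteps {

    // Step 3: fetch and parse the page (must not run on the Android main thread).
    static Document fetch(String url) throws IOException {
        return Jsoup.connect(url).get();
    }

    // Steps 4 and 5: select the target elements, then read an attribute or the text of each.
    // Pass attrKey == null to read element text instead of an attribute.
    static List<String> extract(Document doc, String cssQuery, String attrKey) {
        List<String> values = new ArrayList<>();
        for (Element el : doc.select(cssQuery)) {
            values.add(attrKey == null ? el.text() : el.attr(attrKey));
        }
        return values;
    }
}
```

For example, extract(fetch(url), "#j_top_bar a[title]", "title") would return the title attributes matched by that selector, assuming the page has that structure.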