A step-by-step guide to parsing HTML on Android with jsoup

1. Introduction to jsoup

We often need to grab data from web pages. jsoup is a Java HTML parser that can parse a URL directly as well as raw HTML text. It provides a very convenient API for fetching and manipulating data using DOM methods, CSS selectors, and jQuery-like operations.
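As a rough illustration of those three styles, the sketch below parses an in-memory HTML string, looks an element up DOM-style by id, selects elements with a CSS selector, and then sets an attribute on every match the way a jQuery call would. The markup, the class name JsoupIntro and the helper menuLinks are made up for this example and do not come from any real page.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class JsoupIntro {
    // Hypothetical markup so the example runs without a network connection.
    static final String HTML =
            "<html><head><title>Demo</title></head><body>"
            + "<div id=\"menu\"><a href=\"/soup\">Soup</a><a href=\"/rice\">Rice</a></div>"
            + "</body></html>";

    // Selects every link inside #menu and marks it rel="nofollow".
    static Elements menuLinks(Document doc) {
        Elements links = doc.select("#menu a[href]"); // CSS selector lookup
        links.attr("rel", "nofollow");                // jQuery-like bulk edit on all matches
        return links;
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(HTML);             // parse HTML text directly
        Element menu = doc.getElementById("menu");    // DOM-style lookup by id
        System.out.println("children of #menu: " + menu.children().size());
        for (Element link : menuLinks(doc)) {
            System.out.println(link.text() + " -> " + link.attr("href"));
        }
    }
}
```

To parse a live page instead of a string, the Document would come from Jsoup.connect(url).get(), which needs network access.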

Official jsoup documentation: https://jsoup.org/cookbook/

2. Usage scenario

The following is a screenshot of a food website. Note that it is an ordinary HTML page; when we want to capture the data inside it, jsoup can help a lot.

Now for the step-by-step walkthrough.

First, and most importantly, download the jsoup jar and put it in your project's libs directory.

Jar package download address: http://jsoup.org/download

Android Studio users can add jsoup as a Gradle dependency instead of downloading the jar.

Then, pick the web page you want to grab data from.

Here we keep using the food web page. Right-click and view the page source, or press F12, and you will see a lot of tags:

Find what you need, such as the "food world" text in the figure above. You can see that "food world" sits in <a title="food world">, whose containing node is <div class="top bar" id="j_top_bar">. To obtain "food world", select that div and read the title attribute of the link inside it.
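A minimal sketch of that lookup, assuming the markup looks like the two tags quoted above. The SAMPLE string is a hand-written stand-in for the real page, FoodTitle and extractTitle are names invented here, and System.out.println takes the place of Android's Log so the snippet also runs off-device.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class FoodTitle {
    // Hypothetical fragment modelled on the tags described in the text;
    // the real page contains far more markup.
    static final String SAMPLE =
            "<div class=\"top bar\" id=\"j_top_bar\">"
            + "<a title=\"food world\" href=\"#\">food world</a>"
            + "</div>";

    // Finds the top bar by its id, then reads the title attribute of the
    // first <a> inside it that carries one.
    static String extractTitle(Document doc) {
        Elements topBar = doc.select("div#j_top_bar");
        return topBar.select("a").attr("title");
    }

    public static void main(String[] args) {
        // In the app this would be: Document doc = Jsoup.connect(url).get();
        Document doc = Jsoup.parse(SAMPLE);
        System.out.println(extractTitle(doc));
    }
}
```

Selecting by id rather than by class is a choice made for this sketch, since an id is normally unique on a page.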

Next, take a look at the printed results:

Jsoup.connect(String url) creates a connection to a URL, and calling get() on it fetches the page and parses it into a Document object. If an error occurs while fetching the HTML from this URL, an IOException is thrown and should be handled appropriately.
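One possible way to handle that IOException is sketched below. The names SafeFetch, DocumentSource, loadOrNull and fetch are hypothetical helpers written for this article, not part of jsoup, and returning null on failure is just one policy among several.

```java
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class SafeFetch {
    // Anything that produces a Document and may fail with an IOException.
    interface DocumentSource {
        Document load() throws IOException;
    }

    // Runs the source and turns an IOException into a null result plus a log line.
    static Document loadOrNull(DocumentSource source) {
        try {
            return source.load();
        } catch (IOException e) {
            System.err.println("Could not load page: " + e.getMessage());
            return null;
        }
    }

    // Fetches and parses the page at url, or returns null if the request fails.
    static Document fetch(String url) {
        return loadOrNull(() -> Jsoup.connect(url).get());
    }

    public static void main(String[] args) {
        // Placeholder URL; replace it with the page you actually want.
        Document doc = fetch("https://example.com/");
        System.out.println(doc == null ? "fetch failed" : doc.title());
    }
}
```

Wrapping the call in a small interface keeps the error handling in one place and lets it be exercised without a network connection.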

Once you have a Document, you can use the appropriate methods of Document, or of its parent classes Element and Node, to get the relevant data.
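The sketch below shows one method from each level of that hierarchy on a small, made-up page. DocumentTour, tour and the sample markup are assumptions for illustration only; the base URI passed to Jsoup.parse is a placeholder that lets absUrl resolve the relative link.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class DocumentTour {
    // Hypothetical page used only for this example.
    static final String HTML =
            "<html><head><title>Recipes</title></head><body>"
            + "<p class=\"intro\"><a href=\"/a\">Read more</a> about soup</p>"
            + "</body></html>";

    // Returns {page title, paragraph text, raw href, absolute href}.
    static String[] tour(String html, String baseUri) {
        Document doc = Jsoup.parse(html, baseUri);

        String title = doc.title();                    // declared on Document
        Element intro = doc.select("p.intro").first(); // select() is inherited from Element
        String text = intro.text();                    // Element method: combined text content
        Element link = intro.child(0);                 // Element method: first child element
        String href = link.attr("href");               // attr() is inherited from Node
        String absolute = link.absUrl("href");         // absUrl() is inherited from Node

        return new String[] {title, text, href, absolute};
    }

    public static void main(String[] args) {
        for (String value : tour(HTML, "https://example.com/")) {
            System.out.println(value);
        }
    }
}
```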

Many articles spend a long time on principles and then give one trivial example, much like the single log line above, and readers then find that the library is not so simple to use in practice. So that you can use jsoup directly without reading the documentation (and without understanding a lot of tags), here is another example, which in fact just prints a few more logs than the one above:

The red boxes in the figure below mark the data we want to obtain. We can see that their corresponding nodes are the <div class="XXX"> elements in the blue circle.

Without further ado, on to the code.
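A sketch of what such code could look like, assuming each <div class="XXX"> holds a link with a title, an image and a short description. The inner structure of the div, the sample data, and the names FoodList and extract are all assumptions, since only the outer div is described in the text; adjust the inner selectors to whatever the page source actually shows.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class FoodList {
    // Hypothetical stand-in for the real page: two items with invented contents.
    static final String SAMPLE =
            "<div class=\"XXX\"><a href=\"/recipe/1\" title=\"Spicy chicken\">"
            + "<img src=\"/img/1.jpg\"></a><p>Hot and numbing</p></div>"
            + "<div class=\"XXX\"><a href=\"/recipe/2\" title=\"Fried rice\">"
            + "<img src=\"/img/2.jpg\"></a><p>Quick lunch</p></div>";

    // Builds one "title | link | image | description" line per matching div.
    static List<String> extract(Document doc) {
        List<String> rows = new ArrayList<>();
        for (Element item : doc.select("div.XXX")) {
            String title = item.select("a").attr("title"); // attribute of the link
            String link = item.select("a").attr("href");
            String image = item.select("img").attr("src");
            String blurb = item.select("p").text();        // visible text of the paragraph
            rows.add(title + " | " + link + " | " + image + " | " + blurb);
        }
        return rows;
    }

    public static void main(String[] args) {
        // In the app this would be: Document doc = Jsoup.connect(url).get();
        Document doc = Jsoup.parse(SAMPLE);
        for (String row : extract(doc)) {
            System.out.println(row); // on Android, Log.i(...) would be the usual choice
        }
    }
}
```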

That's it. Let's look at the log.

No problem! That concludes the walkthrough.

Note:

A Jsoup.connect(String url) request cannot run on the main thread; otherwise a NetworkOnMainThreadException will be thrown.
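One simple way to keep the request off the main thread is a plain worker thread, sketched below. BackgroundFetch, TitleCallback and loadTitleAsync are hypothetical names, and the URL is a placeholder. Note that the callback runs on the worker thread, so on Android any UI update inside it would still need to be handed back to the main thread, for example with Activity.runOnUiThread.

```java
import java.util.concurrent.Callable;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class BackgroundFetch {
    // Receives the page title, or null if loading failed.
    interface TitleCallback {
        void onTitle(String title);
    }

    // Loads the document on a new worker thread and reports its title to the callback.
    // Returns the thread so callers (and tests) can wait for it if they need to.
    static Thread loadTitleAsync(Callable<Document> source, TitleCallback callback) {
        Thread worker = new Thread(() -> {
            String title;
            try {
                title = source.call().title(); // network access happens here, off the main thread
            } catch (Exception e) {
                title = null;                  // covers IOException from Jsoup as well
            }
            callback.onTitle(title);           // still running on the worker thread
        });
        worker.start();
        return worker;
    }

    public static void main(String[] args) throws InterruptedException {
        String url = "https://example.com/"; // placeholder URL
        Thread worker = loadTitleAsync(
                () -> Jsoup.connect(url).get(),
                title -> System.out.println("title: " + title));
        worker.join(); // only so this demo does not exit early
    }
}
```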

Finally, here is how the result looks when applied in the project:

Did you spot the familiar spicy chicken? Pretty cool, isn't it!

Summary

The whole lesson comes down to a few steps:

1. Download the jar and put it in libs (or add it through Gradle)

2. Pick the web page you want

3. Use Jsoup.connect() to get the Document of the web page

4. Check the page source, find the part you want, and target it with Element.select(String selector)

5. Use Node.attr(String key) or Element.text() to extract the data

6. That's all, it really is that simple!

That is all for this article. I hope it helps with your studies.
