Android: a complete example of grabbing the Geek Headlines from the CSDN home page

Today I wrote a small piece of code that grabs the Geek Headlines from the CSDN home page. The effect is shown in the figure:

I am sharing it here for beginners.

Main points:

1. Use the Apache HttpClient library to make a GET request.

2. Asynchronous request processing.

3. Use regular expressions to capture the required data.

1. Use the Apache HttpClient library to make a GET request.

With the Apache library, making a GET request is a simple three-step process.

2. Asynchronous request processing.

Asynchronous requests are also very simple to implement. A new thread is started to perform the request, and when the request completes, the data obtained is passed to a Handler so that it is processed on the main thread. See the code for details.
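
The pattern described above can be sketched in plain Java. This is only an illustration under stated assumptions: the class and method names are hypothetical, the "request" is a placeholder supplier rather than a real GET, and a single-thread executor stands in for the main thread (on Android, a Handler bound to the main thread would play that role).

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;
import java.util.function.Supplier;

public class AsyncFetchSketch {
    // Stand-in for the main (UI) thread. On Android a Handler attached to
    // the main thread would receive the result instead (assumption).
    public static final ExecutorService mainThread = Executors.newSingleThreadExecutor();

    // Runs `request` on a new worker thread, then hands its result to
    // `onResult` on the stand-in main thread.
    public static void fetchAsync(Supplier<String> request, Consumer<String> onResult) {
        new Thread(() -> {
            String data = request.get();                      // blocking work, off the main thread
            mainThread.execute(() -> onResult.accept(data));  // deliver the result to the main thread
        }).start();
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch done = new CountDownLatch(1);
        fetchAsync(
            () -> "<html>placeholder page</html>",            // hypothetical stand-in for the GET request
            html -> {
                System.out.println("received " + html.length() + " characters");
                done.countDown();
            });
        done.await();            // wait until the callback has run
        mainThread.shutdown();   // let the JVM exit
    }
}
```

The point of the design is that the slow network call never blocks the main thread, while everything that touches the result still runs on one thread.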

3. Use regular expressions to capture the required data.

This part is even simpler. I recommend a tool, RegexTester, for trying out expressions; see its documentation for how to use it.

Let me say here that even if you don't know any regular expressions, you just need to know (.*?). It allows you to capture almost all the data you need.

Note that the three characters ".*?" together represent a non-greedy (lazy) match of any number of arbitrary characters, that is, as few characters as possible. It can be simply understood as "any run of characters".

For example, if "a.*?d" is matched against the string "eabcd", it will find "abcd", where ".*?" matches "bc".
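
The lazy behaviour can be checked with java.util.regex. The sketch below is illustrative only; the class name, helper method and sample strings are assumptions. It compares the lazy form ".*?" with the greedy form ".*" on a string that contains two possible end points.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LazyMatchDemo {
    // Returns the first substring of `input` matched by `regex`, or null if there is none.
    public static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Lazy: stops at the first "d" after the "a".
        System.out.println(firstMatch("a.*?d", "eabcd"));     // abcd
        System.out.println(firstMatch("a.*?d", "eabcdxxd"));  // abcd
        // Greedy: runs on to the last "d" in the string.
        System.out.println(firstMatch("a.*d", "eabcdxxd"));   // abcdxxd
    }
}
```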

The content we need to grab is generally represented by "(.*?)". Note that the parentheses are part of the expression, and this is important: parentheses mark the content we want to extract.
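
As a small illustration of extraction with parentheses, the following sketch reads the first capture group from a match. The class name, helper method and the sample markup are made up for the example and are not taken from the CSDN page.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CaptureGroupDemo {
    // Returns what the first pair of parentheses in `regex` captured
    // in the first match, or null if nothing matches.
    public static String extractFirstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String sample = "x <b>hello</b> y <b>world</b>";
        // The text around the parentheses locates the match;
        // only the part inside the parentheses is returned.
        System.out.println(extractFirstGroup("<b>(.*?)</b>", sample)); // hello
    }
}
```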

Next we analyze the source code of the CSDN home page. Each of the following steps should be tested in RegexTester.

The content we need to capture is easy to find in the page source, and we start from its original form there.

What we want to grab is the title and the URL, so we replace each of them with (.*?).

Compared with the original form, the content we want to capture has been replaced by (.*?). Here "\1" stands for whatever the first (.*?) matched, because that content appears twice.

Similarly, "\2" would stand for whatever the second pair of parentheses matched. We don't use it here.
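
To make the role of "\1" concrete, here is a minimal Java sketch. The anchor markup, the class name and the method name are hypothetical stand-ins, not the actual CSDN source; the only assumption carried over is that the title appears twice, once in an attribute and once as the link text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BackreferenceDemo {
    // Group 1 captures the title, group 2 the URL; \1 requires the link
    // text to repeat exactly what group 1 captured.
    private static final Pattern LINK =
        Pattern.compile("title=\"(.*?)\" href=\"(.*?)\">\\1</a>");

    // Returns {title, url} for the first matching link, or null if none matches.
    public static String[] parseLink(String html) {
        Matcher m = LINK.matcher(html);
        return m.find() ? new String[] { m.group(1), m.group(2) } : null;
    }

    public static void main(String[] args) {
        String same = "<a title=\"Some title\" href=\"http://example.com/a\">Some title</a>";
        String different = "<a title=\"Some title\" href=\"http://example.com/a\">Other text</a>";
        String[] hit = parseLink(same);
        System.out.println(hit[0] + " -> " + hit[1]);     // Some title -> http://example.com/a
        System.out.println(parseLink(different) == null); // true: the link text does not repeat the title
    }
}
```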

Testing this in the tool shows no problems: the items are found.

To simplify further, we delete the parts that do not help locate the match. After this step we need to test again to make sure the matched content is the same as before.

We found that target="_blank" onclick="logclickcount(this, also appears in other places, so it does not help distinguish the items we want, and we use .*? to skip over it. Note that there are no parentheses this time: parentheses are used only for content we want to extract. Finally, we get a feature string, and with it we can extract what we want from the many characters of the page source.

Note that when the feature string above is written as a string literal in code, "\" must be added before each quotation mark. In a Java string literal the backslash itself must also be doubled, so "\1" is written as "\\1".
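
Putting these pieces together, the extraction step might look like the sketch below. It is an illustration under assumptions rather than the original code: the sample list items, the class name and the exact feature string are hypothetical. It shows the escaped string literal, a bare ".*?" skipping the attributes we do not care about, and a loop over every match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HeadlineExtractor {
    // Feature string as a Java literal: each quotation mark is escaped as \"
    // and the backreference \1 is written as \\1. The bare .*? (no parentheses)
    // skips the attributes that are not needed.
    private static final Pattern HEADLINE =
        Pattern.compile("title=\"(.*?)\" href=\"(.*?)\".*?>\\1</a>");

    // Returns one {title, url} pair for every headline found in `html`.
    public static List<String[]> extract(String html) {
        List<String[]> result = new ArrayList<>();
        Matcher m = HEADLINE.matcher(html);
        while (m.find()) {                       // each call moves on to the next match
            result.add(new String[] { m.group(1), m.group(2) });
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical markup shaped like the description in the text.
        String html =
            "<li><a title=\"First headline\" href=\"http://example.com/1\" target=\"_blank\" "
            + "onclick=\"logclickcount(this,101)\">First headline</a></li>\n"
            + "<li><a title=\"Second headline\" href=\"http://example.com/2\" target=\"_blank\" "
            + "onclick=\"logclickcount(this,102)\">Second headline</a></li>";
        for (String[] item : extract(html)) {
            System.out.println(item[0] + " -> " + item[1]);
        }
    }
}
```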

In the code, this is a very short piece:

The complete code is as follows:

Summary

That is the whole of this article on a complete Android example of capturing the Geek Headlines from the CSDN home page. I hope it is helpful to you. If you find any shortcomings, please leave a comment to point them out. Thank you for your support!
