Data collection based on Java (I)
Previously, I wrote two articles on data collection and warehousing with PHP:
Data collection and warehousing based on PHP (I): http://www.cnblogs.com/lichenwei/p/3872307.html
Data collection and warehousing based on PHP (II): http://www.cnblogs.com/lichenwei/p/3873281.html
The later parts of this Java series:
Data collection based on Java (II): http://www.cnblogs.com/lichenwei/p/3905370.html
Data collection and warehousing based on Java (III): http://www.cnblogs.com/lichenwei/p/3907007.html
Data collection and warehousing based on Java (final part): http://www.cnblogs.com/lichenwei/p/3910492.html
In fact, the principle of collection is the same in both languages: fetch the remote page -> extract the required content (with regular expressions) -> store it by category -> read it back -> display it.
It doesn't matter which programming language you use; the language is just a tool.
This time, let's collect data from a football website: http://www.footballresults.org/league.php?league=EngDiv1
The following figure shows the data we want to collect:
For the collection principle, see the two PHP articles above. The rest of this post goes straight to the code:
GerData.java (encapsulates the data collection methods)
In fact, it just applies a simple matching rule, using two methods of java.util.regex.Matcher:
group(): returns the input subsequence captured by the given group during the previous match operation.
find(): attempts to find the next subsequence of the input sequence that matches the pattern.
package com.lcw.curl;
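To make the find()/group() description concrete, here is a minimal sketch of the matching loop. It is not the original GerData.java: the class name MatchSketch, the regular expression, and the sample HTML rows are all assumptions made up for illustration, and the markup of the real results page may well differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not the original GerData.java.
final class MatchSketch {

    // Assumed row layout: <td>Home</td><td>2-1</td><td>Away</td>.
    // The real page may be structured differently, so adjust the pattern to fit.
    private static final Pattern ROW = Pattern.compile(
            "<td>([^<]+)</td><td>(\\d+)-(\\d+)</td><td>([^<]+)</td>");

    // Returns one String[] {home, homeGoals, awayGoals, away} per row matched.
    static List<String[]> extract(String html) {
        List<String[]> rows = new ArrayList<>();
        Matcher m = ROW.matcher(html);
        // find() moves to the next match; group(n) reads the n-th capture of that match.
        while (m.find()) {
            rows.add(new String[] {
                m.group(1).trim(), m.group(2), m.group(3), m.group(4).trim()
            });
        }
        return rows;
    }

    public static void main(String[] args) {
        // Made-up sample input standing in for the downloaded page.
        String html = "<tr><td>Arsenal</td><td>2-1</td><td>Chelsea</td></tr>"
                    + "<tr><td>Everton</td><td>0-0</td><td>Fulham</td></tr>";
        for (String[] row : extract(html)) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

Each call to find() resumes where the previous match ended, which is why a simple while loop is enough to walk every row on the page.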
CurlMain.java (main program)
InputStreamReader is a bridge from byte streams to character streams.
openStream() opens a connection to this URL and returns an InputStream for reading from that connection.
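The two calls above fit together roughly as follows. This is only a sketch under assumptions, not the original listing: the class name FetchSketch and the fetch method are hypothetical, and the page is assumed to be UTF-8 encoded, which may not hold for the real site.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;
import java.net.URL;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not the original listing.
final class FetchSketch {

    // Downloads the resource at the URL and returns it as text, one '\n' per line.
    static String fetch(URL url, Charset charset) throws IOException {
        StringBuilder page = new StringBuilder();
        // openStream() yields raw bytes; InputStreamReader decodes them into characters.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(url.openStream(), charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                page.append(line).append('\n');
            }
        }
        return page.toString();
    }

    public static void main(String[] args) throws IOException {
        URL url = URI.create(
                "http://www.footballresults.org/league.php?league=EngDiv1").toURL();
        // UTF-8 is an assumption; use the charset the page actually declares.
        System.out.println(fetch(url, StandardCharsets.UTF_8));
    }
}
```

Passing the charset explicitly avoids depending on the platform default, and the try-with-resources block closes the connection's stream even if reading fails.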
That takes care of the data collection; the result is shown in the figure below: