A simple web search engine built with Java open source frameworks
Introduction
Using Java open source libraries, this article builds a search engine that crawls the content of a website. Starting from the content of a page, it crawls deeper to collect all the related URLs and their content. Users can then search the collected URLs by keyword.
Specific functions
(1) Crawl the content of the web page at a URL specified by the user.
(2) Parse the page content and extract all the URLs it links to.
(3) Let the user set the crawl depth: starting from the page at the initial URL, the crawler follows the URLs found on that page, then the URLs found on those pages, and so on. The greater the depth, the more pages are crawled.
(4) Save and index the crawled content. The indexed fields are the URL itself and the title of the page at that URL.
(5) Let the user search by keyword to find the URLs whose indexed content contains the keyword.
(6) Both index building and index searching recognize Chinese keywords and segment them into words.
(7) Let the user specify the directory where the index is saved, the initial URL, the crawl depth, the search keywords, and the maximum number of matches.
Open source framework
Source code
Crawler part: spider.java
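As an illustration of the crawling step, here is a minimal sketch that uses only the JDK rather than an open source HTML parser. It is not the article's spider code. The class name `CrawlSketch`, the regular expressions, and the `fetcher` function (which maps a URL to its HTML, so that an HTTP client can be plugged in) are assumptions made for this example. It also assumes that depth 0 means "the start page only" and that only absolute http(s) links are followed.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CrawlSketch {
    // Naive patterns for illustration; a real HTML parser handles many more cases.
    private static final Pattern HREF = Pattern.compile(
            "<a\\s[^>]*href\\s*=\\s*[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);
    private static final Pattern TITLE = Pattern.compile(
            "<title[^>]*>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns the absolute http(s) links found in the page, with fragments removed. */
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            String url = m.group(1).trim();
            int hash = url.indexOf('#');
            if (hash >= 0) {
                url = url.substring(0, hash);
            }
            if (url.startsWith("http://") || url.startsWith("https://")) {
                links.add(url);
            }
        }
        return links;
    }

    /** Returns the text of the first title element, or an empty string if there is none. */
    static String extractTitle(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() ? m.group(1).trim() : "";
    }

    /**
     * Breadth-first crawl. Returns URL -> title for every page reachable from
     * startUrl within maxDepth link hops (the start page is depth 0).
     */
    static Map<String, String> crawl(String startUrl, int maxDepth,
                                     Function<String, String> fetcher) {
        Map<String, String> titles = new LinkedHashMap<>();
        Set<String> seen = new HashSet<>();
        List<String> frontier = new ArrayList<>();
        frontier.add(startUrl);
        seen.add(startUrl);
        for (int depth = 0; depth <= maxDepth && !frontier.isEmpty(); depth++) {
            List<String> next = new ArrayList<>();
            for (String url : frontier) {
                String html = fetcher.apply(url);
                if (html == null) {
                    continue; // fetch failed or page unknown
                }
                titles.put(url, extractTitle(html));
                for (String link : extractLinks(html)) {
                    if (seen.add(link)) { // visit each URL at most once
                        next.add(link);
                    }
                }
            }
            frontier = next;
        }
        return titles;
    }
}
```

Passing the fetcher as a function keeps the traversal logic separate from networking, so the depth limit can be tried out against an in-memory map of pages before pointing it at a real site.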
Build index: buildindex.java
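The indexing step can be sketched in the same spirit. The example below is a simplified in-memory stand-in, not the article's index builder: it keeps a plain inverted index (token to set of URLs) instead of writing an on-disk index with an open source library. The class name `IndexSketch` and its methods are hypothetical. Chinese segmentation is approximated by emitting each Han character as its own token, which is far cruder than a real Chinese analyzer.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class IndexSketch {
    /**
     * Splits text into lower-cased tokens. Runs of non-Han letters and digits
     * become one token; each Han character becomes a token by itself.
     */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            boolean han = Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN;
            if (!han && Character.isLetterOrDigit(cp)) {
                word.appendCodePoint(Character.toLowerCase(cp));
                continue;
            }
            // Any other character ends the current word.
            if (word.length() > 0) {
                tokens.add(word.toString());
                word.setLength(0);
            }
            if (han) {
                tokens.add(new String(Character.toChars(cp)));
            }
        }
        if (word.length() > 0) {
            tokens.add(word.toString());
        }
        return tokens;
    }

    /** Builds token -> URLs from a URL -> title map, indexing both the URL and the title. */
    static Map<String, Set<String>> buildIndex(Map<String, String> urlToTitle) {
        Map<String, Set<String>> index = new TreeMap<>();
        for (Map.Entry<String, String> page : urlToTitle.entrySet()) {
            String url = page.getKey();
            for (String token : tokenize(url + " " + page.getValue())) {
                index.computeIfAbsent(token, k -> new LinkedHashSet<>()).add(url);
            }
        }
        return index;
    }
}
```

Because both the URL and the title are tokenized, a page can be found by a word in its address as well as by a word in its title, which matches the two indexed fields described earlier.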
Search index
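A keyword lookup with a cap on the number of matches might look like the sketch below. It assumes an in-memory inverted index of the form token -> URLs and a query that has already been split into terms; both are simplifications for illustration, and the class name `SearchSketch` is hypothetical. Ranking by the number of matched terms is also an assumption, since the article does not say how results are ordered.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

class SearchSketch {
    /**
     * Returns at most maxMatches URLs that contain any of the query terms,
     * ordered by how many distinct terms they match (most first).
     */
    static List<String> search(Map<String, Set<String>> index,
                               List<String> terms, int maxMatches) {
        // Count, per URL, how many distinct query terms it contains.
        Map<String, Integer> hits = new LinkedHashMap<>();
        for (String term : new LinkedHashSet<>(terms)) {
            String key = term.toLowerCase(Locale.ROOT);
            for (String url : index.getOrDefault(key, Collections.emptySet())) {
                hits.merge(url, 1, Integer::sum);
            }
        }
        // List.sort is stable, so ties keep the order in which URLs were first seen.
        List<Map.Entry<String, Integer>> ranked = new ArrayList<>(hits.entrySet());
        ranked.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));

        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : ranked) {
            if (result.size() >= maxMatches) {
                break;
            }
            result.add(entry.getKey());
        }
        return result;
    }
}
```

For the results to line up, the query terms should be produced by the same tokenization that was used when the index was built; otherwise a keyword can miss pages that do contain it.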
User interface (for convenience it is command-line only here; a GUI can be written if required)
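A command-line front end for the five user-supplied settings (index location, initial URL, crawl depth, keywords, maximum matches) could be sketched as follows. The class name `CliSketch`, the positional argument order, and the validation rules are assumptions for illustration, not the article's actual interface.

```java
class CliSketch {
    /** Immutable holder for the five settings the user supplies. */
    static final class Options {
        final String indexDir;
        final String startUrl;
        final int depth;
        final String keywords;
        final int maxMatches;

        Options(String indexDir, String startUrl, int depth, String keywords, int maxMatches) {
            this.indexDir = indexDir;
            this.startUrl = startUrl;
            this.depth = depth;
            this.keywords = keywords;
            this.maxMatches = maxMatches;
        }
    }

    static final String USAGE =
            "usage: <indexDir> <startUrl> <depth> <keywords> <maxMatches>";

    /** Parses five positional arguments; throws IllegalArgumentException on bad input. */
    static Options parse(String[] args) {
        if (args.length != 5) {
            throw new IllegalArgumentException(USAGE);
        }
        int depth;
        int maxMatches;
        try {
            depth = Integer.parseInt(args[2]);
            maxMatches = Integer.parseInt(args[4]);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("depth and maxMatches must be integers", e);
        }
        if (depth < 0 || maxMatches < 1) {
            throw new IllegalArgumentException("depth must be >= 0 and maxMatches >= 1");
        }
        return new Options(args[0], args[1], depth, args[3], maxMatches);
    }

    public static void main(String[] args) {
        try {
            Options o = parse(args);
            System.out.println("index=" + o.indexDir + " start=" + o.startUrl
                    + " depth=" + o.depth + " keywords=" + o.keywords
                    + " max=" + o.maxMatches);
        } catch (IllegalArgumentException e) {
            System.err.println(e.getMessage());
        }
    }
}
```

Keeping the parsed settings in one small object means a GUI could later fill in the same object from form fields without touching the crawl, index, or search code.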