Java – extract the main content (highest text density) from news article web pages

I want to write code that extracts the main article from a news website. News pages contain the main story along with advertisements, comments, and copyright notices, so I want to fetch only the main story, the way boilerpipe does. How can I do this?

Any pointers on how to accomplish this would be appreciated.

Sudhanshu

Solution

The boilerpipe website provides the source code, a quick start guide, links to the original scientific papers, and a video of the corresponding conference presentation:

http://code.google.com/p/boilerpipe/

This should give you a fairly comprehensive picture of how it works and how to apply it in your solution.
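To give a rough feel for the kind of heuristic involved, here is a minimal sketch in plain Java. It is not boilerpipe's code and not its exact algorithm: it only illustrates the general idea of scoring text blocks by word count and link density, which is the sort of shallow text feature the boilerpipe papers describe. The class name `MainTextSketch`, the two thresholds, and the regex-based splitting are all assumptions made for this illustration; a real solution should use a proper HTML parser or boilerpipe itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: a simplified block filter, NOT boilerpipe's implementation.
final class MainTextSketch {
    // Content that is never visible article text (scripts, styles, comments).
    private static final Pattern INVISIBLE = Pattern.compile(
        "(?is)<(script|style|noscript)\\b.*?</\\1\\s*>|<!--.*?-->");
    // Block-level tags used as split points between candidate text blocks.
    private static final Pattern BLOCK_TAG = Pattern.compile(
        "(?i)</?(?:p|div|td|tr|th|li|ul|ol|h[1-6]|table|section|article|header"
        + "|footer|nav|aside|br|body|html|head|title)\\b[^>]*>");
    private static final Pattern ANCHOR =
        Pattern.compile("(?is)<a\\b[^>]*>(.*?)</a\\s*>");
    private static final Pattern ANY_TAG = Pattern.compile("<[^>]*>");

    // Hypothetical thresholds chosen for this sketch; tune them on real pages.
    static final int MIN_WORDS = 15;
    static final double MAX_LINK_DENSITY = 0.33;

    static int countWords(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    // Removes remaining inline tags and collapses whitespace.
    // HTML entities such as &amp; are left undecoded in this sketch.
    static String stripTags(String fragment) {
        return ANY_TAG.matcher(fragment).replaceAll(" ")
                      .replaceAll("\\s+", " ").trim();
    }

    // Returns the blocks that look like article text: long enough and
    // not dominated by link text (menus, "related stories" lists, etc.).
    static List<String> extractBlocks(String html) {
        String visible = INVISIBLE.matcher(html).replaceAll(" ");
        List<String> kept = new ArrayList<>();
        for (String fragment : BLOCK_TAG.split(visible)) {
            String text = stripTags(fragment);
            int words = countWords(text);
            if (words < MIN_WORDS) {
                continue; // short blocks: titles, footers, copyright lines
            }
            int linkWords = 0;
            Matcher anchors = ANCHOR.matcher(fragment);
            while (anchors.find()) {
                linkWords += countWords(stripTags(anchors.group(1)));
            }
            double linkDensity = (double) linkWords / words;
            if (linkDensity <= MAX_LINK_DENSITY) {
                kept.add(text);
            }
        }
        return kept;
    }

    static String extractMainText(String html) {
        return String.join("\n", extractBlocks(html));
    }

    public static void main(String[] args) {
        String html = "<html><body>"
            + "<div><a href=\"/\">Home</a> <a href=\"/world\">World</a></div>"
            + "<p>The city council voted on Tuesday to approve a new budget that "
            + "increases funding for public transport, libraries and road repairs.</p>"
            + "<div>Copyright Example News</div>"
            + "</body></html>";
        System.out.println(extractMainText(html));
    }
}
```

In this sketch the navigation bar and the copyright line are dropped because they are too short or consist mostly of link text, while the long paragraph is kept. boilerpipe does this far more robustly; if I remember the quick start correctly, its own entry point is a single call along the lines of `ArticleExtractor.INSTANCE.getText(...)`, so check the site above for the exact usage.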

Best,

Christian
