How to use jsoup to extract paragraph text from HTML?
Java
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class JavaApplication14 {
    public static void main(String[] args) {
        try {
            Document doc = Jsoup.connect("tanmoy_mahathir.makes.org/thimble/146").get();
            String html = "<html><head></head>"
                    + "<body><p>Parsed HTML into a doc."
                    + "</p></body></html>";
            Elements paragraphs = doc.select("p");
            for (Element p : paragraphs)
                System.out.println(p.text());
        } catch (IOException ex) {
            Logger.getLogger(JavaApplication14.class.getName()).log(Level.SEVERE, null, ex);
        }
    }
}
Can anyone help me figure out how to make the jsoup code parse only the part that contains the paragraphs, so that it prints only:
Hello,World! Nothing is impossible
Solution
For this small piece of HTML, you only need to do this:
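String html = "<html><head></head>"
        + "<body><p>Parsed HTML into a doc."
        + "</p></body></html>";
// Parse the in-memory string; no network access is needed here.
Document doc = Jsoup.parse(html);
// Select every <p> element and print only its text content.
Elements paragraphs = doc.select("p");
for (Element p : paragraphs)
    System.out.println(p.text());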
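As far as I can see, your link contains almost the same HTML, so you can also load the page directly by replacing the definition of doc with the line below. Note that the URL passed to Jsoup.connect must include the http:// or https:// scheme, which the URL in the question lacks: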
Document doc = Jsoup.connect("https://tanmoy_mahathir.makes.org/thimble/146").get();
UPDATE
This is the complete code, which compiles and runs normally:
import java.io.IOException;
import java.util.logging.*;
import org.jsoup.*;
import org.jsoup.nodes.*;
import org.jsoup.select.*;

public class JavaApplication14 {
    public static void main(String[] args) {
        try {
            String url = "https://tanmoy_mahathir.makes.org/thimble/146";
            // Fetch the page and parse it into a Document.
            Document doc = Jsoup.connect(url).get();
            // Select every <p> element and print only its text content.
            Elements paragraphs = doc.select("p");
            for (Element p : paragraphs)
                System.out.println(p.text());
        } catch (IOException ex) {
            // Logger.log needs a message argument (null is allowed) before the Throwable.
            Logger.getLogger(JavaApplication14.class.getName())
                    .log(Level.SEVERE, null, ex);
        }
    }
}
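If the paragraph texts are needed as values rather than printed one by one, a small variation on the loop above could collect them into a list. The following is only a sketch, assuming jsoup 1.10.2 or later is on the classpath (for Elements.eachText()). The class name ParagraphText, the helper extractParagraphs, and the sample HTML are made up for illustration and are not taken from the page in the question.

```java
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class ParagraphText {

    // Hypothetical helper: returns the text of every <p> element in the given HTML.
    public static List<String> extractParagraphs(String html) {
        Document doc = Jsoup.parse(html);
        // select("p") matches only <p> elements, so the title and heading are skipped.
        // eachText() returns one string per matched element that has text.
        return doc.select("p").eachText();
    }

    public static void main(String[] args) {
        // Sample markup invented for this sketch; not the page from the question.
        String html = "<html><head><title>Demo</title></head>"
                + "<body><h1>Heading</h1>"
                + "<p>Hello, World!</p>"
                + "<p>Nothing is impossible</p>"
                + "</body></html>";

        for (String text : extractParagraphs(html)) {
            System.out.println(text);
        }
    }
}
```

The same helper should work on a fetched page by passing it the result of Jsoup.connect(url).get().html(), or by calling select("p").eachText() on the fetched Document directly.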