How to use jsoup to extract paragraph text from HTML?

2019-12-19 • Java

import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class JavaApplication14 {

    public static void main(String[] args) {
        try {
            Document doc = Jsoup.connect("tanmoy_mahathir.makes.org/thimble/146").get();
            String html = "<html><head></head>" + "<body><p>Parsed HTML into a doc."
                    + "</p></body></html>";
            Elements paragraphs = doc.select("p");
            for (Element p : paragraphs)
                System.out.println(p.text());
        } catch (IOException ex) {
            Logger.getLogger(JavaApplication14.class.getName()).log(Level.SEVERE, null, ex);
        }
    }

}

Can anyone help me figure out how to make jsoup parse the part of the page that contains the paragraphs, so that it prints only:

Hello,World!
Nothing is impossible

Solution

For this small piece of HTML, you just need to do:

String html = "<html><head></head>" + "<body><p>Parsed HTML into a doc."
        + "</p></body></html>";
Document doc = Jsoup.parse(html); 
Elements paragraphs = doc.select("p");
for(Element p : paragraphs)
  System.out.println(p.text());
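
To see the same approach print exactly the two lines from the question, here is a minimal self-contained sketch. The class name ParagraphTextDemo, the helper paragraphTexts and the sample HTML are made up for illustration; I have not seen the real page, so its markup may differ.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ParagraphTextDemo {

    // Hypothetical helper: returns the text of every <p> element, in document order.
    static List<String> paragraphTexts(String html) {
        Document doc = Jsoup.parse(html);
        List<String> texts = new ArrayList<>();
        for (Element p : doc.select("p")) {
            // text() drops the tags and keeps the text of nested elements such as <b>
            texts.add(p.text());
        }
        return texts;
    }

    public static void main(String[] args) {
        // Sample markup assumed for this sketch, not copied from the linked page
        String html = "<html><head><title>Demo</title></head><body>"
                + "<h1>Heading</h1>"
                + "<p>Hello,World!</p>"
                + "<p>Nothing is <b>impossible</b></p>"
                + "</body></html>";
        for (String text : paragraphTexts(html)) {
            System.out.println(text);
        }
    }
}
```

With this sample input the heading is skipped, because only p elements are selected, and the program prints Hello,World! followed by Nothing is impossible.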

As far as I can see, your link contains almost the same HTML, so you can also replace the definition of doc with:

Document doc = Jsoup.connect("https://tanmoy_mahathir.makes.org/thimble/146").get();
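
Note that this line starts the address with https://, while the code in the question passes it without a scheme. As far as I know, Jsoup.connect only accepts absolute http or https URLs, so the missing prefix is a likely reason the original call fails. A small sketch of a guard you could put in front of the call, using only the standard library (UrlScheme and ensureHttpScheme are hypothetical names, not part of jsoup):

```java
import java.util.Locale;

public class UrlScheme {

    // Hypothetical helper: prefixes "https://" when the address has no http(s) scheme.
    static String ensureHttpScheme(String url) {
        String lower = url.toLowerCase(Locale.ROOT);
        if (lower.startsWith("http://") || lower.startsWith("https://")) {
            return url; // already absolute, leave it untouched
        }
        return "https://" + url;
    }

    public static void main(String[] args) {
        // The scheme-less address from the question
        System.out.println(ensureHttpScheme("tanmoy_mahathir.makes.org/thimble/146"));
    }
}
```

The result can then be passed to Jsoup.connect(...) as in the line above. Defaulting to https is an assumption of this sketch; use http:// if the site does not serve https.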

UPDATE

This is the complete code; it compiles and runs normally:

import java.io.IOException;
import java.util.logging.*;
import org.jsoup.*;
import org.jsoup.nodes.*;
import org.jsoup.select.*;

public class JavaApplication14 {

  public static void main(String[] args)  {
    try {
      String url = "https://tanmoy_mahathir.makes.org/thimble/146";
      Document doc = Jsoup.connect(url).get();
      Elements paragraphs = doc.select("p");
      for(Element p : paragraphs)
        System.out.println(p.text());
    } 
    catch (IOException ex) {
      Logger.getLogger(JavaApplication14.class.getName())
            .log(Level.SEVERE, null, ex);
    }
  }
}

