Java – parsing tables with jsoup
I'm trying to use jsoup to extract the email address and the phone number from a LinkedIn profile page. Each of them sits in its own table. I wrote some code to extract them, but it doesn't work. The code should work for any LinkedIn profile. Any help or guidance would be appreciated.
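import java.io.IOException;
import java.util.Iterator;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class Main {
    public static void main(String[] args) {
        try {
            String url = "https://fr.linkedin.com/";
            // fetch the document over HTTP
            Document doc = Jsoup.connect(url).get();

            // get the page title
            String title = doc.title();
            System.out.println("Nom & Prénom: " + title);

            // first method: print the text of every link inside a list item
            Elements table = doc.select("div[class=more-info defer-load]").select("table");
            Iterator<Element> iterator = table.select("ul li a").iterator();
            while (iterator.hasNext()) {
                System.out.println(iterator.next().text());
            }

            // second method: walk the rows and print the first two cells
            for (Element tablee : doc.select("div[class=more-info defer-load]").select("table")) {
                for (Element row : tablee.select("tr")) {
                    Elements tds = row.select("td");
                    // get(1) needs at least two <td> cells in the row
                    if (tds.size() > 1) {
                        System.out.println(tds.get(0).text() + ":" + tds.get(1).text());
                    }
                }
            }
        } catch (IOException e) {
            // Jsoup.connect(...).get() throws IOException on network or HTTP errors
            e.printStackTrace();
        }
    }
}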
This is an example of the HTML I'm trying to extract the data from (taken from a LinkedIn profile page):
<table summary="Coordonnées en ligne">
  <tr>
    <th>E-mail</th>
    <td>
      <div id="email">
        <div id="email-view">
          <ul>
            <li>
              <a href="mailto:adam1adam@gmail.com">adam1adam@gmail.com</a>
            </li>
          </ul>
        </div>
      </div>
    </td>
  </tr>
  <tr class="no-contact-info-data">
    <th>Messagerie instantanée</th>
    <td>
      <div id="im" class="editable-item">
      </div>
    </td>
  </tr>
  <tr class="address-book">
    <th>Carnet d’adresses</th>
    <td>
      <span class="address-book">
        <a title="Une nouvelle fenêtre s’ouvrira" class="address-book-edit" href="/editContact?editContact=&contactMemberID=368674763">Ajouter</a> des coordonnées.
      </span>
    </td>
  </tr>
</table>
<table summary="Coordonnées">
  <tr>
    <th>Téléphone</th>
    <td>
      <div id="phone" class="editable-item">
        <div id="phone-view">
          <ul>
            <li>0021653191431 (Mobile)</li>
          </ul>
        </div>
      </div>
    </td>
  </tr>
  <tr class="no-contact-info-data">
    <th>Adresse</th>
    <td>
      <div id="address" class="editable-item">
        <div id="address-view">
          <ul>
          </ul>
        </div>
      </div>
    </td>
  </tr>
</table>
Solution
To grab the email address and the phone number, use CSS selectors that locate the elements by their id attributes (email-view and phone-view):
String email = doc.select("div#email-view > ul > li > a").attr("href");
System.out.println(email);
String phone = doc.select("div#phone-view > ul > li").text();
System.out.println(phone);
For more information, see CSS selectors
Output:
mailto:adam1adam@gmail.com
0021653191431 (Mobile)
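The values printed above still carry the mailto: scheme from the href attribute and the "(Mobile)" label from the list item. If only the bare address and number are wanted, a small post-processing step can strip them. The following is a minimal sketch using only the Java standard library. The class name ContactCleanup and the methods emailFromHref, phoneNumber and phoneLabel are hypothetical names chosen for illustration, and the sketch assumes the phone text always has the shape "number (label)" seen in the sample HTML.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ContactCleanup {

    // Assumed shape: a number (digits, spaces, dots, dashes, optional leading +)
    // followed by an optional label in parentheses, e.g. "0021653191431 (Mobile)".
    private static final Pattern PHONE =
            Pattern.compile("^\\s*(\\+?[0-9][0-9 .-]*?)\\s*(?:\\(([^)]*)\\))?\\s*$");

    /** Removes a leading "mailto:" (case-insensitive) and any "?..." query part. */
    public static String emailFromHref(String href) {
        String value = href.trim();
        if (value.regionMatches(true, 0, "mailto:", 0, 7)) {
            value = value.substring(7);
        }
        int query = value.indexOf('?');
        return query >= 0 ? value.substring(0, query) : value;
    }

    /** Returns the number part, or the trimmed input if the text has another shape. */
    public static String phoneNumber(String text) {
        Matcher m = PHONE.matcher(text);
        return m.matches() ? m.group(1) : text.trim();
    }

    /** Returns the label inside the parentheses, or an empty string if there is none. */
    public static String phoneLabel(String text) {
        Matcher m = PHONE.matcher(text);
        return (m.matches() && m.group(2) != null) ? m.group(2).trim() : "";
    }

    public static void main(String[] args) {
        System.out.println(emailFromHref("mailto:adam1adam@gmail.com"));
        System.out.println(phoneNumber("0021653191431 (Mobile)"));
        System.out.println(phoneLabel("0021653191431 (Mobile)"));
    }
}
```

With the two strings from the solution as input, the helpers return adam1adam@gmail.com, 0021653191431 and Mobile. Text that does not match the assumed phone pattern is passed through trimmed rather than rejected, so unexpected formats stay visible to the caller.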