Java jsoup: retrieving elements that do not have a specific class
I have a table with the following logic:
The table displays a list of names. Any row of the form `<tr class="hiderow"><td class="packagename">...</td></tr>` is not visible.
So the table may contain 100 rows, but if 20 of them have `class="hiderow"`, the user sees only 80 rows on the page. I want to retrieve the names from those 80 rows (not all 100), so I need to parse only the rows that do not have the class `hiderow`. I know how to use jsoup to get each name, and I can see that jsoup's selector syntax has `:not(selector)` for finding elements that do not match a selector, but I don't know how to use it. Please help.
Edit: I've figured out how to do this (code below). If there is a better way, please let me know.
Edit 2: please use BalusC's solution below instead; it's cleaner.
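```java
import java.io.IOException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public List<String> obtainPackageName(String urlLink) throws IOException {
    List<String> pdfList = new ArrayList<String>();
    URL url = new URL(urlLink);
    // Fetch and parse the page with a 3000 ms timeout
    Document doc = Jsoup.parse(url, 3000);
    Element table = doc.select("table[id=mastertableid]").first();
    if (table == null) {
        return pdfList; // no such table on the page
    }
    for (Element row : table.select("tr")) {
        // Skip hidden rows; hasClass() matches whole class names,
        // unlike className().contains(), which is a substring test
        if (!row.hasClass("hiderow")) {
            Element packageName = row.select("td[class=packagename]").first();
            if (packageName != null) {
                pdfList.add(packageName.text());
            }
        }
    }
    return pdfList;
}
```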
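A side note on the row check: testing a row with `row.className().contains("hiderow")` is a plain substring test on the whole class attribute, so a row with a class such as `nohiderow` (a hypothetical name, not from the question) would also be skipped. jsoup's `Element.hasClass` compares whole, whitespace-separated class names instead. The following is a minimal sketch, assuming jsoup is on the classpath; the class and method names are made up for illustration, and it parses an inline one-row table instead of fetching a URL.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

final class ClassCheckDemo {

    // Parses a one-row table whose <tr> carries the given class attribute and
    // returns that row. The <table> wrapper matters: the HTML parser drops a
    // stray <tr> that appears outside a table.
    static Element rowWithClass(String classAttr) {
        String html = "<table><tr class=\"" + classAttr + "\">"
                + "<td class=\"packagename\">pkg</td></tr></table>";
        Document doc = Jsoup.parse(html);
        return doc.select("tr").first();
    }

    // Substring test: also true for a class like "nohiderow"
    static boolean hiddenBySubstring(Element row) {
        return row.className().contains("hiderow");
    }

    // Whole-class-name test: true only if "hiderow" is one of the row's classes
    static boolean hiddenByClassName(Element row) {
        return row.hasClass("hiderow");
    }

    public static void main(String[] args) {
        for (String cls : new String[] {"hiderow", "odd hiderow", "nohiderow", "odd"}) {
            Element row = rowWithClass(cls);
            System.out.printf("%-12s contains=%b hasClass=%b%n",
                    cls, hiddenBySubstring(row), hiddenByClassName(row));
        }
    }
}
```

For `nohiderow` the substring check reports the row as hidden while `hasClass` does not; for `odd hiderow` both agree that the row is hidden.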
Solution
You need to apply `:not()` to the element of interest (the `tr` in your case) and pass it the CSS selector that the element should not match (`.hiderow` in your case).
So this should do it:
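```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// Jsoup.connect(...).get() throws IOException; declare or handle it in the enclosing method
Document document = Jsoup.connect(urlLink).get();
// Package-name cells in rows of #mastertableid that do not have class "hiderow"
Elements packagenames = document.select("#mastertableid tr:not(.hiderow) td.packagename");
List<String> pdfList = new ArrayList<String>();
for (Element packagename : packagenames) {
    pdfList.add(packagename.text());
}
```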
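To check the selector without hitting the real page, the same `select` call can be run against an inline HTML string via `Jsoup.parse(String)`. This is a minimal sketch, assuming jsoup is on the classpath; the class name, method name, and sample markup are hypothetical, only mirroring the table described in the question.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

final class VisiblePackageNames {

    // Same selector as the answer: package-name cells in rows of the table
    // with id "mastertableid", excluding rows that carry class "hiderow"
    static List<String> visibleNames(String html) {
        Document document = Jsoup.parse(html);
        List<String> names = new ArrayList<>();
        for (Element cell : document.select("#mastertableid tr:not(.hiderow) td.packagename")) {
            names.add(cell.text());
        }
        return names;
    }

    public static void main(String[] args) {
        // Hypothetical sample markup: two visible rows, two hidden rows
        String html = "<table id=\"mastertableid\">"
                + "<tr><td class=\"packagename\">alpha</td></tr>"
                + "<tr class=\"hiderow\"><td class=\"packagename\">beta</td></tr>"
                + "<tr class=\"odd hiderow\"><td class=\"packagename\">gamma</td></tr>"
                + "<tr class=\"odd\"><td class=\"packagename\">delta</td></tr>"
                + "</table>";
        System.out.println(visibleNames(html)); // prints [alpha, delta]
    }
}
```

Because `.hiderow` is a class selector, a row whose class attribute lists several classes (such as `odd hiderow`) is still excluded. In the real code, the `Document` simply comes from `Jsoup.connect(urlLink).get()` instead of `Jsoup.parse(html)`.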