Java – jsoup: extract all HTML between two blocks in CSS-free HTML
What is the best way to use jsoup to extract all HTML (as a String, Document, or Elements) between two blocks that conform to this pattern:
<strong> {any HTML could appear here, except for a <strong> pair} </strong> ... {this is the HTML I need to extract; any HTML could appear here, except for a <strong> pair} ... <strong> {any HTML could appear here, except for a <strong> pair} </strong>
Using a regular expression, this could be simple if I apply it to the entire body html():
(<strong>.+</strong>)(.+)(<strong>.+</strong>)
The second group, (.+), is where my HTML content ends up.
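As a rough sketch of the regex approach described above, the version below swaps the greedy `.+` for reluctant `.*?` quantifiers and enables `DOTALL`, so that each group stops at the nearest tag and the match can span line breaks. The class name `BetweenStrongRegex` and the method `between` are hypothetical names chosen for illustration, and the pattern assumes `<strong>` tags carry no attributes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BetweenStrongRegex {
    // Reluctant quantifiers keep each group as short as possible;
    // DOTALL lets '.' match line terminators as well.
    private static final Pattern BETWEEN = Pattern.compile(
            "<strong>.*?</strong>(.*?)<strong>.*?</strong>", Pattern.DOTALL);

    /** Returns the HTML between the first two <strong> blocks, or null if there is no match. */
    static String between(String html) {
        Matcher m = BETWEEN.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String html = "<strong>A</strong><p>one</p> two <em>three</em><strong>B</strong>";
        System.out.println(between(html)); // prints "<p>one</p> two <em>three</em>"
    }
}
```

This works on the raw string only, so it is sensitive to attribute and casing variations that an HTML parser would normalize.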
However, as I learned from a similar challenge, working with the DOM that jsoup has already parsed can perform better (even if the code is a little longer), except that this time neither Element.nextSibling() nor Element.nextElementSibling() can come to the rescue on its own.
For example, I searched for something similar to jQuery's nextUntil in jsoup, but I couldn't find anything like it.
Can anyone suggest something better than the regular-expression-based approach above?
Solution
I don't know if it's faster, but maybe something like this will work:
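Elements strongs = doc.select("strong");
Element f = strongs.first();
Element l = strongs.last();
// siblingElements() does not include f itself, so index into the parent's children instead.
// This assumes both <strong> elements share the same parent.
Elements children = f.parent().children();
List<Element> result = children.subList(children.indexOf(f) + 1, children.lastIndexOf(l));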
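The snippet above works with sibling elements only, so bare text sitting between the two markers is not included. One possible way to emulate a jQuery-style nextUntil is to walk `nextSibling()` from the first `<strong>` and stop at the next one. This is only a sketch: `BetweenStrongDom` and `nodesBetween` are hypothetical names, jsoup is assumed to be on the classpath, and both markers are assumed to share the same parent. If no closing `<strong>` follows, the loop simply collects everything up to the last sibling.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;

import java.util.ArrayList;
import java.util.List;

public class BetweenStrongDom {
    /** Collects every sibling node (elements and text) after start, up to the next <strong>. */
    static List<Node> nodesBetween(Element start) {
        List<Node> result = new ArrayList<>();
        for (Node n = start.nextSibling(); n != null; n = n.nextSibling()) {
            // Stop as soon as the next <strong> sibling is reached.
            if (n instanceof Element && ((Element) n).tagName().equals("strong")) {
                break;
            }
            result.add(n);
        }
        return result;
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parseBodyFragment(
                "<strong>A</strong><p>one</p> two <em>three</em><strong>B</strong>");
        Element first = doc.select("strong").first();

        // Concatenate the outer HTML of each collected node.
        StringBuilder html = new StringBuilder();
        for (Node n : nodesBetween(first)) {
            html.append(n.outerHtml());
        }
        System.out.println(html);
    }
}
```

Because the walk uses `Node` rather than `Element`, text nodes such as " two " are kept alongside the `<p>` and `<em>` elements.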