Java – jsoup: extract all HTML between two blocks in CSS-free HTML

What is the best way to use jsoup to extract all HTML (as a String, a Document, or Elements) between two blocks that conform to this pattern:

<strong>
 {any HTML could appear here, except for a <strong> pair}
</strong>

 ...
 {This is the HTML I need to extract.
  Any HTML could appear here, except for a <strong> pair}
 ...

<strong>
 {any HTML could appear here, except for a <strong> pair}
</strong>

With a regular expression this could be simple, if I apply it to the entire body's html():

(<strong>.+</strong>)(.+)(<strong>.+</strong>)
                       ^
                       +----- There I have my HTML content
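
As written, the pattern above has two practical problems in Java: `.+` is greedy, so with more than two <strong> blocks the first group would swallow everything up to the last ones, and `.` does not match line breaks unless DOTALL is set. A minimal sketch of the same idea using only java.util.regex follows. It assumes the <strong> tags carry no attributes, and the names BetweenStrong and extractBetween are hypothetical, chosen only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BetweenStrong {
    // DOTALL lets '.' match line breaks; the reluctant quantifiers (.*?) stop at
    // the nearest </strong> and at the nearest following <strong>.
    private static final Pattern BETWEEN = Pattern.compile(
            "<strong>.*?</strong>(.*?)<strong>.*?</strong>",
            Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    /** Returns the markup between the first two <strong> blocks, or null if there is no match. */
    static String extractBetween(String html) {
        Matcher m = BETWEEN.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String html = "<strong>Start</strong>\n<p>keep <em>this</em></p>\ntext\n<strong>End</strong>";
        System.out.println(extractBetween(html));
    }
}
```

With jsoup, the input string would typically come from doc.body().html(). This remains a text-level match, so it knows nothing about nesting or malformed markup.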

However, as I learned from a similar challenge, performance can be improved (even if the code is a little longer) by using the DOM that jsoup has already parsed, except that this time Element.nextSibling() and Element.nextElementSibling() cannot come to the rescue.

For example, I searched jsoup for something similar to jQuery's nextUntil, but I couldn't find anything like it.

Can anyone suggest something better than the regular-expression approach above?

Solution

I don't know if it's faster, but maybe something like this will work:

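Elements strongs = doc.select("strong");
Element f = strongs.first();
Element l = strongs.last();
// children() includes f and l themselves (siblingElements() would omit f).
// This assumes both <strong> elements share the same parent.
Elements siblings = f.parent().children();
// Elements strictly between f and l; text nodes are not included.
List<Element> result = siblings.subList(f.elementSiblingIndex() + 1, l.elementSiblingIndex());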