Java – jsoup: extract all HTML between two blocks in CSS-free HTML

2020-03-30 • Java

What is the best way to use jsoup to extract all HTML (as a String, a Document, or Elements) between two blocks that conform to this pattern:

<strong>
 {any HTML could appear here, except for a <strong> pair}
</strong>

 ...
 {This is the HTML I need to extract.
  Any HTML could appear here, except for a <strong> pair}
 ...

<strong>
 {any HTML could appear here, except for a <strong> pair}
</strong>

Using a regular expression, this could be simple if I apply it to the entire body html():

(<strong>.+</strong>)(.+)(<strong>.+</strong>)
                       ^
                       +----- There I have my HTML content
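A minimal sketch of that regex idea in plain Java, under some assumptions not stated in the question: the <strong> tags carry no attributes, the quantifiers are made reluctant (.*? rather than .+) so each match stops at the nearest tag, and DOTALL is enabled so the content may span several lines. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BetweenStrongRegex {
    // Reluctant quantifiers stop at the nearest tag; DOTALL lets '.' match newlines.
    // Assumption: the <strong> tags have no attributes.
    private static final Pattern BETWEEN = Pattern.compile(
            "<strong>.*?</strong>(.*?)<strong>.*?</strong>", Pattern.DOTALL);

    /** Returns the HTML between the first two strong blocks, or null if there is no match. */
    static String between(String html) {
        Matcher m = BETWEEN.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String html = "<strong>A</strong><p>one</p>\n<em>two</em><strong>B</strong>";
        System.out.println(between(html));
    }
}
```

Group 1 here plays the role of the middle group in the pattern above.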

However, as I learned from a similar challenge, performance can be improved (even if the code is a little longer) by using the DOM that jsoup has already parsed, except that this time neither Element.nextSibling() nor Element.nextElementSibling() can come to the rescue.

For example, I searched jsoup for something similar to jQuery's nextUntil, but I couldn't find anything.

Can anyone suggest something better than the regex-based approach above?

Solution

I don't know if it's faster, but maybe something like this will work:

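import java.util.List;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

Elements strongs = doc.select("strong");
Element f = strongs.first();
Element l = strongs.last();
// siblingElements() omits f itself, so index into the parent's children (assumes f and l are siblings)
Elements children = f.parent().children();
List<Element> result = children.subList(children.indexOf(f) + 1, children.indexOf(l));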
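If text nodes between the two blocks matter as well, a loop over Node.nextSibling() can approximate jQuery's nextUntil. The following is only a sketch, assuming the two <strong> elements are siblings under the same parent; the class and method names are hypothetical, and the jsoup library must be on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;

class BetweenStrong {
    /**
     * Collects every sibling node (elements and text) after the first strong
     * element, stopping at the next strong sibling. If no later strong sibling
     * exists, all remaining siblings are returned.
     */
    static List<Node> between(Document doc) {
        List<Node> result = new ArrayList<>();
        Element first = doc.select("strong").first();
        if (first == null) {
            return result; // no strong element at all
        }
        for (Node n = first.nextSibling(); n != null; n = n.nextSibling()) {
            if ("strong".equals(n.nodeName())) {
                break; // reached the closing block
            }
            result.add(n);
        }
        return result;
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(
                "<div><strong>A</strong>text<p>one</p><em>two</em><strong>B</strong><p>after</p></div>");
        for (Node n : between(doc)) {
            System.out.println(n.nodeName());
        }
    }
}
```

Calling outerHtml() on each returned node and concatenating the results would give the extracted HTML as a string.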
