Java – jsoup clean method

I'm trying to use this code to completely clear my text from HTML elements:

Jsoup.clean(preparedText, Whitelist.none());

Unfortunately, it does not remove &nbsp; entities. I expected it to replace them with spaces, just as it replaces &middot; with a middle dot ("·").

Should I use a different method to achieve this?

Solution

According to the jsoup docs, a whitelist defines which HTML elements and attributes are allowed through the cleaner.

So the whitelist is only concerned with tags and attributes. &nbsp; is neither a tag nor an attribute. It is just the HTML encoding of a special character. If you want to convert from the encoding to plain text, you can use, for example, the excellent Apache Commons Lang library or jsoup's unescapeEntities method:

System.out.println(Parser.unescapeEntities(doc.toString(),false));
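
Note that unescaping &nbsp; produces the non-breaking space character (U+00A0), not an ordinary space. If ordinary spaces are what you want, a small post-processing step may help. The following is only a minimal sketch using the standard library; the class and method names (NbspNormalizer, normalizeSpaces) are hypothetical and not part of jsoup, and the input string stands in for whatever Jsoup.clean or unescapeEntities returned.

```java
// Hypothetical helper, not part of jsoup: turns non-breaking spaces
// into ordinary spaces in text that has already been cleaned.
class NbspNormalizer {

    static String normalizeSpaces(String cleaned) {
        return cleaned
                // entity form, as it may still appear in cleaned output
                .replace("&nbsp;", " ")
                // character form (U+00A0), as produced by unescaping
                .replace('\u00A0', ' ');
    }

    public static void main(String[] args) {
        // Assumed sample input containing both forms of the non-breaking space
        String cleaned = "Hello&nbsp;world\u00A0!";
        System.out.println(normalizeSpaces(cleaned)); // prints "Hello world !"
    }
}
```

The plain String.replace calls only touch the non-breaking space, so any other entities in the text are left for unescapeEntities to handle.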

Addendum:

The translation from &middot; to "·" happens while the HTML is being parsed. It appears to have nothing to do with the clean method.
