Java – how to programmatically inspect HTML documents

I have a database containing small HTML documents, and I need to programmatically insert several of them into PDF documents with iText, or into Word documents with Aspose.Words. I need to preserve the formatting of the HTML within reason: respecting <b> tags is a must, and honoring CSS such as <span style="blah"> would be nice to have.

Both iText and Aspose work (roughly) like this:

Document document = new Document(Size.A4, Aspect.PORTRAIT);
document.setFont("Helvetica", 20, Font.BOLD);
document.insert("some string");
document.setBold(true);
document.insert("A bold string");

So (I think) I need some kind of HTML parser that lets me inspect the strings and their styles, so that I can insert them into my document.

Can anyone suggest a good library, or a sensible way to approach this problem? The platform is Java.

Solution

HTML Parser is a good HTML parser.

I use it to parse HTML in one of my projects.

You can write your own filters to pick out the HTML you want, so <br> tags should not be difficult to parse.

You can parse the CSS using CssSelectorNodeFilter.
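To make the general approach concrete, here is a minimal sketch of walking an HTML fragment and collecting text runs together with the bold/italic state in effect for each one. Note that it does not use the HTML Parser library mentioned above: to stay free of dependencies it uses the parser that ships with the JDK (javax.swing.text.html), which is a different tool with the same callback-style idea. The names HtmlRuns and Run are made up for illustration, only <b>, <strong>, <i> and <em> are tracked, and inline CSS is not interpreted at all.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper (not part of iText, Aspose or HTML Parser).
final class HtmlRuns {

    /** A piece of text plus the formatting that was active when it was read. */
    static final class Run {
        public final String text;
        public final boolean bold;
        public final boolean italic;

        Run(String text, boolean bold, boolean italic) {
            this.text = text;
            this.bold = bold;
            this.italic = italic;
        }
    }

    /** Parses an HTML fragment into a flat list of styled text runs. */
    static List<Run> parse(String html) {
        final List<Run> runs = new ArrayList<>();

        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            // Depth counters so that nested tags such as <b><b>..</b></b> still work.
            private int boldDepth;
            private int italicDepth;

            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.B || tag == HTML.Tag.STRONG) boldDepth++;
                if (tag == HTML.Tag.I || tag == HTML.Tag.EM) italicDepth++;
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                if (tag == HTML.Tag.B || tag == HTML.Tag.STRONG) boldDepth = Math.max(0, boldDepth - 1);
                if (tag == HTML.Tag.I || tag == HTML.Tag.EM) italicDepth = Math.max(0, italicDepth - 1);
            }

            @Override
            public void handleText(char[] data, int pos) {
                // Record the text with whatever formatting is currently open.
                runs.add(new Run(new String(data), boldDepth > 0, italicDepth > 0));
            }
        };

        try {
            // 'true' tells the parser to ignore any charset declaration in the markup.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return runs;
    }
}
```

Each Run returned by HtmlRuns.parse(...) could then be passed to the document API from the question, for example by calling setBold(run.bold) before insert(run.text). The JDK parser is old and geared towards HTML 3.2, so for anything beyond simple fragments (and for CSS in particular) a dedicated library is probably the better choice.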
