Java – how to convert HTML to text and keep line breaks
How can I convert HTML to text while keeping the line breaks produced by elements such as BR, P, and DIV? NekoHTML or any other decent HTML parser would be fine.
Example: Hello<br/>World to:
Hello\n World
Solution
This is my function that outputs the text (including line breaks) by iterating over the nodes with jsoup:
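import java.io.IOException;
import java.io.InputStream;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

// Parses the stream (a null charset lets jsoup detect the encoding) and returns the body text.
public static String htmlToText(InputStream html) throws IOException {
    Document document = Jsoup.parse(html, null, "");
    Element body = document.body();
    return buildStringFromNode(body).toString();
}

// Depth-first walk: emit trimmed text, recurse into children, then add "\n" after <p> and <br>.
private static StringBuffer buildStringFromNode(Node node) {
    StringBuffer buffer = new StringBuffer();
    if (node instanceof TextNode) {
        TextNode textNode = (TextNode) node;
        buffer.append(textNode.text().trim());
    }
    for (Node childNode : node.childNodes()) {
        buffer.append(buildStringFromNode(childNode));
    }
    if (node instanceof Element) {
        Element element = (Element) node;
        String tagName = element.tagName();
        if ("p".equals(tagName) || "br".equals(tagName)) {
            buffer.append("\n");
        }
    }
    return buffer;
}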
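To see what this traversal produces without pulling in a parser, here is a minimal sketch that runs the same recursion over a tiny stand-in node model. SimpleNode, HtmlTextSketch and LINE_BREAK_TAGS are hypothetical names made up for this illustration, not jsoup classes. The sketch also treats "div" as a line-breaking tag, which is an assumption based on the question; the function above checks only "p" and "br".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class HtmlTextSketch {

    // Hypothetical stand-in for a parsed node: a text node (tag == null) or an element.
    static final class SimpleNode {
        final String tag;   // element name, or null for a text node
        final String text;  // text content, or null for an element
        final List<SimpleNode> children = new ArrayList<>();

        private SimpleNode(String tag, String text) {
            this.tag = tag;
            this.text = text;
        }

        static SimpleNode text(String text) {
            return new SimpleNode(null, text);
        }

        static SimpleNode element(String tag, SimpleNode... kids) {
            SimpleNode node = new SimpleNode(tag, null);
            node.children.addAll(Arrays.asList(kids));
            return node;
        }
    }

    // Tags that end a line. "div" is an assumption added here; the original checks only p and br.
    static final Set<String> LINE_BREAK_TAGS =
            new HashSet<>(Arrays.asList("p", "br", "div"));

    static String toText(SimpleNode node) {
        StringBuilder out = new StringBuilder();
        append(node, out);
        return out.toString();
    }

    // Same order as the jsoup version: own text, then children, then a newline for line-breaking tags.
    private static void append(SimpleNode node, StringBuilder out) {
        if (node.text != null) {
            out.append(node.text.trim());
        }
        for (SimpleNode child : node.children) {
            append(child, out);
        }
        if (node.tag != null && LINE_BREAK_TAGS.contains(node.tag)) {
            out.append('\n');
        }
    }

    public static void main(String[] args) {
        // Models <body><p>Hello<br>World</p><div>Bye</div></body>
        SimpleNode body = SimpleNode.element("body",
                SimpleNode.element("p",
                        SimpleNode.text("Hello"),
                        SimpleNode.element("br"),
                        SimpleNode.text("World")),
                SimpleNode.element("div", SimpleNode.text("Bye")));
        System.out.print(toText(body));
    }
}
```

Because the newline is appended after an element's children, every closing p or div contributes a trailing "\n", so the result for this tree ends with a line break. The sketch threads one StringBuilder through the recursion instead of allocating a buffer per node; that is a design choice of the sketch, not a change in the output.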