Java – XPath normalize-space() to return a sequence of normalized strings
I need to use the XPath function normalize-space() to normalize the text I want to extract from the XHTML document: http://test.anahnarciso.com/clean_bigbook_0.html
I am using the following expression:
//*[@slot="address"]/normalize-space(.)
This works well in Qizx Studio, which I use to test XPath expressions:
let $doc := doc('http://test.anahnarciso.com/clean_bigbook_0.html') return $doc//*[@slot="address"]/normalize-space(.)
This simple query returns a sequence of xs:string values:
144 Hempstead Tpke 403 West St 880 Old Country Rd 8412 164th St 8412 164th St 1 Irving Pl 1622 McDonald Ave 255 Conklin Ave 22011 Hempstead Ave 7909 Queens Blvd 11820 Queens Blvd 1027 Atlantic Ave 1068 Utica Ave 1002 Clintonville St 1002 Clintonville St 1156 Hempstead Tpke Route 49 10007 Rockaway Blvd 12694 Willets Point Blvd 343 James St
Now I want to use the same expression in my Java code:
String exp = "//*[@slot=\"address\"]/normalize-space(.)";
XPath xpath = XPathFactory.newInstance().newXPath();
XPathExpression expr = xpath.compile(exp);
Object result = expr.evaluate(doc, XPathConstants.NODESET);
But the last line throws an exception:
Cannot convert XPath value to Java object: the required class is org.w3c.dom.NodeList; the supplied value has type xs:string
Obviously, I need to change XPathConstants.NODESET. I tried XPathConstants.STRING, but it only returns the first item of the sequence.
How can I get something like an array of strings?
Thank you in advance
Solution
Your expression is valid in XPath 2.0 but illegal in XPath 1.0, which is the version supported by Java's built-in javax.xml.xpath implementation. In XPath 1.0 it would have to be written as normalize-space(//*[@slot='address']).
In any case, in XPath 1.0, when normalize-space() is called on a node-set, only the first node (in document order) is used.
To do what you want, you either need an XPath 2.0-compatible processor (such as Saxon), or you can iterate over the result node-set and call normalize-space() on each node:
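Both points can be checked against the JDK's default engine with a minimal sketch. The class name XPath1Demo, its helper methods and the inline SAMPLE markup are hypothetical stand-ins for the real page. It assumes XPathFactory.newInstance() resolves to the built-in XPath 1.0 implementation (no Saxon on the classpath); under that assumption the XPath 2.0 form should be rejected with an XPathExpressionException, and the XPath 1.0 form should return only the first address, normalized.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

public class XPath1Demo {
    // Hypothetical sample: two address elements with irregular whitespace.
    static final String SAMPLE = "<html><body>"
        + "<span slot=\"address\">  144   Hempstead\n Tpke </span>"
        + "<span slot=\"address\">403 West   St</span>"
        + "</body></html>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // True if the default JAXP engine refuses to compile or evaluate the expression.
    static boolean isRejected(String expression, Document doc) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            xpath.evaluate(expression, doc, XPathConstants.STRING);
            return false;
        } catch (XPathExpressionException e) {
            return true;
        }
    }

    // XPath 1.0 form: the node-set argument is converted to the string value
    // of its first node in document order before normalization.
    static String firstNormalized(Document doc) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return (String) xpath.evaluate(
            "normalize-space(//*[@slot='address'])", doc, XPathConstants.STRING);
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(SAMPLE);
        // A function call as the final path step is XPath 2.0-only syntax.
        System.out.println(isRejected("//*[@slot='address']/normalize-space(.)", doc));
        System.out.println("[" + firstNormalized(doc) + "]");
    }
}
```

With the default JDK engine, main should print true and then [144 Hempstead Tpke]: the second address is silently ignored, which matches the behaviour seen with XPathConstants.STRING.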
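import javax.xml.xpath.*;
import org.w3c.dom.NodeList;

// input is the parsed org.w3c.dom.Document
XPath xpath = XPathFactory.newInstance().newXPath();
XPathExpression expr;

// Step 1: select the nodes themselves (valid XPath 1.0)
String select = "//*[@slot='address']";
expr = xpath.compile(select);
NodeList result = (NodeList) expr.evaluate(input, XPathConstants.NODESET);

// Step 2: apply normalize-space(.) with each node as the context node
String normalize = "normalize-space(.)";
expr = xpath.compile(normalize);
int length = result.getLength();
for (int i = 0; i < length; i++) {
    System.out.println(expr.evaluate(result.item(i), XPathConstants.STRING));
}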
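Since the question asks for something like an array of strings, the same two-step approach can collect the values instead of printing them. This is a sketch: NormalizedStrings, normalizedValues, parse and the SAMPLE markup are hypothetical names chosen for illustration. Both expressions are compiled once outside the loop, and XPathExpression.evaluate(Object) is used because it already returns a String, so no cast is needed.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class NormalizedStrings {
    // Hypothetical sample standing in for the real XHTML page.
    static final String SAMPLE = "<html><body>"
        + "<span slot=\"address\"> 144  Hempstead Tpke </span>"
        + "<div slot=\"address\">403\n   West St</div>"
        + "<span slot=\"phone\">555-0100</span>"
        + "</body></html>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Selects nodes with an XPath 1.0 expression, then applies
    // normalize-space(.) to each one and returns the results in document order.
    static String[] normalizedValues(Document doc, String select)
            throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        XPathExpression selectExpr = xpath.compile(select);
        XPathExpression normalizeExpr = xpath.compile("normalize-space(.)");

        NodeList nodes = (NodeList) selectExpr.evaluate(doc, XPathConstants.NODESET);
        List<String> values = new ArrayList<>(nodes.getLength());
        for (int i = 0; i < nodes.getLength(); i++) {
            // The node is the context item, so "." refers to this node alone.
            values.add(normalizeExpr.evaluate(nodes.item(i)));
        }
        return values.toArray(new String[0]);
    }

    public static void main(String[] args) throws Exception {
        String[] addresses = normalizedValues(parse(SAMPLE), "//*[@slot='address']");
        for (String a : addresses) {
            System.out.println(a);
        }
    }
}
```

On the sample this should print 144 Hempstead Tpke and 403 West St on separate lines; the phone element is not selected. If a List<String> is more convenient than an array, return values directly instead of converting it.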
This prints exactly the expected output, one normalized address per line.