Dear all,
I'm trying to extract data from HTML using XPath in Java.
Unfortunately, the text contents of nodes may contain <br/> tags, which
are not interpreted the way I need, at least not for me ;)
A <p> node may contain this text:
<p>
Test1<br/>
Test2<br/>
Test3
</p>
The XPath query returns this as "Test1Test2Test3", but I need
it as "Test1\nTest2\nTest3" or "Test1 Test2 Test3".
Here's example code (Java 6):
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class Example {
    private static final String html =
        "<html><body><p>Test1<br/>Test2<br/>Test3</p></body></html>";

    public static void main( String[] args ) throws Exception {
        final XPathFactory xPathFactory = XPathFactory.newInstance();

        // String value of the first <p> element
        XPath xPath = xPathFactory.newXPath();
        String value = (String)xPath.evaluate(
            "//p",
            new InputSource( new StringReader( html ) ),
            XPathConstants.STRING );
        System.out.println( value );

        // String value of the text nodes that are children of <p>
        xPath = xPathFactory.newXPath();
        value = (String)xPath.evaluate(
            "//p/text()",
            new InputSource( new StringReader( html ) ),
            XPathConstants.STRING );
        System.out.println( value );

        // String value of all child nodes of <p>
        xPath = xPathFactory.newXPath();
        value = (String)xPath.evaluate(
            "//p/node()",
            new InputSource( new StringReader( html ) ),
            XPathConstants.STRING );
        System.out.println( value );
    }
}
This code returns:
Test1Test2Test3
Test1
Test1
Is there any way (XPath function etc) which will return the contents
as desired?
Thank you!
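For reference, here is a minimal sketch of one possible workaround, not necessarily the best one. Converting a node-set to XPathConstants.STRING yields the string value of the first node only, which would explain the "Test1" lines above. The sketch instead requests the text nodes as XPathConstants.NODESET and joins them in Java. The class name JoinTextNodes and the helper joinText are hypothetical names made up for this illustration, and the sketch assumes the input is well-formed XML (as the sample string is).

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class JoinTextNodes {

    // Hypothetical helper: collects every text node below any <p> element
    // and joins the non-empty ones with the given separator.
    public static String joinText( String xhtml, String separator ) throws Exception {
        XPath xPath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList)xPath.evaluate(
            "//p//text()",
            new InputSource( new StringReader( xhtml ) ),
            XPathConstants.NODESET );

        StringBuilder result = new StringBuilder();
        for ( int i = 0; i < nodes.getLength(); i++ ) {
            // Trim whitespace left over from line breaks in the markup
            String text = nodes.item( i ).getNodeValue().trim();
            if ( text.length() == 0 ) {
                continue; // skip whitespace-only text nodes
            }
            if ( result.length() > 0 ) {
                result.append( separator );
            }
            result.append( text );
        }
        return result.toString();
    }

    public static void main( String[] args ) throws Exception {
        String html = "<html><body><p>Test1<br/>Test2<br/>Test3</p></body></html>";
        System.out.println( joinText( html, " " ) ); // prints "Test1 Test2 Test3"
    }
}
```

Passing "\n" as the separator should give the line-per-entry variant. Note that this joins text from all <p> elements in the document into one string; restricting it to a single paragraph would need a more specific XPath expression.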