XPath select()

Finding a single node is useful, but often you will want to select all the nodes matching a pattern. For example, a web spider will want to select all the links in an HTML page.

The select() call returns an Iterator over all the matching nodes. The Iterator returns the nodes in document order.

You can precompile an XPath pattern and then reuse the compiled pattern as often as you need. The XPath.find() call on the previous page was a convenient but inefficient way to find the node. In this example, we'll precompile the pattern before calling select().
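The compile-once, reuse-many idea can also be sketched with the JDK's standard javax.xml.xpath API. This is not the Resin API used on this page; it is a swapped-in illustration, and the class name LinkSelector, the method hrefs(), and the sample markup are all hypothetical. It assumes the input is well-formed XHTML, since the JDK parser does not accept loose HTML.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: JDK XPath, not Resin's com.caucho.xpath classes.
public class LinkSelector {
  // Compiled once in the constructor, then reused for every document.
  private final XPathExpression linkExpr;

  public LinkSelector() {
    try {
      linkExpr = XPathFactory.newInstance().newXPath().compile("//a[@href]");
    } catch (XPathExpressionException e) {
      throw new IllegalArgumentException("bad XPath expression", e);
    }
  }

  // Returns the href values of all <a href> elements in the given XHTML text.
  public List<String> hrefs(String xhtml) {
    try {
      Document doc = DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new InputSource(new StringReader(xhtml)));
      NodeList nodes = (NodeList) linkExpr.evaluate(doc, XPathConstants.NODESET);
      List<String> result = new ArrayList<>();
      for (int i = 0; i < nodes.getLength(); i++) {
        result.add(((Element) nodes.item(i)).getAttribute("href"));
      }
      return result;
    } catch (Exception e) {
      throw new IllegalStateException("could not parse or query document", e);
    }
  }

  public static void main(String[] args) {
    LinkSelector selector = new LinkSelector();
    // Two sample pages (made up for this sketch) queried with one compiled expression.
    String page1 = "<html><body><a href='/index.xtp'>Home</a>"
        + "<p><a href='/ref/faq.xtp'>FAQ</a></p>"
        + "<a name='top'>no href here</a></body></html>";
    String page2 = "<html><body><a href='/javadoc/index.html'>API</a></body></html>";
    for (String page : List.of(page1, page2)) {
      for (String href : selector.hrefs(page)) {
        System.out.println("link: " + href);
      }
    }
  }
}
```

The expression is parsed only in the constructor, so each additional document costs one evaluation rather than a parse plus an evaluation. The <a name='top'> element is skipped because it has no href attribute.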

The following example reads the home page of http://localhost:8080 and prints all the <a href> links in that page.

Spidering a Web Page

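import java.util.Iterator;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import com.caucho.xml.Html;
import com.caucho.xpath.Pattern;
import com.caucho.xpath.XPath;

// Compile the pattern once so it can be reused.
Pattern pattern = XPath.parseMatch("a[@href]");

// Parse the HTML page into a DOM document.
Document doc = new Html().parseDocument("http://localhost:8080");

// Iterate over the matching <a> elements in document order.
Iterator iter = pattern.select(doc);
while (iter.hasNext()) {
  Element elt = (Element) iter.next();
  System.out.println("link: " + elt.getAttribute("href"));
}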

link: /index.xtp
link: /ref/faq.xtp
link: /ref/index.xtp
link: /javadoc/index.html
...
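The href values printed above are relative to the server root, so a spider would usually turn them into absolute URLs before fetching them. A minimal sketch of that step using the JDK's java.net.URI follows; the class name LinkResolver and the method resolveAll() are hypothetical, and the base URL is simply the one from the example with a trailing slash added.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: resolves href values against the page's base URL.
public class LinkResolver {
  public static List<String> resolveAll(String baseUrl, List<String> hrefs) {
    URI base = URI.create(baseUrl);
    List<String> absolute = new ArrayList<>();
    for (String href : hrefs) {
      // URI.resolve() leaves absolute URLs alone and resolves relative ones.
      absolute.add(base.resolve(href).toString());
    }
    return absolute;
  }

  public static void main(String[] args) {
    List<String> hrefs = List.of("/index.xtp", "/ref/faq.xtp");
    for (String url : resolveAll("http://localhost:8080/", hrefs)) {
      System.out.println(url);
    }
    // prints:
    // http://localhost:8080/index.xtp
    // http://localhost:8080/ref/faq.xtp
  }
}
```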

Summary

Precompiling XPath patterns is more efficient.
XPath.select() can "spider" web pages.
select() returns nodes in document order.
The pattern a[@href] returns <a> elements with an href attribute.