Zend\Dom\Query provides mechanisms for querying XML and (X)HTML documents using either XPath or CSS selectors. It was developed to aid with functional testing of MVC applications, but can also be used for rapid development of screen scrapers.
CSS selector notation is offered as a simpler and more familiar alternative to XPath for web developers querying documents with XML structures. It should be familiar to anybody who has written Cascading Style Sheets or who uses JavaScript toolkits that select nodes with CSS selectors (Prototype’s $$() and Dojo’s dojo.query were both inspirations for the component).
To use Zend\Dom\Query, you instantiate a Zend\Dom\Query object, optionally passing a document to query (a string). Once you have a document, you can use either the execute() method (for CSS selectors) or the queryXpath() method (for XPath expressions); each method returns a Zend\Dom\NodeList object with any matching nodes.
The primary difference between Zend\Dom\Query and using DOMDocument + DOMXPath is the ability to select against CSS selectors. You can utilize any of the following, in any combination:
element types: provide an element type to match: ‘div’, ‘a’, ‘span’, ‘h2’, etc.
style attributes: CSS class names to match: ‘.error’, ‘div.error’, ‘label.required’, etc. If an element declares more than one class, the selector matches as long as the named class appears anywhere in the class attribute.
id attributes: element ID attributes to match: ‘#content’, ‘div#nav’, etc.
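arbitrary attributes: arbitrary element attributes to match. Three different types of matching are provided: exact match (‘div[bar="baz"]’ matches a ‘div’ whose ‘bar’ attribute is exactly ‘baz’), word match (‘div[bar~="baz"]’ matches when the ‘bar’ attribute contains the word ‘baz’), and substring match (‘div[bar*="baz"]’ matches when the ‘bar’ attribute contains the string ‘baz’ anywhere).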
direct descendants: use ‘>’ between selectors to denote direct descendants. ‘div > span’ would select only ‘span’ elements that are direct descendants of a ‘div’. Can also be used with any of the selectors above.
descendants: string together multiple selectors to indicate a hierarchy along which to search. ‘div .foo span #one’ would select an element with the id ‘one’ that is a descendant of arbitrary depth beneath a ‘span’ element, which is in turn a descendant of arbitrary depth beneath an element with a class of ‘foo’, which is a descendant of arbitrary depth beneath a ‘div’ element. For example, it would match the link to the word ‘One’ in the listing below:
    <div>
        <table>
            <tr>
                <td class="foo">
                    <div>
                        Lorem ipsum <span class="bar">
                            <a href="/foo/bar" id="one">One</a>
                            <a href="/foo/baz" id="two">Two</a>
                            <a href="/foo/bat" id="three">Three</a>
                            <a href="/foo/bla" id="four">Four</a>
                        </span>
                    </div>
                </td>
            </tr>
        </table>
    </div>
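To make the selector types above concrete, the following sketch evaluates hand-written XPath equivalents of a few CSS selectors against the sample markup, using only PHP's DOM extension. The XPath strings and the classPredicate() helper are illustrative assumptions; they are not necessarily the expressions Zend\Dom\Query generates internally.

```php
<?php
// Sample markup from the listing above (well-formed, so it can be loaded as XML).
$html = <<<'HTML'
<div>
  <table>
    <tr>
      <td class="foo">
        <div>
          Lorem ipsum <span class="bar">
            <a href="/foo/bar" id="one">One</a>
            <a href="/foo/baz" id="two">Two</a>
            <a href="/foo/bat" id="three">Three</a>
            <a href="/foo/bla" id="four">Four</a>
          </span>
        </div>
      </td>
    </tr>
  </table>
</div>
HTML;

$doc = new DOMDocument();
$doc->loadXML($html);
$xpath = new DOMXPath($doc);

// Hypothetical helper: an XPath predicate that matches one class name as a
// whole word, even when the class attribute lists several names.
function classPredicate(string $class): string
{
    return sprintf('contains(concat(" ", normalize-space(@class), " "), " %s ")', $class);
}

// CSS selector => hand-written XPath equivalent (assumed, for illustration).
$selectors = [
    'div .foo span #one' => '//div//*[' . classPredicate('foo') . ']//span//*[@id="one"]',
    'div > span'         => '//div/span',   // span must be a direct child of a div
    'td > span'          => '//td/span',    // no span sits directly under the td
    '.foo .bar a'        => '//*[' . classPredicate('foo') . ']//*[' . classPredicate('bar') . ']//a',
    'a[id="two"]'        => '//a[@id="two"]',                // exact attribute match
    'a[href*="/foo/ba"]' => '//a[contains(@href, "/foo/ba")]', // substring attribute match
];

// Count the nodes each expression selects.
$counts = [];
foreach ($selectors as $css => $expression) {
    $counts[$css] = $xpath->query($expression)->length;
}
```

Comparing ‘div > span’ with ‘td > span’ shows the difference between the two hierarchy forms: the ‘span’ is a direct child of the inner ‘div’, but only an indirect descendant of the ‘td’.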
Once you’ve performed your query, you can work with the result object to determine information about the nodes, as well as to pull them and/or their content directly for examination and manipulation. Zend\Dom\NodeList implements Countable and Iterator, and stores the results internally as a DOMDocument and DOMNodeList. As an example, consider the following call, which runs a selector against the HTML above:
    use Zend\Dom\Query;

    $dom = new Query($html);
    $results = $dom->execute('.foo .bar a');

    $count = count($results); // get number of matches: 4
    foreach ($results as $result) {
        // $result is a DOMElement
    }
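To illustrate what implementing Countable and Iterator over a DOMNodeList involves, here is a minimal sketch of such a wrapper built on PHP's DOM extension alone. SimpleNodeList is a hypothetical class written for this example; it is not the Zend\Dom\NodeList source and omits whatever else that class provides.

```php
<?php
// Hypothetical stand-in for a query result list; NOT the Zend\Dom\NodeList source.
final class SimpleNodeList implements Iterator, Countable
{
    /** @var DOMNodeList */
    private $nodes;

    /** @var int */
    private $position = 0;

    public function __construct(DOMNodeList $nodes)
    {
        $this->nodes = $nodes;
    }

    // Countable: lets count($list) report the number of matches.
    public function count(): int
    {
        return $this->nodes->length;
    }

    // Iterator: the five methods below drive foreach.
    #[\ReturnTypeWillChange]
    public function current()
    {
        return $this->nodes->item($this->position);
    }

    public function key(): int
    {
        return $this->position;
    }

    public function next(): void
    {
        $this->position++;
    }

    public function rewind(): void
    {
        $this->position = 0;
    }

    public function valid(): bool
    {
        return $this->position < $this->nodes->length;
    }
}

// Usage: wrap the result of a plain DOMXPath query.
$doc = new DOMDocument();
$doc->loadXML('<span class="bar"><a id="one">One</a><a id="two">Two</a></span>');
$results = new SimpleNodeList((new DOMXPath($doc))->query('//a'));

$count = count($results);
$ids = [];
foreach ($results as $result) {
    // Each $result is a DOMElement here, so element methods are available.
    $ids[] = $result->getAttribute('id');
}
```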
Zend\Dom\Query also allows straight XPath queries utilizing the queryXpath() method; you can pass any valid XPath query to this method, and it will return a Zend\Dom\NodeList object.
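Straight XPath is useful for conditions that the CSS selector types listed earlier do not cover, such as matching on text content or position. The sketch below evaluates a few such expressions with PHP's own DOMXPath so that it runs without the framework; the expressions and variable names are illustrative, and the same expression strings are the kind of input queryXpath() accepts.

```php
<?php
// The four links from the sample markup, reduced to a small XML fragment.
$xml = '<span class="bar">'
     . '<a href="/foo/bar" id="one">One</a>'
     . '<a href="/foo/baz" id="two">Two</a>'
     . '<a href="/foo/bat" id="three">Three</a>'
     . '<a href="/foo/bla" id="four">Four</a>'
     . '</span>';

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// Illustrative XPath expressions (assumed for this example).
$byText   = '//a[text()="Two"]';                  // match on text content
$lastLink = '//span/a[last()]';                   // match on position
$byPrefix = '//a[starts-with(@href, "/foo/ba")]'; // match on an attribute prefix

$twoHref     = $xpath->query($byText)->item(0)->getAttribute('href');
$lastId      = $xpath->query($lastLink)->item(0)->getAttribute('id');
$prefixCount = $xpath->query($byPrefix)->length;
```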
The Zend\Dom\Query family of classes has the following methods available.
The following methods are available to Zend\Dom\Query:
As mentioned previously, Zend\Dom\NodeList implements both Iterator and Countable, and as such can be used in a foreach() loop as well as with the count() function. Additionally, it exposes the following methods: