This article compares the two libraries, phpQuery and QueryPath, in terms of features, performance, ease of use and documentation.
To start using either library, you just need to include its main PHP file.
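A minimal sketch of that include step, assuming both archives were unpacked next to the script. The directory names are assumptions and depend on the version and on how you installed each library:

```php
<?php
// Paths are assumptions; adjust them to your own installation.
require_once 'phpQuery/phpQuery.php';   // phpQuery's single entry file
require_once 'QueryPath/QueryPath.php'; // QueryPath 2.x entry file
```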
Once that is done, you can create a new document. phpQuery provides a handful of functions to create documents, either from markup, such as phpQuery::newDocumentHTML($html, $charset = 'utf-8'), or from a file, such as phpQuery::newDocumentFileXML($file, $charset = 'utf-8'). Each of them returns a phpQuery object. The easiest way is to let phpQuery auto-detect the document type.
QueryPath works in a similar way but makes things even easier by providing a single document creation function that accepts markup, file paths and even URLs. It then returns a QueryPath object.
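Document creation in both libraries could look roughly like this. It is a sketch, assuming the include paths shown in the comments and a made-up sample document; phpQuery::newDocument() is the auto-detecting variant and qp() is QueryPath's factory function:

```php
<?php
require_once 'phpQuery/phpQuery.php';   // assumed install path
require_once 'QueryPath/QueryPath.php'; // assumed install path (QueryPath 2.x)

// Hypothetical sample markup, used only for illustration.
$html = '<html><head><title>Demo</title></head>'
      . '<body><ul id="menu"><li>One</li><li>Two</li></ul></body></html>';

// phpQuery: newDocument() guesses the document type from the markup.
$pq = phpQuery::newDocument($html);

// QueryPath: qp() accepts markup, a file path or a URL.
$qp = qp($html);

$pqTitle = $pq->find('title')->text();
$qpTitle = $qp->find('title')->text();
echo $pqTitle, "\n", $qpTitle, "\n";
```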
Both libraries follow the same principle of chainability that jQuery is famous for. Every function returns an object (like the QueryPath object created above) which can be reused to apply additional functions.
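As an illustration of chaining, the following sketch selects the same elements in both libraries and modifies them in a single expression. The sample markup, class name and attribute value are made up, and the include paths are assumptions:

```php
<?php
require_once 'phpQuery/phpQuery.php';   // assumed install path
require_once 'QueryPath/QueryPath.php'; // assumed install path (QueryPath 2.x)

// Hypothetical sample markup, used only for illustration.
$html = '<html><head><title>Demo</title></head>'
      . '<body><ul id="menu"><li>One</li><li>Two</li></ul></body></html>';

// Each call returns an object, so the next call can be appended directly.
$pqItems = phpQuery::newDocument($html)
    ->find('#menu li')
    ->addClass('item')
    ->attr('title', 'entry');

$qpItems = qp($html)
    ->find('#menu li')
    ->addClass('item')
    ->attr('title', 'entry');

// Number of elements each chain ended up with.
echo $pqItems->size(), ' ', $qpItems->size(), "\n";
```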
In addition, both libraries allow you to loop through results using PHP's foreach. The difference is that each iteration yields a DOMNode object in phpQuery, whereas QueryPath yields a QueryPath object.
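The difference shows up in how the loop variable is used. In this sketch (sample markup and include paths are assumptions), the phpQuery node has to be wrapped with pq() before phpQuery methods can be called on it, while the QueryPath item can be used directly:

```php
<?php
require_once 'phpQuery/phpQuery.php';   // assumed install path
require_once 'QueryPath/QueryPath.php'; // assumed install path (QueryPath 2.x)

// Hypothetical sample markup, used only for illustration.
$html = '<html><head><title>Demo</title></head>'
      . '<body><ul id="menu"><li>One</li><li>Two</li></ul></body></html>';

$pqTexts = array();
foreach (phpQuery::newDocument($html)->find('li') as $node) {
    // $node is a plain DOM node; wrap it to get phpQuery methods back.
    $pqTexts[] = pq($node)->text();
}

$qpTexts = array();
foreach (qp($html, 'li') as $item) {
    // $item is already a QueryPath object.
    $qpTexts[] = $item->text();
}

echo implode(', ', $pqTexts), "\n";
echo implode(', ', $qpTexts), "\n";
```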
There's a little catch with QueryPath: its functions are always applied to the current context, whereas phpQuery always starts from the object's root. To make QueryPath start from the root, the special selector :root has to be applied.
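The following sketch illustrates the behavior described above. It assumes QueryPath 2.x, where find() narrows the object it is called on; the sample markup and include paths are assumptions as well:

```php
<?php
require_once 'phpQuery/phpQuery.php';   // assumed install path
require_once 'QueryPath/QueryPath.php'; // assumed install path (QueryPath 2.x)

// Hypothetical sample markup, used only for illustration.
$html = '<html><head><title>Demo</title></head>'
      . '<body><ul id="menu"><li>One</li><li>Two</li></ul></body></html>';

// phpQuery: find() returns a new object, $pq keeps pointing at the document.
$pq = phpQuery::newDocument($html);
$pq->find('ul');
$pqTitle = $pq->find('title')->text();

// QueryPath: after this call, $qp itself is narrowed to the <ul> element.
$qp = qp($html);
$qp->find('ul');
// A plain find('title') would now search inside the <ul> only,
// so the selector is prefixed with :root to start from the top again.
$qpTitle = $qp->find(':root title')->text();

echo $pqTitle, "\n", $qpTitle, "\n";
```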
However, unless you absolutely require one of those advanced functions, you should be fine with either library.
Certain functions, such as find(), accept selectors similar to CSS selectors. Both libraries seem to support the same basic selectors, such as selector1, selector2, selector3, and hierarchical selectors, such as ancestor descendant or parent > child.
Again, it makes no difference which library you use.
In addition to jQuery support, both libraries offer a couple of special features which you may or may not find useful. These are some of the more important ones:
Both libraries offer online documentation on how to use them.
phpQuery's documentation is very user-friendly and describes all features where necessary or links to the respective jQuery page. Furthermore, the library files contain a lot of useful example code to get your hands dirty. Not once did I have to look for other tutorials on the net; everything I needed was right there.
QueryPath's documentation consists of an introductory tutorial, example source files and an API reference. The first two do a decent job of explaining the basics. The API reference, however, is confusing and provides no easy way of finding out which jQuery features are supported. I often found myself forced to look elsewhere for additional tutorials that showed more example code in action.
If you plan on using either library to parse HTML pages you didn't create yourself, chances are they contain invalid markup.
This can lead to problems when feeding the erroneous HTML to either parser, worst case being a library crash, which I encountered once using phpQuery.
Generally speaking, though, phpQuery handles invalid markup quite well most of the time and isn't too picky about its correctness.
QueryPath, on the other hand, throws a fatal error whenever it encounters even the slightest anomaly, such as an unescaped ampersand (& instead of &amp;).
If you do indeed have to work with input data from external sources, I recommend running the markup through an HTML sanitizer first, such as HTML Tidy for PHP (also available for other languages).
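A minimal sketch of such a cleanup step, assuming PHP's tidy extension is installed. The markup is a made-up example with an unescaped ampersand and an unclosed paragraph, and the chosen options are just one reasonable configuration:

```php
<?php
// Made-up input: unescaped ampersand and a <p> that is never closed.
$dirty = '<html><head><title>Shop</title></head>'
       . '<body><p>Fish & Chips</body></html>';

$config = array(
    'output-xhtml' => true, // emit well-formed XHTML
    'wrap'         => 0,    // do not wrap long lines
);

// Returns the repaired markup as a string.
$clean = tidy_repair_string($dirty, $config, 'utf8');

// $clean can now be handed to phpQuery::newDocument() or qp().
echo $clean;
```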
Now comes the really interesting part: which library performs better?
To measure their performance, several small test queries were written and their execution time evaluated. The resulting score is the average duration of 10 test runs.
Because parsing a document and modifying it are vastly different things, the tests were divided into read and write operations.
To test document parsing, the W3C start page was used as input and each of the following code snippets was executed 1,000 times in a loop (the first line for phpQuery and the second for QueryPath):
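The measurement procedure described above (1,000 iterations per run, averaged over 10 runs) can be sketched as follows. This is a hypothetical harness, not the original test code; the function name averageRuntime and the stand-in workload are made up for illustration:

```php
<?php
// Hypothetical helper: executes $task $iterations times per run and
// returns the mean duration of $runs runs in seconds.
function averageRuntime(callable $task, int $iterations = 1000, int $runs = 10): float
{
    $total = 0.0;
    for ($run = 0; $run < $runs; $run++) {
        $start = microtime(true);
        for ($i = 0; $i < $iterations; $i++) {
            $task();
        }
        $total += microtime(true) - $start;
    }
    return $total / $runs;
}

// Stand-in workload; a real benchmark would run a phpQuery or QueryPath query here.
$seconds = averageRuntime(function () {
    strrev('benchmark');
});

printf("%.6f s per run\n", $seconds);
```

Timing whole runs rather than single calls keeps the overhead of microtime() out of the measured loop.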
Except for test 3, one can clearly see that QueryPath is heavily outperformed by phpQuery, especially for parsing nested tags in test 2. But even for simple operations, such as finding a specific tag in test 1, QueryPath is more than three times slower than phpQuery.
Needless to say, I was quite surprised by this result and at first thought there was a bug in the testing code. But after I thoroughly analyzed the code and re-ran the tests, the results remained the same.
The write tests involved creating a new document in test 1 and then modifying it in the subsequent tests:
As if the read tests weren't surprising enough, this one takes the cake! Except for test 1, where a new document is filled with empty tags, phpQuery gets totally crushed by QueryPath. phpQuery is about 35 times slower when it comes to write operations.
So, which library should you choose?
Performance aside, both libraries cover the most important jQuery features and are almost identical in usage. You can't really go wrong with either one.
The fact that QueryPath must be told explicitly to start from the document root makes it slightly less intuitive to use, though. Together with phpQuery's support for some of the more advanced jQuery functions and its more user-friendly documentation, this might make phpQuery the better choice for some quick jQuery-esque document processing.
However, when performance becomes an important factor, one must distinguish between read- and write-heavy tasks. If you need to parse hundreds of documents, the benchmark results favor phpQuery. For document creation and modification, however, QueryPath is clearly the superior library.