jQuery in Perl – Distractions

A few months ago, I ran across a reference to a Perl module called pQuery, which is an attempt at porting jQuery to Perl. As modules go, it’s a derelict construction site. It tries to translate the jQuery JavaScript library to Perl, replacing jQuery’s use of the browser DOM with HTML::Tree, which is not as easy as one could hope. It is also very out of date.

I had the bright idea of updating it, by translating the current version of jQuery to Perl. I decided to start with John Resig’s selector engine, Sizzle, which seemed like a nice stand-alone chunk with extensive tests.

I actually converted the whole file into something that could run in Perl, replacing the browser DOM with LibXML this time, before realizing what a pointless exercise this was. First, because Sizzle tries to use higher-level browser APIs as much as possible, not low-level DOM calls (so trying to figure out how it works by running it through Firebug takes one down a different code path than the one used by my Perl code). Second, because at the high level, LibXML lets you use XPath expressions, which would be better than all this DOM grubbing.

John Resig had a blog post explaining why jQuery, unlike other JavaScript libraries, doesn’t convert its CSS selectors to XPath expressions to get maximum performance on browsers that support XPath. Following his links, I found a nice-looking implementation in Prototype.js. I decided to drop my Perl version of Sizzle (pizzle?) and convert the Prototype.js stuff instead. When I got around to it.

Thinking about it again sometime later, I thought this code should live somewhere under HTML::Selectors. So I searched CPAN and found Miyagawa has got this covered (referencing this useful conversion table from Aristotle Pagaltzis). See also a trivial module that hooks up those selectors to an HTML parser, and an independent HTML::Tree-based version called HTML::Query from Andy Wardley (the Template Toolkit guy).
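To give a feel for what such a selector-to-XPath conversion involves, here is a toy sketch in plain Perl. It is not how Miyagawa's module or Prototype.js do it: css_to_xpath is a name I made up for illustration, and it only understands tag names, #id, .class and the descendant combinator (a space). Anything else makes it die.

```perl
use strict;
use warnings;

# Hypothetical helper for illustration only; handles a tiny subset of CSS.
sub css_to_xpath {
    my ($selector) = @_;
    my @steps;
    for my $part (split ' ', $selector) {
        # Split "tag#id.class" into an optional tag name and its qualifiers.
        my ($tag, $rest) = $part =~ /^([A-Za-z][\w-]*|\*)?(.*)$/;
        my $step = defined $tag ? $tag : '*';
        while ($rest =~ /\G([#.])([\w-]+)/gc) {
            my ($kind, $name) = ($1, $2);
            if ($kind eq '#') {
                $step .= sprintf q{[@id='%s']}, $name;
            }
            else {
                # Match one class name among several space-separated ones.
                $step .= sprintf
                    q{[contains(concat(' ', normalize-space(@class), ' '), ' %s ')]},
                    $name;
            }
        }
        # Anything left unparsed is syntax this sketch does not support.
        die "unsupported selector syntax: $part\n"
            if (pos($rest) // 0) < length $rest;
        push @steps, $step;
    }
    # A space in CSS means "any descendant", which is // in XPath.
    return '//' . join('//', @steps);
}

print css_to_xpath('td.title a'), "\n";
```

The class test is the fiddly part: a plain @class='title' comparison would miss elements carrying several classes, hence the concat/normalize-space dance.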

Oh, and there’s also Matt Trout’s HTML::Zoom, which is a templating engine using jQuery-like selectors to pick pieces of HTML to manipulate (and also has jQuery-like chaining).

Which means someone has done the hard bit; now all that needs to be done is leveraging this into a jQuery-like tool for writing one-liners that spider and transform web pages like putty.

… and last week, Sebastian Riedel added support for DOM/CSS3 to Mojolicious and posted this one-liner on Twitter:

print Mojo::Client->new
  ->get('http://digg.com')
  ->success->dom->at('h3 a')->text;

As sweet as that code is, though, it’s a bit too fragile for the messy real-life web, because when I try to do the same trick on, say, Hacker News, it fails:

print Mojo::Client->new
  ->get(q{http://news.ycombinator.com/news})
  ->success->dom->at(q{td.title a})->text;

Looking at the headers the website returns shows the culprit:

Content-Length: 0

Luckily, the same issue had already come up, and Sebastian offered a simple workaround to allow Mojo::Client to handle such a “broken” web server. This code works:


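use Mojo::Client;

my $c = Mojo::Client->new;
# Build the transaction by hand so it can be tweaked before it is sent.
my $tx = $c->build_tx(get => q{http://news.ycombinator.com/news});
# Switch the response parser to relaxed mode to cope with the bogus header.
$tx->res->content->relaxed(1);
$c->process($tx);
print $tx->res->dom->at(q{td.title a})->text;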
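As far as I understand it, relaxed parsing boils down to not trusting the declared length and taking whatever arrives until the connection closes. The sketch below illustrates that idea in plain Perl on a made-up raw response. body_from_response is a hypothetical helper of mine, not part of Mojolicious, and the real parser is certainly more involved.

```perl
use strict;
use warnings;

# Hypothetical helper: extract the body from a raw HTTP response string.
sub body_from_response {
    my ($raw, %opt) = @_;
    # Headers and body are separated by the first empty line.
    my ($head, $rest) = split /\r?\n\r?\n/, $raw, 2;
    $rest //= '';
    my ($length) = $head =~ /^Content-Length:\s*(\d+)\s*$/mi;
    # Relaxed (or no declared length): keep everything that was received.
    return $rest if $opt{relaxed} || !defined $length;
    # Strict: believe the header, even when it is wrong.
    return substr $rest, 0, $length;
}

# Made-up response that mimics the bogus "Content-Length: 0" header.
my $raw = "HTTP/1.0 200 OK\r\nContent-Length: 0\r\n\r\n"
        . '<td class="title"><a>Story</a></td>';

print "strict:  '", body_from_response($raw), "'\n";
print "relaxed: '", body_from_response($raw, relaxed => 1), "'\n";
```

In strict mode the body comes back empty, so there is nothing for the selector to match, which is consistent with the failure above.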
