The Perl Journal has a good tutorial, Parsing HTML with HTML::Parser, by Ken MacFarlane.
It's worth noting that HTML::Parser is already subclassed (and reused) by lots of other modules on CPAN. Even so, I think the common approach is either to (a) use regular expressions to pull out what you need or (b) use some black-box tool to convert the HTML to well-formed XML and parse that.
The HTML::Tree classes (tree builder and tree node) are another interface to HTML::Parser, as is HTML::TokeParser.
I think I’ll stick with hacking on the plain parser for now.
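For reference, here is a minimal sketch of what "the plain parser" looks like when it is used directly rather than through a subclass or wrapper. It uses the version 3 handler API to collect link targets. The sample HTML, the `@links` array, and the `start_handler` name are my own illustrative choices and are not taken from the tutorial.

```perl
use strict;
use warnings;
use HTML::Parser ();

# Hypothetical sample input, made up for this sketch.
my $html = '<p>See <a href="http://www.cpan.org/">CPAN</a> and '
         . '<a href="http://www.perl.org/">perl.org</a>.</p>';

my @links;

# Called once per start tag. With the argspec 'tagname, attr' the parser
# passes the lower-cased tag name and a hash ref of the tag's attributes.
sub start_handler {
    my ($tagname, $attr) = @_;
    push @links, $attr->{href}
        if $tagname eq 'a' && defined $attr->{href};
}

my $parser = HTML::Parser->new(
    api_version => 3,
    start_h     => [ \&start_handler, 'tagname, attr' ],
);

$parser->parse($html);
$parser->eof;    # signal end of document so any buffered input is processed

print "$_\n" for @links;
```

Only a start-tag handler is registered here, so text and end tags are ignored. Adding `text_h` or `end_h` entries in the same style would be the next step if the link text were needed too.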