The Busy XML hack

I wanted to list the news feeds I subscribe to in Syndirella on my reading page (still under construction). The program can export this list as an OPML file.

OPML is a simple (if inconsistent) XML format, so I tried to use a Perl module called XML::Simple to parse it. For some reason it didn’t work, so I messed around with it and then decided to simply write a script that parses the OPML file with regular expressions:


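# An OPML subscriptions parser.
# Assumes each <outline> element sits on a single line of the input.
while (<>) {
	# Declare the variables unconditionally so every line starts with empty values.
	my ($title, $html, $rss) = ("", "", "");
	m/title\="([^"]+)"/i and $title = $1;
	m/htmlurl\="([^"]+)"/i and $html = $1;
	m/xmlurl\="([^"]+)"/i and $rss = $1;
	if ($html ne "" && $rss ne "" && $title ne "") {
		print qq{ <p><a href="$html">$title</a> 
                      [ <a href="$rss">RSS</a> ] </p>\n };
	}
}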

Then I figured out what the problem with the original script was: I was handing it the script itself as input instead of the OPML file. Doh! So I wrote the proper program, the one that uses an XML parser:


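use strict;
use warnings;
use XML::Simple;

# The OPML file to read is the first command-line argument.
my $subfile = shift or die "Usage: $0 OPML_FILE\n";
# ForceArray keeps 'outline' a list even when the file holds a single feed;
# KeyAttr => [] stops XML::Simple from folding that list into a hash.
my $subs = XMLin($subfile, ForceArray => ['outline'], KeyAttr => []);
for my $item (@{$subs->{'body'}->{'outline'}})
{
	my $html = $item->{'htmlUrl'}; 
	my $rss = $item->{'xmlUrl'}; 
	my $title = $item->{'title'}; 
	# Skip outlines (such as folders) that lack any of the three attributes.
	next unless defined($html) && defined($rss) && defined($title);
	print qq{ <p><a href="$html">$title</a> 
                     [ <a href="$rss">RSS</a> ] </p>\n };
}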


Not only is the regular-expression-based program simpler, it also works with OPML files that use different case for the attribute names (htmlurl instead of htmlUrl – I told you it was inconsistent).
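
The parser-based version could be made just as tolerant of attribute case. Here is a minimal sketch, assuming each parsed outline is a plain hash reference of the kind XML::Simple hands back. The helper names lc_keys and feed_line and the sample data are made up for illustration, and the sketch leaves out XML::Simple itself so it runs on its own:

```perl
use strict;
use warnings;

# Hypothetical helper: return a copy of a hash ref with every key lowercased,
# so htmlUrl, htmlurl and HTMLURL all end up under "htmlurl".
sub lc_keys {
    my ($item) = @_;
    my %copy = map { ( lc($_) => $item->{$_} ) } keys %$item;
    return \%copy;
}

# Hypothetical helper: format one outline as a line of HTML,
# looking the attributes up by their lowercased names.
sub feed_line {
    my ($item) = @_;
    my $i = lc_keys($item);
    return qq{ <p><a href="$i->{htmlurl}">$i->{title}</a> [ <a href="$i->{xmlurl}">RSS</a> ] </p>\n };
}

# Sample data standing in for two parsed <outline> elements (not real feeds).
my @outlines = (
    { title => 'Feed A', htmlUrl => 'http://a.example/', xmlUrl => 'http://a.example/rss' },
    { title => 'Feed B', htmlurl => 'http://b.example/', xmlurl => 'http://b.example/rss' },
);

print feed_line($_) for @outlines;
```

Both sample outlines print, even though they spell the attribute names differently. In the real script the same normalisation would go at the top of the for loop, before the three lookups.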

When I say “simpler” above, I mean simpler to write. The second program certainly looks clearer, especially if you’re used to the syntax of more rigorous languages. The quick regular expression hack looks a bit messier than usual because of the escapes in the regular expressions (the backslashes before the equals signs) and because it uses Perl’s shorthand condition syntax (joining the match and the assignment with and saves putting the assignment in an if { … } block).
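
For comparison, here is a small sketch of three ways Perl lets you write the same guarded assignment: a conventional if block, a statement modifier with the if at the end, and the short-circuit and form. The sub names and the sample line are made up for this comparison. In each case the variable is declared on its own line first, because the Perl documentation warns against combining my with a condition in one statement.

```perl
use strict;
use warnings;

# 1. Conventional if block.
sub title_block {
    my ($line) = @_;
    my $title = "";
    if ($line =~ m/title="([^"]+)"/i) {
        $title = $1;
    }
    return $title;
}

# 2. Statement modifier: the condition follows the statement.
sub title_modifier {
    my ($line) = @_;
    my $title = "";
    $title = $1 if $line =~ m/title="([^"]+)"/i;
    return $title;
}

# 3. Short-circuit "and": the assignment runs only if the match succeeds.
sub title_and {
    my ($line) = @_;
    my $title = "";
    $line =~ m/title="([^"]+)"/i and $title = $1;
    return $title;
}

# A made-up outline element to try the three subs on.
my $line = '<outline title="Example Feed" htmlUrl="http://example.org/"/>';
print join(", ", title_block($line), title_modifier($line), title_and($line)), "\n";
```

All three return the same title. The last two just save the braces.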

Added spaces before the square brackets to stop them from breaking my RSS feed.