The Code
yadr2geo.pl is a Perl script that reads your downloaded address book on standard input and writes a geocoded RSS file, for use with worldKit, to standard output. Geocoded RSS refers to any flavor of RSS extended to include item-level latitude and longitude. More details on the format can be found at http://brainoff.com/worldkit/doc/rss.php#basic.
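To make the format concrete, here is a minimal sketch of how one geocoded item could be assembled. The helper names xml_escape and geo_item are hypothetical and are not part of yadr2geo.pl; the element names match the ones the script emits, and the sketch assumes the enclosing feed binds the geo prefix to a namespace.

```perl
use strict;
use warnings;

# Hypothetical helper: escape XML special characters in text content,
# so a name such as "Smith & Sons" does not break the feed.
sub xml_escape {
    my ($text) = @_;
    my %entity = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;');
    $text =~ s/([&<>])/$entity{$1}/g;
    return $text;
}

# Hypothetical helper: build one RSS <item> carrying item-level
# latitude and longitude.
sub geo_item {
    my ($title, $lat, $lon) = @_;
    return join "\n",
        '<item>',
        '<title>' . xml_escape($title) . '</title>',
        "<geo:lat>$lat</geo:lat>",
        "<geo:long>$lon</geo:long>",
        '</item>',
        '';
}

print geo_item('Example Person', '37.77', '-122.42');
```

The coordinates are passed as strings so that they are written out exactly as received from the geocoder.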
This script requires commonly installed modules: URI::Escape for formatting web service requests, LWP::Simple for making those requests, and XML::Simple for parsing the responses.
TIP
This script does not use a module to parse CSV, because some modules (older releases of Text::CSV, for example) assume that a newline always indicates a new record, while in Yahoo! CSV (and most flavors of this unofficial spec) it's legal to include a newline within an entry if that entry is quoted. CSV is discussed in more detail at http://www.creativyst.com/Doc/Articles/CSV/CSV01.htm.
Yahoo! CSV is simple: all entries are guaranteed to be quoted, the first line gives field names, and there's no extraneous whitespace. So it's straightforward to write a script that parses Yahoo! CSV character by character. The subroutine getrecord() takes an open filehandle as an argument and returns a reference to an array containing the fields of the next CSV record.
Save the following code to a file called yadr2geo.pl:
#!/usr/bin/perl -w
use strict;
use XML::Simple qw(XMLin);
use LWP::Simple qw(get);
use URI::Escape qw(uri_escape);
# Map your personal country naming conventions
# to country codes listed at http://brainoff.com/geocoder/countryselect.php
# and change the default country if you wish
my %countrycode = ('USA' => 'US');
my $defaultcountry = 'US';
print <<RSSHEADER;
<?xml version="1.0"?>
<rss version="2.0" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#">
<channel>
<title>Yahoo! Address Book</title>
<link>http://address.yahoo.com/</link>
<description>My geocoded Yahoo! Address Book</description>
RSSHEADER
my (%hash, @vals, $arg, $loc, $lat, $lon, $success, $country);
# First line of Yahoo! CSV is keys
my @keys = @{ getrecord(*STDIN) };
while (! eof(STDIN)) {
    @vals = @{ getrecord(*STDIN) };
    @hash{ @keys } = @vals;
    $success = 0;
    undef($loc);
    $country = $countrycode{ $hash{'Home Country'} } || $defaultcountry;
    # Check for sufficient information to geocode
    if (length($hash{'Home City'}) == 0
        || ($country eq "US" && length($hash{'Home State'}) == 0)) {
        print STDERR "Couldn't geocode: \""
            . join ("\",\"", @vals) . "\"\n";
        next;
    }
    # Try geocoding US street address
    if ($country eq 'US'
        && length($hash{'Home Address'}) > 0) {
        $arg = $hash{'Home Address'} . "," . $hash{'Home City'}
            . "," . $hash{'Home State'};
        eval {
            # Be patient, geocoder.us free service is rate limited
            $loc = XMLin(
                get("http://geocoder.us/service/rest/?address="
                    . uri_escape($arg) )
            );
        };
        if (!$@ && defined($loc->{"geo:Point"}->{"geo:long"}) &&
            defined($loc->{"geo:Point"}->{"geo:lat"})) {
            $success = 1;
        }
    }
    # Try geocoding world city
    if ($country ne 'US' || ! $success) {
        if ($country ne "US") {
            $arg = $hash{'Home City'} . "," . $country;
        } else {
            $arg = $hash{'Home City'} . "," . $hash{'Home State'}
                . "," . $country;
        }
        eval {
            $loc = XMLin(
                get("http://brainoff.com/geocoder/rest?city="
                    . uri_escape($arg))
            );
        };
        if (!$@ && defined($loc->{"geo:Point"}->{"geo:long"}) &&
            defined($loc->{"geo:Point"}->{"geo:lat"})) {
            $success = 1;
        }
    }
    if ($success) {
        print <<ITEM;
<item>
<title>$hash{'First'} $hash{'Last'}</title>
<geo:lat>$loc->{"geo:Point"}->{"geo:lat"}</geo:lat>
<geo:long>$loc->{"geo:Point"}->{"geo:long"}</geo:long>
</item>
ITEM
    } else {
        print STDERR "Couldn't geocode: \""
            . join ("\",\"", @vals) . "\"\n";
    }
}
print "</channel></rss>\n";
#
# "getrecord" returns the next record, as a reference to an array,
# from an open filehandle. It is a simple state machine that expects
# a file formatted in 'Yahoo! CSV'
#
sub getrecord {
    my $fh = shift;
    my $c = "";
    # States: 0 = expecting an entry, 1 = inside a quoted entry,
    # 2 = just saw a quote while inside an entry
    my $st = 0;
    my @record;
    my $entry = "";
    while (defined($c)) {
        $c = getc($fh);
        if ($st == 0) {
            if (! defined($c) || $c eq "\n") {
                return \@record;
            } elsif ($c eq "\"") {
                $st = 1;
            } else {
                die "error: parsing state:$st char:$c\n";
            }
        } elsif ($st == 1) {
            # End of file inside a quoted entry
            last if ! defined($c);
            if ($c eq "\"") {
                $st = 2;
            } else {
                $entry .= $c;
            }
        } elsif ($st == 2) {
            if (! defined($c) || $c eq "\n") {
                # End of record, or end of file with no trailing newline
                push @record, $entry;
                return \@record;
            } elsif ($c eq "\"") {
                # A doubled quote is a literal quote character
                $entry .= "\"";
                $st = 1;
            } elsif ($c eq ",") {
                push @record, $entry;
                $entry = "";
                $st = 0;
            } else {
                die "error: parsing state:$st char:$c\n";
            }
        }
    }
    die "error: premature end of file\n";
}
The main body of the script builds a hash from the current record, attempts to geocode the address, and outputs an RSS item if it succeeds. For U.S. locations with a full street address, it uses the REST service from http://geocoder.us, which expects an address, city name, and state abbreviation and returns a small bit of XML containing a latitude/longitude pair if the lookup succeeds. The free service is rate limited, so you'll notice pauses during requests. For non-U.S. locations, and for U.S. addresses that geocoder.us fails to resolve, the script falls back to the REST interface of the geocoder at http://brainoff.com/geocoder, which expects a city, a state abbreviation for U.S. cities, and a country code.
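The order in which the two services are tried can be seen without touching the network. The following is a rough sketch, not part of yadr2geo.pl: the helper name candidate_urls is hypothetical, the endpoints and field names are copied from the script, and the `//` operator assumes Perl 5.10 or later.

```perl
use strict;
use warnings;
use URI::Escape qw(uri_escape);

# Hypothetical helper: return the request URLs to try, in order,
# for one address-book record (a hash reference).
sub candidate_urls {
    my ($rec, $country) = @_;
    my @urls;

    # Full U.S. street address: try geocoder.us first.
    if ($country eq 'US' && length($rec->{'Home Address'} // '')) {
        my $arg = join ',',
            @{$rec}{'Home Address', 'Home City', 'Home State'};
        push @urls, 'http://geocoder.us/service/rest/?address='
            . uri_escape($arg);
    }

    # City-level lookup: the only option outside the U.S.,
    # and the fallback inside it.
    my $arg = $country eq 'US'
        ? join(',', $rec->{'Home City'}, $rec->{'Home State'}, $country)
        : join(',', $rec->{'Home City'}, $country);
    push @urls, 'http://brainoff.com/geocoder/rest?city='
        . uri_escape($arg);

    return @urls;
}

my %rec = ('Home Address' => '1600 Pennsylvania Ave NW',
           'Home City'    => 'Washington',
           'Home State'   => 'DC');
print "$_\n" for candidate_urls(\%rec, 'US');
```

A caller would request each URL in turn and stop at the first response that contains both a latitude and a longitude.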
The country codes are particular to the GNS (http://earth-info.nga.mil/gns/html) database that backs this service. To look up the codes, go to http://brainoff.com/geocoder/countryselect.php and select a country; a JavaScript alert will give you the code. You will need to map the country names used in your address book to these codes, by adding entries to %countrycode in the script.
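Address books tend to spell the same country several ways, so a slightly more forgiving lookup may help. This is a minimal sketch with a hypothetical helper, country_code, that is not in the script. Only the US code is taken from the script; the extra spellings are illustrative, and any other codes should be added after looking them up as described above.

```perl
use strict;
use warnings;

# Keys are uppercased spellings as they might appear in an address
# book. Add further countries with codes you have looked up yourself.
my %countrycode = (
    'USA'           => 'US',
    'U.S.A.'        => 'US',
    'UNITED STATES' => 'US',
);
my $defaultcountry = 'US';

# Hypothetical helper: trim and uppercase a free-form country name,
# then map it to a code, falling back to the default.
sub country_code {
    my ($name) = @_;
    return $defaultcountry unless defined $name;
    $name =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
    return $countrycode{ uc $name } || $defaultcountry;
}

print country_code(' usa '), "\n";
print country_code(''), "\n";
```

As in the script, an unrecognized country name falls back to the default, so a missing mapping shows up as a contact placed in the wrong country rather than as an error.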
If you use a non-English language on Yahoo!, you might have different field names from the ones expected. The script uses Home Address, Home City, Home State, and Home Country. You might need to examine your CSV export and replace these field names in the code. Similarly, if you wanted to map work addresses, you'd replace Home with Work in each of these field names. Another modification to try is adding a <description> or <link> field to each item, set, for example, to the Personal Website field.
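One way to make the Home/Work switch a single-line change is to build the field names from a prefix. The sketch below uses a hypothetical helper, location_fields, and assumes the work fields in your export are named like the home ones (Work Address, Work City, and so on); check your own CSV header line before relying on that.

```perl
use strict;
use warnings;

# Hypothetical helper: pull the location fields for one address type.
# $prefix is 'Home' or 'Work', or whatever your CSV export uses.
sub location_fields {
    my ($rec, $prefix) = @_;
    my %loc;
    for my $part (qw(Address City State Country)) {
        my $value = $rec->{"$prefix $part"};
        # Treat a missing field as empty so later length() checks are safe
        $loc{ lc $part } = defined $value ? $value : '';
    }
    return \%loc;
}

my %rec = ('Work City'  => 'Portland',
           'Work State' => 'OR',
           'Home City'  => 'Salem');
my $work = location_fields(\%rec, 'Work');
print "$work->{city}, $work->{state}\n";
```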