O'Reilly Network    
 Published on O'Reilly Network (http://www.oreillynet.com/)

Yahoo! Web Services

by Paul Bausch, author of Amazon Hacks

I've been using Yahoo! for as long as I've been using the web, somewhere around 10 years. Over the years, Yahoo! has become a part of my daily life. If I want to find a local business, I type yp.yahoo.com before I reach for the yellow pages in my kitchen, and then I type maps.yahoo.com to find my way to that business. I browse news, check stock prices, and get movie times with Yahoo! Even though I interact with Yahoo! technology on a regular basis, I've never thought of Yahoo! as a technology company.


Now that Yahoo! has released a Web Services interface, my perception of them is changing. Having programmatic access to a good portion of their data has me seeing Yahoo! through the eyes of a developer rather than a user. Reading through the documentation on their new site for Yahoo! developers, I've realized that there is some impressive technology behind the services I took for granted.

Yahoo! isn't the first big web company to give outside developers access to their data via web services. Google and Amazon kicked things off in 2002 with free web APIs. These two became the canonical examples I heard at conferences, and I've seen quite a few demo applications that pull in current book prices or Google search results. eBay released a for-pay API in 2003, giving paying developers access to their auction database. And now Yahoo!'s impressive entry into the web services fray is another sign that big web companies have embraced outside developers in a big way.

A Quick Tour

Yahoo!'s Web Services use the familiar REST architecture, where specially constructed URLs return an XML response in a unique format. In addition to web search results, Yahoo!'s API includes the ability to fetch results for images, local information, news, and video.
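As a rough sketch of what a "specially constructed URL" looks like in practice, the snippet below assembles a request from a base URL and a list of name/value pairs. The helper names url_encode and build_request_url are hypothetical, "YourAppID" is a placeholder, and the endpoint and the appid, query, and results parameters are the ones used by the sample script later in this article.

```perl
use strict;
use warnings;

# Percent-encode everything except RFC 3986 "unreserved" characters.
sub url_encode {
	my ($text) = @_;
	$text =~ s/([^A-Za-z0-9\-_.~])/sprintf("%%%02X", ord($1))/ge;
	return $text;
}

# Join a base URL and a list of name/value pairs into a request URL.
# Parameters arrive as a list (not a hash) so their order is preserved.
sub build_request_url {
	my ($base_url, @params) = @_;
	my @pairs;
	while (my ($name, $value) = splice(@params, 0, 2)) {
		push @pairs, url_encode($name) . "=" . url_encode($value);
	}
	return $base_url . "?" . join("&", @pairs);
}

my $url = build_request_url(
	"http://api.search.yahoo.com/NewsSearchService/V1/newsSearch",
	appid   => "YourAppID",        # placeholder, not a real application ID
	query   => "academy awards",
	results => 5,
);
print "$url\n";
```

Encoding each value before it goes into the URL keeps multiword queries such as "academy awards" from breaking the request.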

Across their API, the XML format includes a top-level <ResultSet> node that holds a number of <Result> nodes. The data in each <Result> varies by the service, but it breaks down about how you'd expect it to. For example, local searches contain everything within a result that you'd need to build your own yellow pages: Business Title, Address, City, State, Phone, Distance (from query location), and various URLs. Similarly, the news searches have everything you'd need to build your own newspaper: Article Title, Summary, News Source, Publication Date, and Thumbnail. Not only is the data what you'd expect, but it's also tagged in a very intuitive way. The Business Title is held in a <Title> tag, along with <City> and <State> tags, which makes the response very readable. Figure 1 is a look at an XML response for a Yahoo! Local search for coffee in Corvallis, Oregon.

Figure 1: Yahoo! Local XML
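To give a feel for how little code that readable structure demands, here is a minimal sketch that walks a <ResultSet> with XML::Simple. The XML is a hand-written stand-in that uses only a few of the fields described above; the business names, addresses, and phone numbers are invented placeholders rather than real Yahoo! Local data, and summarize_results is a hypothetical helper name.

```perl
use strict;
use warnings;
use XML::Simple;

# Hypothetical response, trimmed to a few of the fields named in the text.
# Every value below is an invented placeholder.
my $sample_xml = <<'XML';
<ResultSet>
	<Result>
		<Title>Example Coffee House</Title>
		<Address>123 Placeholder St</Address>
		<City>Corvallis</City>
		<State>OR</State>
		<Phone>(555) 555-0100</Phone>
	</Result>
	<Result>
		<Title>Sample Espresso Bar</Title>
		<Address>456 Placeholder Ave</Address>
		<City>Corvallis</City>
		<State>OR</State>
		<Phone>(555) 555-0101</Phone>
	</Result>
</ResultSet>
XML

# Return one "Title, City, State" line per <Result> in the response.
sub summarize_results {
	my ($xml) = @_;
	# ForceArray keeps Result an array even when only one match comes back
	my $data = XMLin($xml, ForceArray => ['Result'], KeyAttr => []);
	return map { "$_->{Title}, $_->{City}, $_->{State}" }
		@{ $data->{Result} || [] };
}

print "$_\n" for summarize_results($sample_xml);
```

Because each field lives in its own plainly named tag, the parsed result is just a list of hashes keyed by those tag names.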

Yahoo!'s terms of use state that the API is for "noncommercial use or benefit," so I won't be building Paul's Yellow Pages with Yahoo! data and charging for use. But for noncommercial use, the only thing they ask in exchange for the data is a note somewhere in the application that it's "Powered by Yahoo!"

If the API is free, what's in it for Yahoo!? I asked Yahoo! Search Evangelist Jeremy Zawodny exactly this, and he said, "By exposing interesting pieces of Yahoo! to the larger developer community, we think they'll build applications that benefit both us and our users." Developers have already been gathering data from Yahoo!'s HTML via screen scraping. Jeremy acknowledged that data gathering "has been happening for years. But officially supporting it should really accelerate and legitimize the process." It's true; a quick search on CPAN for Yahoo! will yield hundreds of existing Perl modules that scrape every conceivable corner of Yahoo!'s data. By releasing Web Services, Yahoo! can actively promote and encourage the creative development of applications outside their walls.

A Sample Application

As I was looking through the documentation for the news search, I noticed that Yahoo! provides thumbnail images for some news stories. People have been syndicating news across sites for years with RSS, but I haven't seen many using news-related images. So I thought I'd throw together a quick example of how you might use Yahoo! news images on a remote site.

If you want to use this example, you'll need a couple of Perl modules that aren't part of the core distribution: LWP::Simple for fetching the Yahoo! API response and XML::Simple for parsing it. You'll also need a free application ID from Yahoo! that you can pick up at developer.yahoo.net. (It's a nice touch that you can create your own app ID instead of having an unmemorable string of random characters assigned to you.)

This script accepts a search term, queries the Yahoo! News API for stories related to that term, and shows any matches with thumbnails in HTML. If you're following along at home, create a file called yahoo_newsbar.pl and add the following code. Be sure to add your own Yahoo! application ID into the code, and you'll be set.

# yahoo_newsbar.pl
# Accepts a search term and shows thumbnails of related news items.
# Usage: yahoo_newsbar.pl?<Query>
# You can create an AppID, and read the full documentation
# for Yahoo! Web Services at http://developer.yahoo.net/

use strict;
use LWP::Simple;
use XML::Simple;

# Set your unique Yahoo! Application ID
my $appID = "[insert your app ID]";

# Set the number of story thumbnails for the newsbar
my $total_stories = 2;

# Grab the incoming search query
my $query = join(' ', @ARGV) or die "Usage: yahoo_newsbar.pl?<query>\n";

# Construct a Yahoo! News Query
my $base_url = "http://api.search.yahoo.com/NewsSearchService/V1/newsSearch";
my $type = "all"; #alternates: any, phrase.
my $results = 50;
my $language = "en"; 
my $results_sort = "rank"; #alternate: date.
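# URL-encode the query so spaces and punctuation survive the request
(my $escaped_query = $query) =~ s/([^A-Za-z0-9\-_.~])/sprintf("%%%02X", ord($1))/ge;
my $req_url = "$base_url?appid=$appID&query=$escaped_query&type=$type";
$req_url .= "&results=$results&language=$language&sort=$results_sort";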

# Make the request
my $yahoo_response = get($req_url)
	or die "Could not retrieve results from Yahoo! News\n";

# Parse the XML
my $xmlsimple = XML::Simple->new();
# ForceArray keeps Result a list even when only one story is returned
my $yahoo_xml = $xmlsimple->XMLin($yahoo_response, ForceArray => ['Result']);

# Set some variables
my $out;
my $thumb_count = 0;

# Add the column header
$out .= "<div id=\"stories\">\n";
$out .= "<h1>The latest news about \"$query\"</h1>\n";

# Loop through the items returned
foreach my $result (@{$yahoo_xml->{Result}}) {
	# Make sure result has a thumbnail
	if (ref($result->{Thumbnail}) eq "HASH") {

		#If thumb exists, count the story, and print it out!
		my $thumb_url = $result->{Thumbnail}->{Url};
		my $thumb_height = $result->{Thumbnail}->{Height};
		my $thumb_width = $result->{Thumbnail}->{Width};
		my $title = $result->{Title};		
		my $click_url = $result->{ClickUrl};
		$click_url =~ s/&/&amp;/g;
		my $news_source = $result->{NewsSource};
		$out .= "<div class=\"story\">";
		$out .= "<a href=\"$click_url\">";
		$out .= "<img src=\"$thumb_url\" width=\"$thumb_width\"";
		$out .= " height=\"$thumb_height\" border=\"0\" alt=\"\" />";
		$out .= "</a>";
		$out .= "<div class=\"headline\">";
		$out .= "<a href=\"$click_url\">$title</a>";
		$out .= "</div>";
		$out .= "<div class=\"byline\">$news_source</div>";
		$out .= "</div>\n";
		$thumb_count++;
	}
	# Stop at set number of thumbnails
	last if ($thumb_count == $total_stories);
}

# Add the column footer with props to Yahoo! News
if ($thumb_count == 0) {
	# If no stories with thumbnails found, write a sad message
	$out .= "<div class=\"story\">";
	$out .= "<div class=\"headline\">No stories found today.</div>";
	$out .= "</div>";
} else {
	# Otherwise, thanks Yahoo!
	$out .= "<div class=\"credit\">Powered by Yahoo! News</div>\n";
}
# Close the "stories" container opened in the column header
$out .= "</div>";

# Write the results to a web page
print "Content-Type: text/html\n\n";
print "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\"";
print " \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">\n";
print "<html>\n";
print "<head>\n";
print "\t<title>Yahoo News Headlines</title>\n";
print "\t<meta http-equiv=\"Content-Type\" content=\"text/html;";
print " charset=utf-8\" />\n";
print "\t<link rel=\"stylesheet\" type=\"text/css\"";
print " href=\"yahoo_newsbar.css\" />\n";
print "</head>\n<body>\n";
print $out;
print "\n</body></html>";
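One thing the script leaves alone: it escapes ampersands in $click_url, but $title, $news_source, and $query go into the page as-is, so a headline containing an ampersand or angle bracket could produce invalid XHTML. A minimal sketch of a helper that could be wrapped around those values follows; escape_html is a hypothetical name and is not part of the script above.

```perl
use strict;
use warnings;

# Replace the characters that are special in HTML/XHTML with entities.
# The ampersand is handled first so later entities aren't double-escaped.
sub escape_html {
	my ($text) = @_;
	return "" unless defined $text;
	$text =~ s/&/&amp;/g;
	$text =~ s/</&lt;/g;
	$text =~ s/>/&gt;/g;
	$text =~ s/"/&quot;/g;
	return $text;
}

# Example with a made-up headline
print escape_html('Q&A: "Best Picture" <live>'), "\n";
```

In the script, that would mean writing escape_html($title) wherever $title is interpolated into $out.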

You can run this script from a web browser by passing the search term as the query string of the script's URL (yahoo_newsbar.pl?Oscars) to get news about The Oscars.


Figure 2 shows the results for The Oscars at the time of this writing.

Figure 2: Yahoo! Newsbar for "Oscars"

Yahoo! does place a limit on the number of queries you can perform per day--around 5,000--so if you're including this as part of a website, you'd probably want to cache the results locally and run the script on a schedule. You could make this happen by tweaking the print output at the bottom of the script (for example, printing only the newsbar markup in $out) and running the script with cron or the Windows Scheduler at regular intervals. It would look something like this:

perl yahoo_newsbar.pl Oscars > newsbar_include.html
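If a scheduler isn't available, another option is to have a wrapper check the age of the cached file before calling Yahoo! again. The sketch below shows only the freshness test; cache_is_fresh is a hypothetical helper, the 30-minute window is an arbitrary choice, and the file name is the one from the command above.

```perl
use strict;
use warnings;

# True if $file exists and was modified less than $max_age seconds ago.
# $now is optional and defaults to the current time.
sub cache_is_fresh {
	my ($file, $max_age, $now) = @_;
	$now = time unless defined $now;
	return 0 unless -e $file;
	my $mtime = (stat($file))[9];
	return ($now - $mtime) < $max_age ? 1 : 0;
}

my $cache_file = "newsbar_include.html";   # file name from the command above
my $max_age    = 30 * 60;                  # 30 minutes, an arbitrary choice

if (cache_is_fresh($cache_file, $max_age)) {
	print "Cache is fresh; skipping the Yahoo! request.\n";
} else {
	print "Cache is stale or missing; time to run yahoo_newsbar.pl again.\n";
}
```

Refreshing every 30 minutes works out to 48 requests a day per search term, well under the daily limit.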

If I ran a site about The Academy Awards, this could be an easy way to add a bit of visually interesting news.

Yahoo! for Developers

Along with their API, Yahoo! is opening several official channels of communication with developers. They've started mailing lists for each API section: News, Images, Video, etc. They've also started a Web Services weblog for announcements and a wiki for public collaboration and tracking new applications. Yahoo! is finally appealing to my inner geek, and I'm looking forward to seeing what people do with their API. When I asked Mr. Zawodny which potential applications he was most looking forward to, he said, "The ones we haven't anticipated."

Paul Bausch is a co-creator of the weblog software Blogger, maintains a directory of Oregon-based weblogs at ORblogs.com, and is the author of the forthcoming Yahoo! Hacks.


Copyright © 2009 O'Reilly Media, Inc.