O'Reilly Hacks
Yahoo! Hacks
By Paul Bausch
October 2005

HACK
#32
Track the Media's Attention Span over Time
Visualize media trends by counting the total number of Yahoo! News mentions of a specific phrase over a series of dates

The Code

The Yahoo! Search Web Services don't let you limit a search by date, so this code relies instead on the fact that Yahoo! News search pages on the web site have stable, predictable URLs for date-specific searches. Sticking with our example, here are the relevant pieces of a URL for Apple articles from March 1:

	http://news.search.yahoo.com/news/search?va=apple&smonth=3&sday=1&emonth=3&eday=1

As you can see, the va variable holds the query, smonth and sday the start date, and emonth and eday the end date. Knowing this pattern, you can construct a query for any time period you'd like.
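
The pattern can be wrapped in a small helper. What follows is a minimal sketch rather than part of the hack itself: the function name build_news_url is hypothetical, and it assumes a single-day search (start and end date are the same) and a plain ASCII query.

```perl
use strict;
use warnings;

# Hypothetical helper: build a single-day Yahoo! News search URL
# from the va/smonth/sday/emonth/eday pattern described above.
sub build_news_url {
    my ($query, $month, $day) = @_;

    # Percent-encode everything except unreserved URL characters so
    # that multiword queries survive the trip (assumes ASCII input).
    (my $escaped = $query) =~ s/([^A-Za-z0-9\-_.~])/sprintf("%%%02X", ord($1))/ge;

    return "http://news.search.yahoo.com/news/search?"
         . "va=$escaped"
         . "&smonth=$month&sday=$day"
         . "&emonth=$month&eday=$day";
}

# Rebuilds the March 1 example URL for the query "apple"
print build_news_url("apple", 3, 1), "\n";
```

To cover a range of days instead, you could pass separate start and end values for the emonth and eday pieces.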

You'll need a few modules for this hack: LWP::Simple to fetch the Yahoo! News page, Date::Manip to work with dates, and CGI to read the name=value arguments from the command line. Add the following code to a file named track_news.pl:

	#!/usr/bin/perl
	# track_news.pl
	# Builds a Yahoo! News URL for every day
	# between the specified start and end dates, returning
	# the date and estimated total results as a CSV list.
	# usage: track_news.pl query="{query}" start={date} end={date}
	# where dates are of the format: yyyy-mm-dd, e.g. 2005-03-30

	use strict;
	use Date::Manip; 
	use LWP::Simple qw(!head); 
	use CGI qw/:standard/;

	# Set your unique Yahoo! Application ID
	# (not used below, since this script reads the News web pages directly)
	my $appID = "insert your app ID";

	# Get the query 
	my $query = param('query');

	# Set the News category to search tech articles 
	# Alternates: top, world, politics, entertainment, business 
	# more at: http://news.search.yahoo.com/news/advanced 
	my $category = "technology";

	# Regular Expression to check date validity 
	my $date_regex = '(\d{4})-(\d{1,2})-(\d{1,2})';
	
	# Make sure all arguments are passed correctly
	( param('query') and param('start') =~ /^(?:$date_regex)?$/
		and param('end') =~ /^(?:$date_regex)?$/ ) or 
		die qq{usage: track_news.pl query="{query}" start={date} end={date}\n};

	# Set timezone, parse incoming dates
	Date_Init("TZ=PST");
	my $start_date = ParseDate(param('start'));
	my $end_date = ParseDate(param('end'));

	# Print the CSV column titles
	print qq{"date","count"\n};

	# Loop through the dates
	while (Date_Cmp($start_date, $end_date) <= 0) {
		my $month = int UnixDate($start_date, "%m"); 
		my $day = int UnixDate($start_date, "%d");
		my $date_f = UnixDate($start_date,"%Y-%m-%d");
		my $total;

		# Construct a Yahoo! News URL
		my $news_url = "http://news.search.yahoo.com/news/search?";
			$news_url .= "ei=UTF-8";
			$news_url .= "&va=" . CGI::escape($query);
			$news_url .= "&cat=$category";
			$news_url .= "&catfilt=1";
			$news_url .= "&pub=1";
			$news_url .= "&smonth=$month";
			$news_url .= "&sday=$day";
			$news_url .= "&emonth=$month";
			$news_url .= "&eday=$day";
		# Make the request
		my $news_response = get($news_url);

		# Find the number of results (get() returns undef on failure)
		if (defined $news_response
			and $news_response =~ m!of about <strong>(.*?)</strong>!i) {
			$total = $1;
		} else {
			$total = 0;
		}

		# Print out results
		print
			'"',
			$date_f,
			qq{","$total"\n};
		# Add a day, and continue the loop
		$start_date = DateCalc($start_date, " + 1 day");
	}
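
The result-matching step can be tested on its own, without making any requests. The following is a sketch under stated assumptions: extract_total is a hypothetical helper, the sample markup is made up for illustration, and the comma stripping assumes the page formats large counts with thousands separators, which the script above does not handle.

```perl
use strict;
use warnings;

# Hypothetical helper: pull the estimated result count out of a
# fetched page. Returns 0 if the page is missing or has no match.
sub extract_total {
    my ($html) = @_;
    return 0 unless defined $html;

    if ($html =~ m!of about <strong>(.*?)</strong>!i) {
        (my $total = $1) =~ tr/,//d;    # "1,234" becomes "1234"
        return $total =~ /^\d+$/ ? $total + 0 : 0;
    }
    return 0;
}

# Made-up sample markup, for illustration only
my $sample = 'Results 1 - 10 of about <strong>1,234</strong> for apple';
print extract_total($sample), "\n";
```

Returning a plain number keeps the CSV column easy to chart in a spreadsheet.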
© 2007 O'Reilly Media, Inc.