O'Reilly Hacks
Yahoo! Hacks
By Paul Bausch
October 2005

HACK #87
Add Links to a Block of Text Automatically
Auto-Linker uses the Yahoo! API to add relevant links to keywords in any text

The Code

Auto-Linker is written in PHP5, so save the following code to a file called autolinker.php5:

	<html>
	<body>
	<?php
	$text = ( isset($_POST['text']) ) ? $_POST['text'] : '';
	$rel = ( isset($_POST['rel']) ) ? $_POST['rel'] : '';
	$engine = ( isset($_POST['engine']) ) ? $_POST['engine'] : '';
	$text = stripslashes($text);
	
	$maxLength = 2000;
	if ( strlen($text) >= $maxLength ) {
		$text = substr($text, 0, $maxLength - 1) . '…';
	}

	echo '<h1>Auto-Linker</h1>';

	if ($text == '')
	{
	?>
	<p>This tool uses the Yahoo API to link significant words and phrases from a
	text you provide.</p>

	<form action="autolinker.php5" method="post"><div>
	<textarea style="font-size: 90%" name="text"
				cols="58" rows="8"></textarea><br />

	Relation:
	<select name="rel">
		   <option value="">[Default]</option>
		   <option value="nofollow">Nofollow</option>
	</select>

	&nbsp; Search Engine:
	<select name="engine">
		   <option value="yahoo">Yahoo</option>
 		   <option value="google">Google</option>
		   </select>

	<input type="submit" value="Submit" />
	</div></form>
	<?php
	}

If the text parameter has not been submitted, this code presents a form with a <textarea> to be filled out. The user can also choose the link relation and whether the links come from a Yahoo! or a Google web search.

Once the text is submitted to the script, the actual auto-linking takes place. Here is the else clause that triggers auto-linking:

	else
	{
		$sLinked = autoLink($text, $rel, $engine);
		echo '<p style="font-size: 105%;">' .
				$sLinked . '</p>'; 
		showCopyable($sLinked); 
		echo '<p><a href="autolinker.php5">[Auto-Linker Home]</a></p>';
	}
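
The showCopyable function called in this clause is not listed in the hack. Here is a minimal sketch, assuming it only has to escape the linked markup and print it inside a read-only <textarea>. The escaping flags and the textarea attributes are guesses, not the author's code:

```php
<?php
// Hypothetical sketch of showCopyable(); the original hack does not list it.
function showCopyable($html)
{
	// Escape the markup so the browser displays the HTML source
	// instead of rendering the links.
	$escaped = htmlspecialchars($html, ENT_QUOTES, 'UTF-8');

	echo '<textarea cols="58" rows="8" readonly="readonly">' .
			$escaped . '</textarea>';
}
```

Escaping matters here because unescaped markup placed between the textarea tags could close the element early.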

The showCopyable function just inserts a <textarea> where the user can copy the HTML source of the auto-linked result. The auto-linking core is in the autoLink function:

	function autoLink($s, $rel, $engine)
	{
		$s = strip_tags($s);
		$sRel = ($rel != '') ? ' rel="' . $rel . '"' : '';

		// Term Extraction request; replace "insert App ID" with your own ID.
		$url = 'http://api.search.yahoo.com/ContentAnalysisService/' . 
				'V1/termExtraction?appid=insert App ID&' . 
				'context=' . urlencode($s); //*See footnote

		$dom = new domdocument;
		$dom->load($url);
		$xpath = new domxpath($dom);
		$xNodes = $xpath->query('//Result');

		$counter = 0;
		$maxLinks = 10;
		foreach ($xNodes as $xNode)
		{

			if (++$counter > $maxLinks) { break; }
			$phrase = $xNode->firstChild->data;

			$phraseUrl = '';
			
			if ($engine == 'google') {
				$phraseUrl = getTopLinkGoogle($phrase);
			}
			else {
				$phraseUrl = getTopLinkYahoo($phrase);
			}

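			if ($phraseUrl != '') 
			{
				// Link up to four occurrences that follow a space,
				// ignoring case; $1 is the phrase as it appears in the text.
				$s = preg_replace(
				'@ (' . preg_quote($phrase, '@') . ')@i', 
				' <a href="' . $phraseUrl . '">$1</a>', 
				$s, 4);
			}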
		}
		$s = str_replace("\r\n", '<br />', $s);

		return $s; 
	}

The autoLink function takes the parameters s (the whole text), rel (the link relation, either default or nofollow), and engine (the search engine, either yahoo or google). It strips any HTML tags from the text, then requests the list of significant phrases from the Yahoo! API. Yahoo! recommends using a POST request for longer text, but a GET request, as used here, also works. The XML that Yahoo! returns looks like this, with all values in lowercase:

	<?xml version="1.0" encoding="UTF-8"?>
	<ResultSet…>
		<Result>superman</Result>
		<Result>clark kent</Result>
		<Result>super powers</Result>
	</ResultSet>

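The script applies an XPath expression to this XML to iterate through all Result elements and read their values, stopping after ten phrases. For each phrase that has a link, the preg_replace function searches the text for the phrase and wraps it in a link. The pattern requires a space before the phrase so that the match begins at the start of a word, the match is case-insensitive, and at most four occurrences of each phrase are replaced.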
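
To try the XPath and replacement steps without calling any web service, here is a small self-contained sketch. It assumes the sample response shown earlier with the elided ResultSet attributes left out, and it uses a hypothetical $links array of example.com URLs in place of the search lookups. The linkPhrases name is also made up for this illustration, and the sketch uses a plain $1 backreference in the replacement string:

```php
<?php
// Sample response from the listing above; the elided ResultSet
// attributes are simply left out here (an assumption).
$xml = '<ResultSet><Result>superman</Result>' .
	'<Result>clark kent</Result><Result>super powers</Result></ResultSet>';

// Hypothetical phrase-to-URL map standing in for the search lookups.
$links = array(
	'superman' => 'http://example.com/superman',
	'clark kent' => 'http://example.com/clark-kent',
);

function linkPhrases($text, $xml, $links)
{
	$dom = new DOMDocument();
	$dom->loadXML($xml);
	$xpath = new DOMXPath($dom);

	foreach ($xpath->query('//Result') as $node) {
		$phrase = $node->firstChild->data;
		// Skip phrases for which no link is known.
		if (!isset($links[$phrase])) { continue; }

		// A space must precede the phrase; "i" ignores case;
		// at most four occurrences are replaced.
		$pattern = '@ (' . preg_quote($phrase, '@') . ')@i';
		$text = preg_replace($pattern,
			' <a href="' . $links[$phrase] . '">$1</a>', $text, 4);
	}

	return $text;
}

echo linkPhrases('Is Clark Kent really Superman?', $xml, $links), "\n";
```

Because the pattern needs a leading space, a phrase at the very start of the text is not linked. If the elided ResultSet attributes declare a default XML namespace, the //Result query would match nothing until a prefix is registered with DOMXPath::registerNamespace and used in the expression.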

The link will be taken from either Yahoo! or Google, using these two functions:

	// We grab results from Yahoo's "REST" API again
	// using PHP5's nice native XML functionality.

	function getTopLinkYahoo($q)
	{
	
		$url = 'http://api.search.yahoo.com/WebSearchService/' . 
				'V1/webSearch?appid=insert app ID&max=1&q=' . 
				urlencode($q); //*See footnote
		$dom = new domdocument;
		$dom->load($url);
		$xpath = new domxpath($dom);
		$topUrl = $xpath->query('//Url')->item(0)->firstChild->data;

		return $topUrl;
	}

	// A tiny screen-scraping function avoids the overhead
	// of Google's SOAP API; this code will need
	// adjustments whenever Google drastically changes their
	// result-page HTML.

	function getTopLinkGoogle($q)
	{
		
		error_reporting(E_ERROR | E_PARSE);
		$dom = new domdocument;
		$dom->loadHTMLFile('http://www.google.com/search?' .
				'hl=en&q=' . urlencode($q) . '&num=1');
		$xpath = new domxpath($dom);
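		$node = $xpath->query(
				"//p[@class='g']/a[@href]")->item(0); 
		// No result link found: return '' so autoLink skips this phrase.
		if ($node === null) { return ''; }
		$s = $node->getAttribute('href'); 
		if ( ! ( strpos($s, 'spell=1') === false ) && 
				! ( strpos($s, '/search?') === false ) )
		{ 
			// The first link points back to a Google search with
			// spell=1 rather than to a result, so take the next link.
			$node = $xpath->query("//p/a[@href]")->item(1); 
			$s = ($node === null) ? '' : $node->getAttribute('href'); 
		}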

		return $s;
	}
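
The autoLink function sends its term extraction request with GET, although Yahoo! recommends POST for longer text. Here is one way the POST variant could be set up with PHP's HTTP stream context, offered as a sketch rather than as part of the original hack. The buildPostContext helper is hypothetical, and it assumes the service accepts the same appid and context parameters in a form-encoded body:

```php
<?php
// Hypothetical helper: builds a stream context that sends the term
// extraction parameters in a POST body instead of the query string.
function buildPostContext($appId, $text)
{
	$body = http_build_query(
		array('appid' => $appId, 'context' => $text), '', '&');

	return stream_context_create(array('http' => array(
		'method' => 'POST',
		'header' => "Content-Type: application/x-www-form-urlencoded\r\n",
		'content' => $body,
	)));
}

$context = buildPostContext('insert App ID', 'Superman is Clark Kent.');

// Show the encoded body that would be sent.
$options = stream_context_get_options($context);
echo $options['http']['content'], "\n";
```

The context would then be passed as the third argument to file_get_contents, together with the service URL without its query string, and the returned XML loaded with $dom->loadXML() instead of $dom->load().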

