Eight Search Engine "C" Changes
by Tara Calishain, author of Google Hacks
Hello. My name is Tara Calishain and I'm a search engine addict.
Not just mildly interested. Not just pretty good at finding things. I mean obsessed. I mean hours laboriously deconstructing Google result URLs. I mean all kinds of wacky experiments (with accompanying maniacal laughter) whenever AltaVista changes its syntaxes.
Ever since I wrote the first edition of the Official Netscape Guide to Internet Research in 1996, I've been fascinated with how search engines work. What makes them go, and what makes them go faster? Over time, that fascination has expanded to specialty search engines, databases, and other online data collections. Since 1998, I've tried to cover the world of online search engines with a weekly newsletter called ResearchBuzz, but alas, I can only scratch the surface.
Now is a wonderful time to be a search engine addict. More and more state and country sites are putting databases of material online--everything from professional license information to restaurant health-inspection scores. Extensive genealogy collections are appearing from both professional organizations and amateur enthusiasts. And Google, arguably the Internet's most popular search engine, continues to push the envelope with a variety of new offerings.
Busy? Absolutely. Exciting? You bet. But uncertain as well. Four or five years ago, searching was fairly straightforward. The Internet was useful, not yet ubiquitous. Search engines were still evolving and experimenting. There was a great surge of growth. Now the growth is not so extensive; surfers as a whole have developed a sense of what they expect when they visit a search engine. Instead, search engine companies have several problems to face in order to develop an infrastructure that's both sustainable (from a financial standpoint) and acceptable (from a user's standpoint).
There are lines from The Tempest that always remind me of the Internet:
Nothing of him that doth fade
But doth suffer a sea-change
Into something rich and strange.
Financial, cultural, and technological forces are all combining to push against search engines and turn them into something rich and strange indeed. But since computers and water don't mix, why don't we call them "C changes"? Eight of them have been occupying my thoughts a lot lately.
As I noted, more and more collections are going online. But while many of them have their own site-search engines, they're sometimes hard to find via traditional full-text search engines like Google. It may be because the database consists of dynamic content that is difficult to index, because it's password-protected, or for some other reason. Search engines are going to have to determine how to get to this content, and how to present it to the searcher.
Once upon a time, search engines were either technology showcases (that's occasionally still the case) or they were supported with banner ads. Banner ads don't do the trick anymore, however; click-through rates have dropped through the floor and instead, ad-blockers are doing a booming business. Search engines have to come up with a way to generate revenue, and pronto.
How are they going to do it? Will it be via pay-for-inclusion (PFI) programs, or sponsored result listings that appear at the top of search results? Will it be the pay-per-click (PPC) that's been so good to Overture, or will it be the AdWords that Google's made available for even the smallest Internet advertiser? Or perhaps search engines will consider offering subscription services that deal directly with the searcher. (Yahoo does offer several "premium" services, but for the most part, the services don't deal with searching per se.)
Despite the ten years that have passed since the Web first got rolling, search engines are still indexing only a fraction of the available pages. That's not to say search engines haven't advanced greatly; they have. But the Web is growing faster than search engines can keep up. Will the growth curve ever flatten? Even if search engines could index the entire Internet, would that be useful or appropriate? Does there need to be a better way to separate out the signal from the noise before search engines try to encompass more of the Web?
Different countries have different levels of access. Germany has already demanded that Google remove certain items from its index (you can see an article about this at PCWorld.com), and it's been noted that China has blocked huge lists of sites from its users (an article on that is available from the San Diego Union Tribune).
Is it too much to hope that one search engine could be developed that would encompass all levels of information access, or are search engines going to have to maintain several different versions of their index to please everyone?
In the early days of search engines, there was little conflict inherent in a web search. A search engine indexed materials from the Web, and searchers (hopefully) found what they were looking for. But now that search engine rankings can mean customers, and customers mean revenue, webmasters are much more aggressive about assuring that they've got a good ranking. Most of the time, webmasters go about the business of getting a good ranking honestly, but sometimes they don't.
As I see it, there's a three-way conflict among the parties to a search today: there's the surfer, who just wants to find useful results; the webmasters, who want to rank as high in the results as possible (including the option of paying for inclusion or paying for sponsored results); and the search engine companies, who do not want to compromise the integrity of their index, but at the same time want to generate revenue. I'll address this a little later.
Let's face it, even with all of the advances in search engines, indexing is messy. Search engines don't have a way of understanding any more than the basic parts of a web page--that is the title, that is the URL, and that is the body. A search engine can't figure out the date on a news article, or a headline, or a quote.
Considering how much progress search engines have made in other areas, I'm amazed that this low level of content understanding is still the case. What is it going to take to get search engines to understand more of the content on an indexed page? A special set of meta tags? Widespread use of XML? Something else?
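To make the limitation concrete, here is a minimal sketch of the kind of indexing described above, written with Python's standard `html.parser` module. The class name `BasicPageIndexer`, the function `index_page`, and the sample page are all hypothetical and invented for illustration; this is not how any particular search engine works. It only shows that a simple indexer ends up with a title, a URL, and an undifferentiated body.

```python
from html.parser import HTMLParser


class BasicPageIndexer(HTMLParser):
    """Hypothetical indexer that separates the title from everything else."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
        self.body_words = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        # Text inside <title> is the title; all other text is just "body".
        if self.in_title:
            self.title += data
        else:
            self.body_words.extend(data.split())


def index_page(url, html):
    """Return the three basic parts of a page: URL, title, and body text."""
    parser = BasicPageIndexer()
    parser.feed(html)
    parser.close()
    return {
        "url": url,
        "title": parser.title.strip(),
        "body": " ".join(parser.body_words),
    }


# Made-up sample page containing a headline, a date, and a quote.
sample = (
    "<html><head><title>Local News</title></head><body>"
    "<h1>Council Votes</h1><p>January 5, 2003</p>"
    "<p>\"We agree,\" said the mayor.</p></body></html>"
)
page = index_page("http://example.com/news", sample)
print(page["title"])
print(page["body"])
```

In the result, the headline, the date, and the quote all land in one flat `body` string. Nothing in the record says which words are which, and that is the gap the questions above are asking about.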
Cloak and Dagger (Mostly Cloak)
An entire industry has sprung up around getting the best possible listing in search engine results. Some of the methods employed are perfectly acceptable to the search engines, while some of them are not. So search engine energy, which could be used to make ever-cooler and more beautiful search syntaxes with which I can experiment, has to go towards foiling the bad guys who want to usurp a search position to which they aren't entitled.
OK, I fudged a little on this last one. I'm referring, of course, to special syntaxes, the various ways that search engines let you narrow down your searches. Google, for example, will let you narrow your search by title, URL, or domain. However, while special syntaxes are useful to a searcher, they don't ensure the popularity of a search engine site. AltaVista, for example, has arguably more extensive special syntaxes than Google, but it doesn't mean that they're exactly raking in the visitors.
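As an illustration of the title, URL, and domain narrowing mentioned above, here is a small sketch that assembles a query string using Google's `intitle:`, `inurl:`, and `site:` operators. The helper names `build_query` and `search_url`, the search URL pattern, and the `example.com` domain are assumptions made for this example; other engines use different syntaxes.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_query(terms, title=None, url=None, site=None):
    """Combine plain search terms with optional special-syntax restrictions."""
    parts = [terms]
    if title:
        parts.append(f"intitle:{title}")  # word must appear in the page title
    if url:
        parts.append(f"inurl:{url}")      # word must appear in the page URL
    if site:
        parts.append(f"site:{site}")      # restrict results to one domain
    return " ".join(parts)


def search_url(query):
    """Encode the query into a search URL (URL pattern assumed for this sketch)."""
    return "https://www.google.com/search?" + urlencode({"q": query})


query = build_query("search engines", title="syntax", site="example.com")
print(query)
print(search_url(query))
```

Each restriction is just another token in the query, so syntaxes can be combined freely. The hard part for the average searcher is knowing that they exist at all.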
Even though syntaxes quickly narrow down search results, it seems that sometimes the average searcher doesn't know enough about them. What can search engines do to publicize special syntaxes and make them easier to use? Which hitherto-unknown syntaxes would make it even easier for users to find what they're looking for? Why are date-based searches so different between different search engines?
As I ponder these C changes, I keep in mind one thing: far from the simple interchange between site and searcher that it once was, search engines have become a hub of activity, with the addition of the interests of webmasters. Some hope to attract customers. Some hope to get their materials widely circulated. Some even want the materials of other web sites removed (due to copyright infringement, or other concerns).
The addition of this new group into the functioning concern of search engines will change their tenor considerably and generate tension. Search engines will have to consider ways to balance generating revenue (through pay-for-inclusion and other programs for webmasters) with keeping search results as relevant and useful for searchers as possible.
I predict a change in our expectations of search engines, which will slacken that tension: we may become used to the idea of search engines offering pay-for-inclusion, and have no objection to clearly labeled, sponsored results appearing at the top of a page (the latter appears to have already happened). Banner ads and other graphics-based advertising will eventually slough off and disappear as one way to keep the good will of searchers (this has already happened at AskJeeves.com, which, with great fanfare, has stopped using banner advertising and pop-up windows).
Technical advances in search engines will still take place, but with an eye towards serving both searchers and web wranglers. Note that Google's latest offering--shopping search engine Froogle--is a great way for surfers to shop, but it's also built with a back end that makes it easy for merchants to submit data feeds. And with Google's Catalogs site, visitors can browse through a thousand mailboxes' worth of slick pages. Here, Google's got a perfect opportunity to sell catalog companies on a way to connect with interested consumers, or to provide reports that detail what the most popular searches are for services (and what the most popular catalogs are).
As someone who spends a lot more time on the searcher aspect of search engines than the webmaster aspect, I'm naturally a little concerned about these changes. But at the same time, I understand that they can't be helped. Search engines must make revenue. If they don't make money, eventually they don't exist, and then I'd spend all my time reading GoneGold.com and clicking The Really Big Button That Doesn't Do Anything. If search engines can break ground with generating revenue from webmasters, maybe eventually they'll expand to offering paid services to searchers (this topic, "Search Engine Services I Want to Pay For," is a whole other article).
In short, there's a lot happening, and it doesn't look to be slowing down any time soon. In this series of articles, I'm going to take a look at the developments, the possibilities, and the things that drive me absolutely crazy about search engines and online data collections. I'm going to rant and rave, but I hope to teach as well. Thanks.
Tara Calishain is the creator of the site, ResearchBuzz. She is an expert on Internet search engines and how they can be used effectively in business situations.
O'Reilly & Associates recently released (February 2003) Google Hacks.
Google Hack Collection, with sample hacks, is available free online.
Copyright © 2009 O'Reilly Media, Inc.