Types of Web Analytics Software: Chapter 10.2 - Website Optimization

by Andrew B. King

What gets measured gets managed.

—Peter Drucker

Without quantifiable metrics, website optimization (WSO) is a guessing game. But with hundreds of billions of e-commerce dollars at stake, most companies cannot afford to guess.

This excerpt is from Chapter 10, “Website Optimization Metrics,” of Website Optimization. Is your site easy to find, simple to navigate, and enticing enough to convert prospects into buyers? Website Optimization shows you how. It reveals a comprehensive set of techniques to improve your site's performance by boosting search engine visibility for more traffic, increasing conversion rates to maximize leads and profits, revving up site speed to retain users, and measuring your site's effectiveness (before and after these changes) with best-practice metrics and tools.

Types of Web Analytics Software

There are two common types of analytics technologies to be aware of: web server log analysis and JavaScript page tagging. Individually, these methods each have their pros and cons. Taken together, they provide a holistic view of what is going on with your website from both server-side and client-side perspectives. A brief overview of each method, as well as a hybrid of the two, follows.

You'll learn how you can use these methods to track the success metrics outlined earlier with the recommended tools. You'll also read about two more advanced analytics tools, namely Google Website Optimizer and the user experience tool WebEffective from Keynote Systems.

Web Server Log Analysis

Web servers record every single HTTP transaction in text files known as logs. This includes every image, Cascading Style Sheet (CSS), JavaScript, HTML page, and any other file served to your visitors.

Because this data is already available on the web server, there is no need to modify your pages to start receiving data, so page performance is unaffected. You need only install a log analysis tool, configure it (consolidate browser IDs, eliminate internal traffic, exclude bots, etc.), and point it to the logs. However, installation is not as simple as with JavaScript page tagging, and is typically performed by a system administrator.
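As a rough sketch of what that configuration step involves, the following JavaScript parses lines in the widely used Apache "combined" log format and drops bot and internal traffic. The regular expression, the bot pattern, the 10.x internal address prefix, the function names, and the sample lines are all illustrative assumptions, not taken from any particular log analysis tool.

```javascript
// Matches one line of the Apache "combined" log format (assumed here).
const COMBINED =
  /^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3}) (\d+|-) "([^"]*)" "([^"]*)"$/;

// Hypothetical filters: a crude bot pattern and an internal address prefix.
const BOT_PATTERN = /bot|crawler|spider|slurp/i;
const INTERNAL_PREFIX = "10.";

// Turns one log line into an object, or returns null if it does not match.
function parseLine(line) {
  const m = COMBINED.exec(line);
  if (!m) return null;
  return {
    ip: m[1],
    time: m[2],
    method: m[3],
    path: m[4],
    status: Number(m[5]),
    bytes: m[6] === "-" ? 0 : Number(m[6]),
    referrer: m[7],
    userAgent: m[8],
  };
}

// True when the request is neither from a known bot nor from the internal network.
function isHumanExternal(entry) {
  return !BOT_PATTERN.test(entry.userAgent) && !entry.ip.startsWith(INTERNAL_PREFIX);
}

// Made-up sample lines using documentation-reserved IP addresses.
const sample = [
  '203.0.113.7 - - [02/Dec/2007:23:04:46 +0000] "GET /index.html HTTP/1.1" 200 5120 "-" "Mozilla/4.0 (compatible; MSIE 6.0)"',
  '198.51.100.9 - - [02/Dec/2007:23:04:47 +0000] "GET /robots.txt HTTP/1.1" 200 68 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
  '10.0.0.5 - - [02/Dec/2007:23:04:48 +0000] "GET /admin/ HTTP/1.1" 200 900 "-" "Mozilla/5.0 (Windows; U) Firefox/2.0"',
];

const visits = sample
  .map(parseLine)
  .filter((entry) => entry !== null && isHumanExternal(entry));
console.log(visits.map((entry) => entry.path));
```

Real log analyzers ship with much longer robot lists and handle many more format variations; the point here is only that the filtering happens after the fact, on data the server has already written.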

Webalizer, AWStats, and Analog are three commonly supplied logfile analysis tools, and all of them are free. Because server logs are usually in a standard format, these tools work across platforms and web servers. For more details on these packages, see their project websites.

AWStats, for example, breaks out humans from search robots in its summary traffic report (see Figure 10.3, “AWStats breaking out viewed and not viewed traffic”). The behavior of web robots, spiders, and crawlers is something that JavaScript-based analytics tools cannot show you, because search engine crawlers generally do not execute JavaScript and so never send data back to the tracking server.

Server hits and an accurate count of bytes sent are also information that you will not get from a JavaScript-based solution. These two metrics can help you benchmark the performance of your web server. Log analyzers can also show you reports on 404s (Page Not Found errors) along with the referring page to help you track down broken links. You can also find this type of information through Google Webmaster Central's Sitemaps tool, at http://www.google.com/webmasters/.
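To make those server-side metrics concrete, here is a minimal sketch that totals hits and bytes and groups 404 responses by referring page. It assumes the log entries have already been parsed into objects; the field names (path, status, bytes, referrer) and the sample entries are hypothetical.

```javascript
// Summarizes parsed log entries: total hits, total bytes sent,
// and a count of each broken link keyed by "referrer -> missing path".
function summarize(entries) {
  const report = { hits: 0, bytes: 0, brokenLinks: {} };
  for (const entry of entries) {
    report.hits += 1;
    report.bytes += entry.bytes;
    if (entry.status === 404) {
      const key = entry.referrer + " -> " + entry.path;
      report.brokenLinks[key] = (report.brokenLinks[key] || 0) + 1;
    }
  }
  return report;
}

// Made-up entries for illustration.
const entries = [
  { path: "/index.html", status: 200, bytes: 5120, referrer: "-" },
  { path: "/old-page.html", status: 404, bytes: 300, referrer: "/index.html" },
  { path: "/logo.gif", status: 200, bytes: 2048, referrer: "/index.html" },
  { path: "/old-page.html", status: 404, bytes: 300, referrer: "/index.html" },
];

const report = summarize(entries);
console.log(report);
```

Keying the broken-link count on the referrer as well as the missing path is what lets you find the page that holds the bad link, not just the URL that failed.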

Figure 10.3. AWStats breaking out viewed and not viewed traffic

The drawback to log analyzers is that they will not see transactions that do not take place on the server, such as interaction with DHTML on the page, or web pages that are cached by the user's web browser. For busy sites that see heavy traffic, logfiles can become huge over a short period of time. For these reasons, as well as the desire to centralize and outsource analytic data services, JavaScript page tagging was born.

JavaScript Page Tagging

Analytics tools based on JavaScript page tagging are popular for their ease of installation and for their ability to track cached page views and non-HTTP interactions within Flash movies, DHTML, or Ajax, assuming the analytics code is in the cached page.

The technology works by adding a bit of JavaScript to a page, or tagging it. When a user loads the page in a browser, the code executes and requests a 1 x 1-pixel transparent GIF image from a web server, passing along the information collected about the page view.
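The beacon mechanism can be sketched in a few lines. This is a simplified illustration, not the code any real analytics product uses: the endpoint URL, the parameter names, and both function names are assumptions made for the example.

```javascript
// Hypothetical collection endpoint; a real service defines its own URL and parameters.
const BEACON_URL = "http://stats.example.com/pixel.gif";

// Encodes the collected page-view data as a query string on the image URL.
function buildBeaconUrl(base, data) {
  const params = new URLSearchParams(data);
  return base + "?" + params.toString();
}

// Requesting the image is what delivers the data; the 1 x 1 GIF itself is discarded.
// Outside a browser (where Image is undefined) this does nothing.
function sendBeacon(url) {
  if (typeof Image !== "undefined") {
    new Image(1, 1).src = url;
  }
}

const beaconUrl = buildBeaconUrl(BEACON_URL, {
  page: "/photo-gallery/",
  ref: "http://www.example.com/",
  screen: "1024x768",
});
sendBeacon(beaconUrl);
console.log(beaconUrl);
```

An image request is used because it works across domains and needs no response handling on the page, so the tracking server can be hosted by a third party.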

Installation is easy and is typically a cut-and-paste operation. To install Google Analytics, a developer need only include this bit of code on every page in the site by means of a site-wide footer:

<script type="text/javascript" src="http://www.google-analytics.com/ga.js"></script>
<script type="text/javascript">
  var pageTracker = _gat._getTracker("UA-xxxxxx-x");
  pageTracker._initData();
  pageTracker._trackPageview();
</script>

Unlike with log analysis tools, you can also track JavaScript or Flash events caused by widgets that don't necessarily call the server. In Google Analytics, you can do this through the _trackPageview function.

Say we want to count a page view every time a user clicks the Next button in a photo gallery without refreshing the page. We could write the following bit of JavaScript:

<input type="button" onclick="getNextPhoto(); pageTracker._trackPageview('/photo-gallery/next/');" value="Next" />

Now when users interact with our photo gallery, even though the page does not fully refresh, we will record a page view. You can find more instructions on this level of tagging at http://www.google.com/support/googleanalytics/bin/answer.py?answer=55597&topic=10981.

JavaScript tagging can also provide more information about the user's browsing capabilities, whereas log analyzers rely on the User-Agent header sent by the browser to gather insight in this area (a header that can be, and sometimes is, forged, especially in Firefox and Opera):

User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30)

JavaScript-based analytics solutions can give you information about screen size, color support, and installed browser plug-ins (e.g., Flash, Java) in addition to browser and operating system types. Unlike server-side logfile analysis, JavaScript tagging incurs a performance hit from both downloading and executing the JavaScript and the overhead of the image beacon. If they are improperly coded, external resources can grind your pages to a halt when the tracking server goes down or becomes unresponsive.
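The kind of client-side detail described here comes from standard browser objects. The sketch below reads screen size, color depth, and plug-in names from a window-like object. The function name and the stand-in object are assumptions for illustration; passing the object in as a parameter is a choice made here so the sketch also runs outside a browser.

```javascript
// Reads display and plug-in details from a window-like object.
function collectClientInfo(win) {
  const plugins = [];
  const list = win.navigator.plugins || [];
  for (let i = 0; i < list.length; i++) {
    plugins.push(list[i].name);
  }
  return {
    screen: win.screen.width + "x" + win.screen.height,
    colorDepth: win.screen.colorDepth,
    plugins: plugins,
    userAgent: win.navigator.userAgent,
  };
}

// Stand-in values for illustration only.
const fakeWindow = {
  screen: { width: 1024, height: 768, colorDepth: 32 },
  navigator: {
    userAgent: "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
    plugins: [{ name: "Shockwave Flash" }, { name: "Java" }],
  },
};

const clientInfo = collectClientInfo(fakeWindow);
console.log(clientInfo);
```

In a page you would call collectClientInfo(window) and send the result with the tracking request. None of these values appear in a server log, which sees only the request headers.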

Multivariate testing with Google Website Optimizer

Google's Website Optimizer is a free A/B and multivariate testing tool that allows developers to run controlled experiments. Released in late 2006, Website Optimizer has revolutionized the testing of multiple variations to optimize conversion rates. Now there is no need to purchase specialized software run by white-coated lab technicians to run multivariate tests. Website Optimizer packages the mathematics of statistical power, sample size, and random variation into an intuitive integrated system. Figure 10.4, “Multivariate testing with Google Website Optimizer” shows an overview of how Website Optimizer works.

You can use Website Optimizer as an A/B split testing service for sites with lower page traffic (fewer than 1,000 page views per week) that want to test alternatives, or as a multivariate testing platform for busier sites that want to test multiple content changes simultaneously.

Using Google's interface, developers take the following steps to run a multivariate test:

  1. Choose the elements to test.

  2. Set up the experiment by inserting JavaScript in various places in the target pages.

  3. Launch the variations.

  4. Analyze the results.

Step 2 uses JavaScript to randomly display and monitor content variations. A header script, page control script, and tracking script do the heavy lifting. The greater the number of combinations, the more traffic or time will be needed to have enough statistical power to achieve a significant result. Google Website Optimizer is a great way to try out different ideas to maximize your conversion rates. For more information about Website Optimizer, see http://www.google.com/websiteoptimizer/.
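To see why more combinations demand more traffic, consider this small sketch. It is not how Website Optimizer is implemented; it simply assumes a full-factorial test in which every variation of every element can appear with every other. The element names, variation labels, traffic figure, and function names are all hypothetical.

```javascript
// Hypothetical page elements and their variations.
const elements = {
  headline: ["A", "B", "C"],
  image: ["A", "B"],
  button: ["A", "B"],
};

// The number of combinations is the product of the variation counts.
function countCombinations(elements) {
  return Object.values(elements).reduce((n, variations) => n * variations.length, 1);
}

// Picks one variation per element at random, as a visitor assignment might.
// The random source is a parameter so the choice can be made repeatable.
function pickCombination(elements, random = Math.random) {
  const choice = {};
  for (const [name, variations] of Object.entries(elements)) {
    choice[name] = variations[Math.floor(random() * variations.length)];
  }
  return choice;
}

const totalCombinations = countCombinations(elements); // 3 * 2 * 2 = 12
const weeklyVisitors = 1000; // assumed traffic figure
const visitorsPerCombination = Math.floor(weeklyVisitors / totalCombinations);

console.log(totalCombinations, visitorsPerCombination);
console.log(pickCombination(elements));
```

With only three elements, each combination already receives about a twelfth of the traffic, which is why lower-traffic sites are usually better served by a simple A/B split.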

Figure 10.4. Multivariate testing with Google Website Optimizer

Hybrid Analytics Systems

By combining logfile analysis with client-side tracking, you can harness the best features of both. UsaProxy is a hybrid analytics system developed by University of Munich researchers that can track both client-side interaction and HTTP activity. [162]

The UsaProxy architecture is HTTP-based. It has a proxy server that automatically injects JavaScript into web pages to track client-side behavior. It also improves logfile functionality by recording both HTTP requests and client-side activity, such as mouse movements and document object model (DOM) interaction, within the same logfile. Here is a sample from an actual logfile showing mousemove and keypress activity:

127.0.0.1 2007-12-02,23:04:46 httptraffic url=http://mail.google.com/mail/ sd=624
127.0.0.1 2008-00-02,23:04:48 sd=627 sid=Adn1KR0Hr8VT event=load size=0x0
127.0.0.1 null httptraffic url=http://mail.google.com/mail/?ui=2 ik=ae8caaf240
view=cbj sd=632
127.0.0.1 2008-00-02,23:04:48 sd=627 sid=Adn1KR0Hr8VT event=load size=300x150
127.0.0.1 2007-12-02,23:05:02 httptraffic url=http://mail.google.com/mail/ sd=649
127.0.0.1 2008-00-02,23:05:06 sd=627 sid=Adn1KR0Hr8VT event=mousemove offset=75,27
coord=84,54 dom=abaaaaaaaaaababcaaa
127.0.0.1 2008-00-02,23:06:24 sd=627 sid=Adn1KR0Hr8VT event=keypress key=shift+H
127.0.0.1 2008-00-02,23:06:25 sd=627 sid=Adn1KR0Hr8VT event=keypress key=m
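A log in this shape is straightforward to process. The sketch below splits a line into an address, a timestamp, bare markers such as httptraffic, and key=value fields. The layout is inferred only from the sample above, not from UsaProxy documentation, and the function and property names are hypothetical.

```javascript
// Parses one log line of the form: <ip> <timestamp> [marker] key=value ...
// The format is an assumption based on the sample lines, not a documented spec.
function parseUsaProxyLine(line) {
  const [ip, timestamp, ...tokens] = line.trim().split(/\s+/);
  const record = { ip, timestamp, flags: [], fields: {} };
  for (const token of tokens) {
    const eq = token.indexOf("=");
    if (eq === -1) {
      record.flags.push(token); // bare marker such as "httptraffic"
    } else {
      // Split at the first "=" only, so URLs containing "=" stay intact.
      record.fields[token.slice(0, eq)] = token.slice(eq + 1);
    }
  }
  return record;
}

// Two lines taken from the sample above (the wrapped line is rejoined).
const mouseMove = parseUsaProxyLine(
  "127.0.0.1 2008-00-02,23:05:06 sd=627 sid=Adn1KR0Hr8VT event=mousemove offset=75,27 coord=84,54 dom=abaaaaaaaaaababcaaa"
);
const httpRequest = parseUsaProxyLine(
  "127.0.0.1 2007-12-02,23:04:46 httptraffic url=http://mail.google.com/mail/ sd=624"
);

console.log(mouseMove.fields.event, mouseMove.fields.coord);
console.log(httpRequest.flags, httpRequest.fields.url);
```

Because client-side events and HTTP requests share one file and one record shape, they can be sorted and correlated by timestamp and session identifier without joining separate data sources.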

The combined logfile allows finer-grained analysis, timings, and overlays of client-side interaction on web pages (see Figure 10.5, “Mouse trails recorded by an HTTP proxy overlaid onto a screenshot”).

Figure 10.5. Mouse trails recorded by an HTTP proxy overlaid onto a screenshot

The advantage of the HTTP proxy technique is that there is no need to tag pages. One disadvantage is that HTTP compression is disabled while gathering data. You should use UsaProxy to log activity on a live website only when site visitors have agreed to it, because the high level of detail raises privacy concerns, such as the capture of login identifiers and passwords. The UsaProxy software is available at http://fnuked.de/usaproxy/.

User Experience Testing Software

What if you want to track metrics across multiple sites, including those of your competitors? Or compare task completion success to user attitudes? That's where User Experience (UX) testing software comes into play. UX testing was once the exclusive domain of usability labs. Now UX software semiautomates the process of running usability tests and capturing results. Keynote Systems' WebEffective software is one such UX testing platform (see Figure 10.6, “Keynote Systems' WebEffective output”).

Figure 10.6. Keynote Systems' WebEffective output

Available under license or as a service, WebEffective is a flexible platform for conducting in-depth user experience and market research studies on individual sites or across an entire industry. WebEffective uses a small ActiveX component or a proxy server to track user behavior and gather input during the test. Detailed clickstream data is available only through Internet Explorer and the ActiveX control, but you can use WebEffective with all other browsers for task-based testing. Researchers design and deploy tests that include screening panelists and running tasks on one or more sites, while at the same time gathering detailed information on user activity and success rates. The tool provides a window into the real-world attitudes, behaviors, and intentions of users. For instance, users tend to overestimate success rates when compared to actual drop-off rates (see Figure 10.7, “Conversion funnel with drop-off rates and comments”).

The significance of Figure 10.7, “Conversion funnel with drop-off rates and comments” is that 70% of testers said they completed the task, but only 20% of those actually completed the task as it was designed to be completed.
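The arithmetic behind that gap is worth spelling out. The two percentages below come from the paragraph above; the panel size of 200 testers is a made-up figure chosen only to give round numbers.

```javascript
// Percentages from the text; the panel size is a hypothetical example.
const testers = 200;
const reportedPct = 70; // said they completed the task
const actualOfReportedPct = 20; // of those, completed it as designed

const reportedCount = (testers * reportedPct) / 100;
const actualCount = (reportedCount * actualOfReportedPct) / 100;
const actualPctOfAll = (actualCount * 100) / testers;

console.log(reportedCount, actualCount, actualPctOfAll);
```

Under that reading, 20% of the 70% who reported success amounts to 14% of all testers, so self-reported success overstates the measured rate by a factor of five.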

The software provides robust reporting tools, showing success rates, browsing time, page views, stay and load times, and other metrics. More important, it integrates user feedback with results (shown in Figure 10.7, “Conversion funnel with drop-off rates and comments”). So, not only do you find out what happened, but you can also learn why it happened. Figure 10.8, “Club Med findings: booking process” shows some sample results from a comparison between the Club Med and Beaches websites.

Figure 10.7. Conversion funnel with drop-off rates and comments

Figure 10.8. Club Med findings: booking process

This kind of integrated approach to usability testing can boost conversion rates significantly without the need for an expensive usability laboratory. Think of it as a global usability lab without walls.
