A New Visualization for Web Server Logs
Variations
If the plot is too dense--as was the case for me--thin it down by telling Gnuplot to only use every nth data point. For example, I thinned Figure 1 by plotting every tenth point with the Gnuplot splot command:
splot "gnuplot.input" using 1:2:3 every 10
Figure 3 shows the corresponding scatter plot.

Figure 3. Thinned scatter plot
Gnuplot makes it easy to focus on part of the plot by setting the axis ranges. Figure 4 shows a small part of the Y- and Z-axes. The almost continuous lines that run parallel to the time axis are monitoring probes that regularly request the same page; four of them should be clearly visible. In addition, I changed the eye position.

Figure 4. Reduced Y and Z ranges showing monitoring probes
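Restricting the ranges as described above can be sketched as follows. The time format and the numeric ranges are assumptions chosen for illustration, not the values behind Figure 4; adjust them to your own data.

```gnuplot
# Treat column 1 as a timestamp (assumed format: dd/Mon/yyyy:HH:MM:SS)
set xdata time
set timefmt "%d/%b/%Y:%H:%M:%S"
# Hypothetical ranges: show only IP ranks 1 to 200 and URL ranks 1 to 50
set yrange [1:200]
set zrange [1:50]
splot "gnuplot.input" using 1:2:3 with dots
```

Narrowing the ranges zooms in on that region of the plot; `set autoscale` restores the full ranges.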
Because real people need sleep, it should be possible to make out the diurnal rhythms that rule our lives. This is evident in Figure 4. The requests are denser from 08:00 to about 17:00 and quite sparse in the early hours of the morning.
Changing the viewing angle can give you a new point of view. Gnuplot lets you do it in one of two ways: with the set view command or interactively with a click and drag of the mouse.
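As a rough illustration of the command form, the sketch below sets the eye position explicitly. The two angles are arbitrary example values, and `replot` assumes a plot has already been drawn in the current session.

```gnuplot
# set view <rot_x>, <rot_z>: rotation in degrees, first about the
# horizontal screen axis, then about the plot's z-axis
set view 75, 20
replot
```

Gnuplot's default view is 60, 30, and `show view` reports the current setting.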
The Pièce de Résistance
Because it is difficult to perceive the depth of a 3D plot on a flat display without stereoscopic glasses, I used a few more manipulations to "jitter" the image so that the depth in the picture becomes visible. The plot in Figure 5 is an example of this. It was easy to generate with a few more Gnuplot commands, followed by GIF animation with ImageMagick.

Figure 5. An animated GIF of the scatter plot that hints at the 3D structure
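One way to produce such a jittered animation is sketched below. It is not necessarily the procedure used for Figure 5: the frame count, the angles, and the file names are assumptions, and the `do for` loop needs Gnuplot 4.6 or later (with older versions, repeat the three commands inside the loop by hand for each frame).

```gnuplot
# Assumed time format: dd/Mon/yyyy:HH:MM:SS
set xdata time
set timefmt "%d/%b/%Y:%H:%M:%S"
set terminal png
# Write 8 frames whose z-rotation swings 2 degrees either side of 30
do for [i=0:7] {
    set output sprintf("frame%02d.png", i)
    set view 60, 30 + 2*sin(2*pi*i/8)
    splot "gnuplot.input" using 1:2:3 every 10 with dots
}
unset output
```

The frames can then be combined outside Gnuplot with ImageMagick, for example with `convert -delay 10 -loop 0 frame*.png jitter.gif`. Because the rotation follows one full sine period, the looping GIF rocks back and forth without a visible jump.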
Further Work
With Gnuplot 4.2, which is still in beta, it is now possible to draw scatter plots in glorious color. Initial tests show that using color for the status code dimension makes the plots even more informative. Stay tuned.
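A minimal sketch of what such a colored plot could look like, assuming the status code is the fourth column of gnuplot.input (as printed by the Perl script in the code listing) and a Gnuplot version with palette support for points. The color choices are arbitrary and may differ from those used in the follow-up work.

```gnuplot
# Assumed time format: dd/Mon/yyyy:HH:MM:SS
set xdata time
set timefmt "%d/%b/%Y:%H:%M:%S"
# Map HTTP status codes onto a color scale (arbitrary example colors)
set cbrange [200:500]
set palette defined (200 "green", 300 "blue", 400 "yellow", 500 "red")
# The fourth column (status code) selects the color of each point
splot "gnuplot.input" using 1:2:3:4 with points palette
```

With this mapping, successful requests (2xx) appear green and server errors (5xx) red, so clusters of failures stand out against normal traffic.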
Conclusion
Though the 3D plots present no hard numbers or trend lines, the scatter plot as described and illustrated above may give a more intuitive view of web server requests. Especially when diagnosing problems, this alternative way of presenting logfile data can be more useful than the charts and reports of a standard log analyzer tool.
Code Listings
The Perl script:
#
# prepare-for-gnuplot.pl: convert access log files to gnuplot input
# Raju Varghese. 2007-02-03
use strict;

my $tempFilename = "/tmp/temp.dat";
my $ipListFilename = "/tmp/iplist.dat";
my $urlListFilename = "/tmp/urllist.dat";
my (%ipList, %urlList);

# convert a dotted-quad IP address to an integer so that addresses sort numerically
sub ip2int {
    my ($ip) = @_;
    my @ipOctet = split (/\./, $ip);
    my $n = 0;
    foreach (@ipOctet) {
        $n = $n*256 + $_;
    }
    return $n;
}

# prepare temp file to store log lines temporarily
open (TEMP, ">", $tempFilename) or die "cannot write $tempFilename: $!";

# reads log lines from stdin or files specified on command line
while (<>) {
    chomp;
    # the ninth whitespace-separated field is the status code
    my ($ip, undef, undef, $time, undef, undef, $url, undef, $sc) = split;
    next unless defined $sc;    # skip short or malformed lines
    $time =~ s/\[//;
    # skip requests for images, scripts, and stylesheets
    next if ($url =~ /(gif|jpg|png|js|css)$/);
    print TEMP "$time $ip $url $sc\n";
    $ipList{$ip}++;
    $urlList{$url}++;
}

# process IP addresses: number them in ascending address order
my @sortedIpList = sort {ip2int($a) <=> ip2int($b)} keys %ipList;
my $n = 0;
open (IPLIST, ">", $ipListFilename) or die "cannot write $ipListFilename: $!";
foreach (@sortedIpList) {
    ++$n;
    print IPLIST "$n $ipList{$_} $_\n";
    $ipList{$_} = $n;
}
close (IPLIST);

# process URLs: number them by descending request count
my @sortedUrlList = sort {$urlList{$b} <=> $urlList{$a}} keys %urlList;
$n = 0;
open (URLLIST, ">", $urlListFilename) or die "cannot write $urlListFilename: $!";
foreach (@sortedUrlList) {
    ++$n;
    print URLLIST "$n $urlList{$_} $_\n";
    $urlList{$_} = $n;
}
close (URLLIST);

# second pass: replace each IP address and URL with its number
close (TEMP);
open (TEMP, "<", $tempFilename) or die "cannot read $tempFilename: $!";
while (<TEMP>) {
    chomp;
    my ($time, $ip, $url, $sc) = split;
    print "$time $ipList{$ip} $urlList{$url} $sc\n";
}
close (TEMP);
Raju Varghese has a Bachelors in Electrical Engineering from BITS, Pilani (India) and a Masters in Computer Science from the University of Texas, San Antonio.
-
labeling
2007-04-14 10:59:26 jamonapi
I think the graphic would benefit from better separation of the labels (make them bold), and also from information about the axis associated with each of them.
For example, the following axis names could appear in bold in the graph.
Time (one day): X-Axis
IP Address: Y-Axis
URL: Z-Axis
-
Nice Article
2007-04-13 15:13:07 jamonapi
I agree that better visualization tools should be made available. Most just use the same tired pie charts and bar charts.
I see you posted an example of adding color to show an extra dimension, the HTTP status code. That makes 4 dimensions. An interesting 5th dimension would be to have multiple graphics similar to what you have, but break them up by response time. This would be a 'small visualization'.
For example, repeat your graphics for pages that take from 0 to 1 seconds, 2 to 3 seconds, 3 to 4 seconds, 4 to 5 seconds, 5 to 10 seconds, > 10 seconds (or something similar). Patterns may emerge along this dimension too, like 404 errors during long page executions, or slowness based on time of day.
steve http://www.jamonapi.com
-
Thanks!
2007-04-14 02:00:10 rajuvarghese
Using different animation frames for the 5th dimension is a good idea. So far, I have used frames for making the 3D structure visible. Of course, one could use one of the other dimensions for the response time. For example, instead of the IP address (y-axis) or the content (z-axis) one could use the response time.
Raju
-
Thanks!
2007-04-14 10:47:11 jamonapi
I am just wondering how you could go beyond 4 dimensions with this approach. There may be other ways too.
Note for those interested: there is a follow-up article on this approach.
http://www.oreillynet.com/pub/a/sysadmin/2007/02/02/3d-logfile-visualization.html
-
Better ways to see 3D
2007-02-12 17:30:13 dberkholz
Try creating surfaces rather than dots. Also, if you color them by height in the Z-direction, it's a lot easier to see than with a single color.
Also, you could try creating true stereo images to better see the 3D nature. Simply rotate one image by 7 degrees relative to the other. Depending on which one is on the left, you'll get either walleyed or crosseyed stereo.
-
Better ways to see 3D
2007-02-12 17:32:21 dberkholz
Forgot to mention ... once you've made the stereo images, separate them by the distance between your eyes for optimal viewing. If they overlap, shrink them.
-
Surface plot, viewing 3D
2007-02-12 22:53:48 rajuvarghese
Thanks dberkholz for your two suggestions. I did not create a surface plot because the Z value is not continuous. For a small number of points it may look good. With larger numbers it will not display well. Color, as noted in the article, will be put to good use for visualizing another dimension in a forthcoming article.
The jittered plot was my attempt at displaying 3D without red-blue glasses or specialized equipment.
Since color will be used later for displaying another parameter I cannot use it for left and right images. Let me know if you had something else in mind with your suggestion.
-
Surface plot, viewing 3D
2007-02-13 18:52:24 Nick_3D
Gentlemen,
Surface 3D visualization gives much better perception in any case, as correctly mentioned by dberkholz. It is easy to understand because the real objects we are used to dealing with in life look like surfaces rather than clouds. In the same way, a 2D line plot looks more understandable than scattered points. Several thousand points can be nicely visualized as a 3D surface. Colors help a lot too, as correctly mentioned by dberkholz. For more than 1 million points the best choice is still surface 3D visualization, but enhanced with real-time zooming (in all directions) and flyby, which is normally no problem with a reasonable dedicated multidimensional graphics software package.
For better 3D screen shots you can see this for example:
http://www.sciencegl.com/Stock_market/Stock_market.htm
What matters the most, I think, is an interactive way to read back the data of interest at any point of the surfaces and at any time during navigation: readouts such as the point position following the mouse (XYZ), cross-section (cut), difference between markers, etc.
Something like this
http://www.sciencegl.com/Help_stock/Stock_Market_3D.html
Thank you for your attention.
-
Surface plot, viewing 3D
2007-02-15 08:20:29 rajuvarghese
Surfaces do look better than clouds but my hesitation is due to two things:
- joining up points to form a line (2D case) and making a surface (3D case) is OK for continuous functions/data. In this case, adjacent IP addresses may have little to do with each other, and one cannot expect a smooth transition between any two points. This can apply to one or more of the axes (IP address and content). Interpolating a surface may, therefore, be fake.
- I have yet another dimension that I would like to bring into the picture, and color is the only thing left at my disposal
But you do bring up an important point: interactivity. Plots such as the ones in the article are dead objects that cannot tell you any more about the displayed data. A good interactive tool would let the user mouse over the points and describe each data point. Excel charts can do this as, I am sure, can many others. Excel has many other well-known limitations, which precluded its use for decent-sized log files. I would love to try out my data with a proper 3D tool such as the one you linked to.
If you have such a tool and could load it up with similar data, I would be grateful if you put up a screenshot to show that it does indeed look good.
-
Surface plot, viewing 3D
2007-02-15 18:04:44 Nick_3D
Yes, your hesitation is correct in some respects. Truly scattered data makes little or no sense in a surface representation. Things are different if you have at least two continuous directions, such as Z (volume or hits in your case) and X (time in your case); then a surface works perfectly well, as shown in the links above. The 3D surface is constructed as bands or 3D tapes, with interpolation only in the continuous directions. Feel free to drop a line to my friends at www.ScienceGL.com to get a demo, an evaluation, whatever. Contact Alex.
Network graphics (http://www.sciencegl.com/)
-
Surface plot, viewing 3D
2007-02-13 18:48:31 Nick_3D
Gentlemen,
Surface 3D visualization gives much better perception in any case, as correctly mentioned by dberkholz. It is easy to understand because the real objects we are used to dealing with in life look like surfaces rather than clouds. Several thousand points still look good as a 3D surface. Colors help a lot too, as correctly mentioned by dberkholz. For more than 1 million points the best choice is still surface 3D visualization, but enhanced with real-time zooming (in all directions) and flyby, which is normally no problem with a reasonable dedicated multidimensional graphics software package.
For better 3D screen shots you can see this for example:
http://www.sciencegl.com/Stock_market/Stock_market.htm
What matters the most, I think, is an interactive way to read back the data of interest at any point of the surfaces and at any time during navigation: readouts such as the point position following the mouse (XYZ), cross-section, difference between markers, etc.
Something like this
http://www.sciencegl.com/Help_stock/Stock_Market_3D.html
Thank you for your attention.
-
An example to see...
2007-02-11 23:29:33 Joe
My friends at TrenchMice (http://www.trenchmice.com/) put up an example of what this looks like, using the flood of traffic that rolled in due to a front-page article this weekend. Pretty interesting.
http://www.cogitooptimus.com/2007/02/11/wow-we-made-it/
There is an edit that needs to be made in the script. The line that reads:
my ($ip, undef, undef, $time, undef, undef, $url, undef) = split;
might be changed to
my ($ip, undef, undef, $time, undef, undef, $url, undef, $sc, undef) = split;
-
A correction to my correction
2007-02-11 23:48:05 rajuvarghese
Aaargh! I messed up the tags in my reply. It should have been...
A $sc got lost in the displayed code. Thanks Joe for the correction. My original code was slightly different, but it behaves the same as Joe's correction above.
my ($ip, undef, undef, $time, undef, undef, $url, undef, $sc) = split;
-
Error in the perl script
2007-02-11 13:18:40 Jeremiah Foster
I wonder if I have done something wrong. When copying the script and running it I get the error that $sc must be defined. I looked through the original code and I do not see it defined.
Am I missing something? Or is this a bug?
Jeremiah
-
Corrected
2007-02-11 23:49:26 rajuvarghese
Yes, there is something missing in the code. Please see Joe's message above and my reply to it.
-
Data Visualization
2007-02-09 20:59:41 raffaelmarty
I love the log analysis example. I am a huge supporter of visual analysis with an emphasis on real-world use-cases. Nothing worse than a great visualization without a real use-case. More visualization use-cases can be found at
http://secviz.org
I encourage everyone to contribute content there and increase the collection of real-world use-cases that others can use!
-
Works like a charm
2007-02-09 07:20:51 iconara
Very good idea, and an excellent article. I adapted your scripts to work with my IIS-generated log files:
while ( <> ) {
    # IIS log files have comments which start with "#", skip those
    next if /^#/;
    chomp;
    # this is the IIS log format; it may vary, but IIS prints the format as a comment at the beginning of the file
    my (undef, $time, $ip, undef, undef, undef, undef, undef, undef, $url, undef, $sc) = split;
    next if ($url =~ /(gif|jpg|png|js|css)$/);
    # no need to reformat the time so I removed that line
    print TEMP "$time $ip $url $sc\n";
    $ipList{$ip}++;
    $urlList{$url}++;
}
And I also added this to the gnuplot commands to make it create a PNG (add before "splot ..."):
set terminal png
set output "plot.png"
-
Forgot the time format
2007-02-10 03:28:03 iconara
I forgot that I also changed the time format (since IIS has date and time as two separate items in the log file). You need to change the line beginning with "set timefmt" to:
set timefmt "%H:%M:%S"
-
Works like a charm
2007-02-09 11:01:13 rajuvarghese
Good work extending the script for IIS logs. The output to png files was planned for the next part of the article. The plot is going to be colorful as well. Gnuplot 4.2 has reached RC4 and release is imminent. Glad you liked the article.
-
Works like a charm
2007-02-09 12:58:44 bjelkeman
Interesting. I have been working with web traffic analysis tools more or less since they came out, and I have been saying that the current tools are very poor at giving good overviews of traffic. Mapping the traffic in 3D (there are several ways to do this, of course) is a great way to increase the amount of information you can show in a graph. Not a lot of people think like me, apparently, as essentially every developer I have talked to has shrugged their shoulders at the suggestions.
-
I agree: sysadmins have no graphical tools
2007-02-09 15:20:25 rajuvarghese
As I mentioned in the article, standard analysis tools seem to cater more to publishers/editors than to the sysadmins. Though web servers have been around for over 13 years, one does not see much progress on that front. Further, none of the standard tools that I am familiar with have 3D views of data. I believe that interactive 3D charts could make the data flood less intimidating for both publishers and sysadmins.
-
I agree: sysadmins have no graphical tools
2007-02-15 13:36:31 ksfiles
If you like 3D visualization of network traffic, you should definitely check out Merit's Flamingo tool. It provides lots of interesting ideas.
http://flamingo.merit.edu/gallery.html
Thanks,
--kirby









