This is the second article in a two-part technical tutorial on the deployment of the Squid web proxy cache.
In last month's article, we discussed the basics of web caching, compiled Squid from source code, and tested a basic configuration. This month, we'll add some automation and a sibling cache server to our configuration.
To test our configuration last month, we started Squid manually with the
/usr/local/squid/bin/squid command. Of course, on a
production server Squid must start by itself. To do this, we could
simply add the squid command to rc.local. Squid would
put itself in the background (its daemon mode) and run at boot time.
However, what we really want is for Squid to run only in the
appropriate run levels, so we need a System-V init script.
That script will call /usr/local/squid/bin/RunCache, a handy
startup script provided with Squid that restarts the daemon
if it happens to die. The init script is provided in
Listing 1. We name this file /etc/rc.d/init.d/squid and make links to it for each
run level:
# ln -s /etc/rc.d/init.d/squid /etc/rc.d/rc0.d/K16squid # ln -s /etc/rc.d/init.d/squid /etc/rc.d/rc1.d/K16squid # ln -s /etc/rc.d/init.d/squid /etc/rc.d/rc2.d/K16squid # ln -s /etc/rc.d/init.d/squid /etc/rc.d/rc3.d/S86squid # ln -s /etc/rc.d/init.d/squid /etc/rc.d/rc4.d/S86squid # ln -s /etc/rc.d/init.d/squid /etc/rc.d/rc5.d/S86squid # ln -s /etc/rc.d/init.d/squid /etc/rc.d/rc6.d/K16squid
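The seven links follow a simple pattern, so they can also be created in a loop. The following is only a sketch: the function name link_squid_runlevels and its argument (the root of the init directory tree, /etc/rc.d in this article) are illustrative and not part of Squid or of any distribution.

```shell
#!/bin/sh
# Sketch: create the K16/S86 run-level links for the squid init script.
# link_squid_runlevels is a hypothetical helper name; $1 is the init
# directory root (/etc/rc.d in the article's layout).
link_squid_runlevels() {
    root=$1
    # Kill links: Squid is stopped in run levels 0, 1, 2, and 6.
    for level in 0 1 2 6; do
        ln -sf "$root/init.d/squid" "$root/rc$level.d/K16squid"
    done
    # Start links: Squid is started in run levels 3, 4, and 5.
    for level in 3 4 5; do
        ln -sf "$root/init.d/squid" "$root/rc$level.d/S86squid"
    done
}
```

As root, this would be run as link_squid_runlevels /etc/rc.d. Passing the root directory as an argument makes it easy to adapt to a distribution that keeps its run-level directories elsewhere.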
Your init directory structure may differ depending on your distribution. With the script and links in place, Squid will start automatically when entering run levels three, four, or five, and shut down for all other run levels. You can also use the script to manually start and stop squid, using these commands:
# /etc/rc.d/init.d/squid start
# /etc/rc.d/init.d/squid stop
Squid comes with a rudimentary "manager" application. It is a CGI program that produces interesting up-to-the-minute statistics on the current Squid process. To use CacheManager, you'll need to have a web server installed somewhere on your network. Apache running locally on the Squid server will be used as the example here. First, we'll add a new cgi-bin directory in the Squid hierarchy, place a copy of the CacheManager application in it, and change the ownership of the directory and file:
# mkdir /usr/local/squid/cgi-bin
# cp -p /usr/local/squid/bin/cachemgr.cgi /usr/local/squid/cgi-bin
# chown -R squid.squid /usr/local/squid/cgi-bin
Next, we configure Apache to see the new script directory. In Apache's configuration file, add:
ScriptAlias /squid/cgi-bin/ "/usr/local/squid/cgi-bin/"
Finally, we set a CacheManager password in squid.conf:
cachemgr_passwd mypwd all
After restarting both Squid and Apache, start a browser and enter this URL:
If everything is working correctly, you should see the CacheManager
login screen. Enter the user name "manager" and the password "mypwd" (or
whatever password you selected in squid.conf). You should
then get the CacheManager main menu. Some of the available options
will be more useful to you than others. Spend some time exploring the
output from CacheManager with a live Squid server to fully understand it.
Important note: Deploying the CacheManager as depicted here has security implications. Before adding this configuration to a production Squid server, review the procedures in section 9 of the Squid FAQ.
For a small company, manual configuration of browsers for use with
a proxy server may be tolerable. However, in larger enterprises,
using automatic configuration is essential. Beginning with Netscape
Navigator 2.0, automatic proxy configuration has been available
through a file commonly named
proxy.pac (pac stands for "Proxy Auto Configuration"). Netscape
defined the autoconfiguration function through a special MIME type,
served by a web server for files with the .pac extension. We'll rely again on Apache to
provide the autoconfiguration file. On your Apache server, add the following
directive to its configuration:
AddType application/x-ns-proxy-autoconfig .pac
This instructs Apache to send the new document type with any file
whose name ends in .pac. You must restart Apache to include the new
AddType directive. Next, modify the domain name in Listing 2 for your site and store the entire file
as proxy.pac in /home/httpd/html (or your
Apache server's root html directory). Finally, modify the proxy
configuration in your browser. For Netscape Communicator, use the
"Edit -> Preferences -> Advanced -> Proxies" dialog.
This time, select "Automatic Proxy Configuration" and provide the URL
of proxy.pac. If you are using a local Apache server on
Linux, the URL is:
Browsing should work as before. Using the autoconfigure capability
allows you to manage all browsers' proxy configurations simply by
editing the proxy.pac file on your web server, freeing
you from manually configuring browsers. For the official word on
browser autoconfiguration, see the Navigator
Proxy Auto-Config File Format page. There you'll find detailed
information on how to configure browsers to selectively use the proxy
as appropriate, or to select among multiple proxies.
For small operations in a single location, one cache server may be sufficient. For more complicated scenarios, Squid and many other cache servers allow communication between caches. With this capability you can deploy a mesh of cache servers, where parent and sibling caches share their content with one another using the Internet Cache Protocol (ICP). This can be useful for load balancing and redundancy.
It can also be used to set up a distributed cache infrastructure, where remote offices with slow network connections need their own local cache. Traffic on the slow network connections can be reduced by creating a parent cache at the Internet connection and child caches at each remote office. You can also join an existing mesh of caches on the Internet if appropriate (see section four of the Squid FAQ).
For our example configuration, we'll set up a single sibling server,
proxy2, in addition to proxy1, which is already configured.
We'll assume that the Domain Name System (DNS) configuration will handle resolution of a
single cache server name to the two physical servers. The hardware
for proxy2 should be at least as capable as
proxy1, but there's no requirement that they be identical.
Once Squid is installed on proxy2, the two servers can be made siblings
by making the following Squid configuration changes (with the
appropriate domain names):
On proxy1:

icp_access allow all
cache_peer proxy2.my.domain sibling 3128 3130

On proxy2:

icp_access allow all
cache_peer proxy1.my.domain sibling 3128 3130
After a restart of both Squid processes, the servers should begin checking each other's caches before going to origin servers on the Internet. You should see new output in the access log relating to the sibling server.
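One quick way to confirm that the siblings are cooperating is to count the requests that were answered from the peer. The sketch below assumes Squid's native access log format, in which the hierarchy field reads SIBLING_HIT/ followed by the peer's name when an object was fetched from a sibling. The function name count_sibling_hits is illustrative, and the log location depends on your installation.

```shell
#!/bin/sh
# Sketch: count access.log entries that were served by a sibling cache.
# count_sibling_hits is a hypothetical helper name; $1 is the path to a
# Squid access log in the native format (assumed).
count_sibling_hits() {
    # grep -c prints the number of matching lines (0 if there are none).
    grep -c 'SIBLING_HIT/' "$1"
}
```

Call it with the path to your access log as the only argument. A count that stays at zero after both servers have handled overlapping traffic suggests that ICP queries are not getting through, for example because of the icp_access rules or a firewall between the two hosts.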
On an active proxy server, access logs can get extremely large. If allowed to grow unchecked, they can become difficult to work with. Worse, they could quickly fill the partition holding the log directory. Implementing a scheme to rotate logs frequently will help to prevent this scenario.
Squid is capable of doing its own log rotations. Though you could
use other facilities to handle it, a single signal to the running
Squid process will do the rotation for you neatly and cleanly. To
enable it, first choose the number of old logs you wish to keep and
enter it in squid.conf with the logfile_rotate directive.
After restarting Squid, you can initiate the rotation with this command:
# /usr/local/squid/bin/squid -k rotate
By putting this command into the daily cron configuration (or in root's crontab) we'll fully automate the rotation process.
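A daily cron job for the rotation might look like the sketch below. The function name rotate_squid_logs and the idea of installing it as a script in a daily cron directory are assumptions for illustration; only the squid -k rotate command itself comes from the text above.

```shell
#!/bin/sh
# Sketch of a daily log-rotation job for Squid.
# The SQUID path follows the article's /usr/local/squid layout and can be
# overridden from the environment; rotate_squid_logs is a hypothetical name.
SQUID=${SQUID:-/usr/local/squid/bin/squid}

rotate_squid_logs() {
    if [ -x "$SQUID" ]; then
        # Signal the running Squid process to rotate its logs.
        "$SQUID" -k rotate
    else
        echo "squid binary not found at $SQUID" >&2
        return 1
    fi
}
```

To use it as a cron script, add a final line that calls rotate_squid_logs. The check for the binary keeps cron from mailing a confusing error if Squid is ever moved or removed.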
As the logs are rotated, they are given numeric extensions. The
log currently in service is access.log. Yesterday's file
is access.log.0. The file from three days ago would be
access.log.2, and so on up to the maximum specified in
squid.conf. Squid's own server-information logs
are rotated in the same way. After the logs reach the maximum
number specified in squid.conf, the oldest files are deleted by the
rotation. This should help keep the log partition from getting too
full.
The procedures presented in these two articles should be enough to get Squid running on your network. Next, you may want to implement some monitoring and tune Squid to your particular needs. A good place to start is the Squid User's Guide. This document is a little outdated, but provides a nice foundation for understanding Squid and caching in general. The Squid FAQ is also a must-read document. The Squid Mail Archive may also be of interest.
If you're interested in seeing side-by-side comparisons of Squid with other cache products, the folks who maintain the Web Polygraph proxy performance benchmark have just completed their latest "bake-off" of cache servers and posted these results.
You may also enjoy reading this detailed review of Squid and its deployment.
I hope that this introductory tutorial has been interesting and useful to you. Of course, Squid has far more capability than has been explored here, and you are encouraged to review the resources linked above for further information. If you choose to implement Squid for your enterprise you should find it to be robust and easy to manage. Good luck.
Jeff Dean is an engineering and IT professional currently writing a Linux certification handbook for O'Reilly and Associates.
System Administrator Michael Alan Dorman responded to Jeff Dean's Squid articles in our Linux forum. Dorman told a cautionary tale about how he got burned when he set up an open Squid cache on an unsecured university system. Dean replied in the forum and has updated his first article with security information.
We are interested in hearing your stories and questions about Squid caches. Share them with us in the O'Reilly Network Linux forum.
Copyright © 2009 O'Reilly Media, Inc.