Recently, I had the pleasure of watching an article of mine show up on Slashdot, Digg.com, Newsforge and a myriad of other web sites at the same time. None of my colleagues had seen so much traffic on a Linux site before. Aside from the several million hits on our server, we had a quarter of a million unique visitors concentrated in a five-hour period.
When you see that kind of traffic, you don’t want the server to go down or you’ll miss new readers. In our situation, a reboot allowed the system to return to service for a few minutes, but then it locked up again. On a normal day, we used less than ten percent of our system resources, so we thought we had prepared for the hottest day of the year. Little did we know we would experience rolling blackouts.
I attempted to explain to our self-proclaimed system administrator that we needed a way to renice unnecessary processes automatically, restart those that failed and keep the server going at all costs. He had no enterprise experience, yet he insisted he knew the best way to manage the system. He used an RRD tool with a front-end. But when it came to handling processes, he insisted he would log in with SSH and handle everything remotely.
This provides an example of hardheadedness and an inability to adapt and improvise. I know you do not know anyone else like this, but I do. In fact, the majority of noise makers in the community refuse to use tools that execute in a web services environment.
You can characterize the modern business climate by friction, uncertainty, disorder, and rapid change. Each situation is a unique combination of shifting factors that cannot be controlled with precision or certainty. System administrators need to learn to adapt. Adaptability means shortening the time it takes to adjust to each new situation.
Two basic ways to adapt exist. First, if we have enough awareness to understand a situation in advance we can take preparatory action. Call this anticipation.
At other times we have to adapt to the situation on the spur of the moment without time for preparation. This is improvisation. To be fully adaptable, we must be able to do both.
We co-hosted our server at an ISP 250 miles from our base of operations. When the server went down, we had to call the ISP and have one of its service personnel run down to the server rack and power the machine back up.
From that point, our brilliant, know-it-all sys admin had to log in over SSH. By the time the server had powered back up and started all its processes, including sshd, it had locked up again. So what was next?
Insisting on using the command line, the sys admin had to walk a support technician through a series of steps to renice and shut down processes while keeping the system disconnected from the Internet.
Once he got the system dedicated to one purpose, he reconnected it to the Internet and, though slow, the OS and web server stayed up.
Next, our sys admin had to stop several of the Content Management System processes to keep the server up and allow visitors to view a single article. Eventually, the server responded in a reasonable manner. But, consider all the trouble.
I prefer using the command line myself. But some situations warrant something in addition. For example, when you manage dozens or even hundreds of servers, you need to poll and drill down into each server remotely and efficiently.
One of the tools I like allows me to do the drilling. The developers call it Monit. Here’s the description from the web site:
Monit can start a process if it does not run, restart a process if it does not respond and stop a process if it uses too much resources. You can use monit to monitor files, directories and devices for changes, such as timestamp changes, checksum changes or size changes. You can also monitor remote hosts; monit can ping a remote host and can check TCP/IP port connections and server protocols. Monit is controlled via an easy to use control file based on a free-format, token-oriented syntax. Monit logs to syslog or to its own log file and notifies you about error conditions and recovery status via customizable alert.
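To make the start, restart and stop rules in that description concrete, here is a minimal Python sketch of the same decision logic. It is not Monit's code and not its control-file syntax; the function names, the 80 percent CPU limit and the two-second timeout are assumptions chosen purely for illustration.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    A stand-in for "does the service respond"; the timeout is an assumed value.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def choose_action(running, responding, cpu_percent, cpu_limit=80.0):
    """Pick one action for a watched service.

    Mirrors the three rules in the description above:
    start if not running, restart if not responding,
    stop if it uses too many resources (here: CPU above an assumed limit).
    """
    if not running:
        return "start"
    if not responding:
        return "restart"
    if cpu_percent > cpu_limit:
        return "stop"
    return "none"
```

A watchdog loop could call something like port_is_open("127.0.0.1", 80) on each polling cycle and feed the result into choose_action, alongside whatever process and CPU checks the host provides. The point is that the rules run unattended, so nobody has to win a race against a server that locks up before sshd is ready.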
Monit allows you to view a server’s status through a web browser among other features. You can find the project at http://www.tildeslash.com/monit/. Additionally, Falko Timme has written a howto for installing and setting up Monit at http://howtoforge.com/server_monitoring_monit_munin.
Command line? Essential for any system administrator to know. But, consider adding adaptability to your skill set. It will pay off.