Network Forensics: Tapping the Internet
by Simson Garfinkel, author of Web Security, Privacy & Commerce, 2nd Edition
During the Gulf War, computer hackers in Europe broke into a UNIX computer aboard a warship in the Persian Gulf. The hackers thought they were being tremendously clever -- and they were -- but they were also being watched.
Just before penetrating the PACFLEETCOM computer and reading the Navy's email, the hackers hopped through a computer at Los Alamos Laboratory. And unknown to the attackers, every packet in or out of Los Alamos over the Laboratory's Internet connection was recorded and preserved for later analysis on magnetic tape.
The incident in the Persian Gulf became a cause celebre in the years that followed. Tsutomu Shimomura bragged about the incident in his book Takedown. Many experts in the field of computer security used the story as proof, of sorts, that the U.S. military was asleep at the switch when it came to computer security.
One of the more dramatic outcomes of the incident was a videotape played at the annual meeting of the American Association for the Advancement of Science in February 1993 -- a video that showed each of the attacker's keystrokes, replete with mistakes, and the results, as he systematically penetrated the defenses of the ship's computer and scavenged the system.
I was one of the journalists in the audience watching the 1993 video. At first, I was incredulous -- how could a national laboratory possibly record every bit of data moving back and forth over the Internet connection? But then I did the math.
In 1991 the lab most likely had a T1 link, which transmits at most 1.544 million bits each second. Multiplying that by 60 seconds per minute, 60 minutes per hour, and 24 hours a day, comes to 133 gigabits, or a little less than 17 gigabytes. In 1993, that was a lot of data, but not an impossibly large amount. Certainly a big national lab like Los Alamos could hire somebody to load a new DAT tape every two hours, if that was what was required to archive all the information.
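The arithmetic is easy to check. The following is a minimal sketch in Python that assumes, as above, a T1 link running flat out for a full day; the function and constant names are my own, chosen for illustration.

```python
# Back-of-the-envelope check of the T1 capture volume described above.
# Assumption: the link is saturated at 1.544 Mbps for all 24 hours.

T1_BITS_PER_SECOND = 1_544_000
SECONDS_PER_DAY = 60 * 60 * 24


def daily_volume(bits_per_second):
    """Return (bits, bytes) carried in one day at the given line rate."""
    bits = bits_per_second * SECONDS_PER_DAY
    return bits, bits / 8


bits, bytes_ = daily_volume(T1_BITS_PER_SECOND)
print(f"{bits / 1e9:.1f} gigabits per day")    # 133.4 gigabits per day
print(f"{bytes_ / 1e9:.1f} gigabytes per day")  # 16.7 gigabytes per day
```

Real links are rarely saturated around the clock, so this is an upper bound rather than a typical figure.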
In the decade that followed the Gulf War, Moore's law had its way not only with processors, but with bandwidth and storage as well -- though not equally. While the clock on the average workstation surged from 25 MHz to 1.1 GHz, and while the typical "big" hard drive jumped from a few hundred megabytes to 160 GB, bandwidth increased at a comparatively modest rate -- from 28.8 kbps to 384 kbps for many homes and small businesses. Even today, few businesses have more than a T1's worth of Internet bandwidth.
These trends are accelerating. For the foreseeable future, both the amount of information that we can store and our ability to process that information will far outpace the rate at which we can transmit information over large distances. As a result, where it once took the prowess of a national laboratory to systematically monitor all of the information sent over its external Internet connection, now this capability is available to all.
Today some organizations are following Los Alamos's precedent and routinely recording some or all of the traffic on their external Internet connections. Little of this information is actually analyzed. Instead, it is collected in expectation that it might be useful at some future point. After all, if you want to be able to review the information moving over your Internet connection at some point in the future, you must record it now -- fast as they are, today's processors still can't travel back through time.
Capturing everything moving over the network is simple in theory, but relatively complex in practice. I call this the "catch it as you can" approach. It's embodied in the open source programs tcpdump and windump, as well as in several commercial systems like NIKSUN's NetVCR and NetIntercept, the latter of which my company, Sandstorm Enterprises, recently brought to market.
Another approach to monitoring is to examine all of the traffic that moves over the network, but only record information deemed worthy of further analysis. The primary advantage of this approach is that computers can monitor far more information than they can archive -- memory is faster than disk. So instead of being forced to monitor the relatively small amount of network traffic at the boundary between the internal network and the external network, you can actively monitor a busy LAN or backbone.
A second advantage of this approach is privacy -- captured traffic almost invariably contains highly confidential, personal, and otherwise sensitive information: if this data is never written to a computer's disk, the chances of it being inappropriately disclosed are greatly reduced.
In some circumstances, it may not even be legal to record information unless there is a compelling reason or court order. Call this the "stop, look, and listen" approach. This approach, pioneered by Marcus Ranum in the early 1990s, is now the basis of Ranum's Network Flight Recorder (NFR) as well as Raytheon's SilentRunner, the open source snort intrusion detection system, NetWitness by Forensics Explorers, and even the FBI's "Carnivore" Internet wiretapping system (since renamed DCS 1000).
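The core idea of "stop, look, and listen" can be sketched in a few lines of Python. This is only an illustration under my own assumptions: packets are represented as plain dictionaries, and the `is_interesting` rule is a hypothetical stand-in for whatever a real product would apply. It is not how NFR, snort, or any other system named above is actually implemented.

```python
# Sketch of "stop, look, and listen": every packet is examined in memory,
# but only packets matching a rule are kept for later analysis.
# The packet layout and the rule below are illustrative assumptions.

def stop_look_listen(packets, is_interesting):
    """Examine every packet; return (number examined, packets recorded)."""
    examined = 0
    recorded = []
    for packet in packets:
        examined += 1
        if is_interesting(packet):
            recorded.append(packet)  # only this subset would reach the disk
    return examined, recorded


traffic = [
    {"src": "10.0.0.5", "dst_port": 80, "payload": b"GET / HTTP/1.0"},
    {"src": "10.0.0.9", "dst_port": 23, "payload": b"login: root"},
    {"src": "10.0.0.5", "dst_port": 25, "payload": b"HELO example.com"},
]

# Hypothetical rule: keep only telnet traffic (destination port 23).
examined, recorded = stop_look_listen(traffic, lambda p: p["dst_port"] == 23)
print(examined, len(recorded))  # 3 1
```

Everything that fails the rule is discarded without ever being written down, which is where both the performance and the privacy advantages come from.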
Recently, Information Security magazine coined the term Network Forensic Analysis Tool (NFAT) to describe this entire product category. (Ranum coined the term "Network Forensics" back in 1997.)
With the heightened interest in computer security these days, many organizations have started to purchase monitoring appliances or have set up their own monitoring systems, using either commercial or open source software. If you are charged with setting up such a project, or if you are just curious about the technical, ethical, and legal challenges these systems can cause, read on.
Build a Monitoring Workstation
In many ways, a system that you would use for monitoring a computer network looks a lot like any other high-end Windows or UNIX workstation. Most run on a standard Intel-based PC and capture packets with an Ethernet interface running in promiscuous mode.
"Catch it as you can" systems immediately write the packets to a disk file, buffering in memory as necessary, and perform analysis in batches. As a result, these systems need exceptionally large disks -- ideally RAID systems. "Stop, look and listen" systems analyze the packets in memory, perform rudimentary data analysis and reduction, and write selected results to disk or to a database over the network. Of course, no matter which capture methodology is employed, the disks eventually fill up, so all of these systems have rules for erasing old data to make room for new data.
How much attention you need to give the hardware you use for network monitoring depends to a large extent on the complexity of your network, the amount of data at the points you wish to monitor, and how good a job you want to do. If you are trying to capture packets as they travel over a 384 kbps DSL link, a 66 MHz 486 computer will do just fine. If you are trying to make extended recordings of every packet that goes over a fully-loaded gigabit link, you will find it quite a challenge to build a suitable capture platform and disk farm.
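Whatever hardware you choose, the rule for erasing old data to make room for new data can be quite simple. Here is a minimal sketch in Python of one possible policy, "delete the oldest capture files until the total fits a disk budget." The `CaptureFile` record, the file names, and the budget are hypothetical; commercial systems may use quite different rules.

```python
# Sketch of an oldest-first retention rule for capture files.
# CaptureFile, the file names, and the budget are illustrative assumptions.
from typing import List, NamedTuple


class CaptureFile(NamedTuple):
    name: str
    size_bytes: int
    created: float  # seconds since the epoch


def files_to_erase(files: List[CaptureFile], budget_bytes: int) -> List[str]:
    """Return names of the oldest files to delete so the rest fit the budget."""
    total = sum(f.size_bytes for f in files)
    doomed = []
    for f in sorted(files, key=lambda f: f.created):  # oldest first
        if total <= budget_bytes:
            break
        doomed.append(f.name)
        total -= f.size_bytes
    return doomed


captures = [
    CaptureFile("cap-002.dump", 400, created=2.0),
    CaptureFile("cap-001.dump", 400, created=1.0),
    CaptureFile("cap-003.dump", 400, created=3.0),
]
print(files_to_erase(captures, budget_bytes=1000))  # ['cap-001.dump']
```

The function only decides what to erase. Keeping the actual deletion separate makes the policy easy to test before it is trusted with real evidence.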
To explore the differences between different operating systems and hardware platforms, Sandstorm Enterprises purchased two identically-configured Pentium III-based dual-processor systems with removable disk drives. One system was set up as a packet generator using a program that transmitted individually serialized Ethernet packets of varying sizes. The second system was set up with rudimentary capture software -- either tcpdump on the UNIX systems, or windump for Windows.
We then wrote an analysis package that examined the recorded dump files and calculated both the percentage of dropped packets and the longest run of dropped packets under varying network load. By holding the processor, bus, and Ethernet cards constant and loading different operating systems onto different hard disks, we were able to determine the effects of different operating systems on overall capture efficiency. Once we found the best operating system, we were able to swap around Ethernet adapters and disable the second CPU to determine the effects of different hardware configurations.
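The two measurements are straightforward to compute once each packet carries a serial number. The sketch below is not Sandstorm's analysis package; it is a minimal Python illustration that assumes the generator numbered its packets 0 through N-1 and that the serial numbers of the captured packets have already been extracted from the dump file.

```python
# Sketch of the two drop metrics described above.
# Assumptions: packets were sent with serials 0..total_sent-1, and
# captured_serials lists the serials recovered from the dump file.

def drop_statistics(captured_serials, total_sent):
    """Return (percent dropped, longest run of consecutive drops)."""
    seen = set(captured_serials)
    dropped = 0
    longest_run = 0
    current_run = 0
    for serial in range(total_sent):
        if serial in seen:
            current_run = 0
        else:
            dropped += 1
            current_run += 1
            longest_run = max(longest_run, current_run)
    percent = 100.0 * dropped / total_sent if total_sent else 0.0
    return percent, longest_run


# Ten packets sent; serials 3, 4, 7, and 8 never reached the dump file.
captured = [0, 1, 2, 5, 6, 9]
percent, longest = drop_statistics(captured, total_sent=10)
print(percent, longest)  # 40.0 2
```

The longest run matters as much as the percentage: a capture system that loses one packet in ten at random is far more useful for reconstructing sessions than one that loses long bursts.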
The results of our testing were more reassuring than surprising. Over the six operating systems tested, FreeBSD had the best capture performance and Windows NT had the worst. Under FreeBSD, we found that Intel's EtherExpress cards had the best packet capture performance. Finally, we found that FreeBSD did a somewhat better job capturing packets when run with a single processor than when run with two processors, although if additional analysis work was being done at the same time on the same computer, having two processors was vastly preferable. The reason for this is that no process can dominate both processors at the same time, and thus one processor ends up doing packet capture, and the other processor ends up doing analysis.
Sandstorm used the results of this testing to choose the hardware configuration for its NetIntercept appliance, although the results are applicable to any organization setting up a monitoring system. Of course, for many installations the choice of hardware and software will largely be determined by the equipment already available, the staff's training, and the hardware and operating systems that the chosen monitoring software supports.
For example, organizations with significant Linux experience will almost certainly prefer using Linux-based systems for their packet capture systems, rather than acquiring experience with FreeBSD. And unless you are on a heavily loaded 100BaseT network, the overall packet capture differences between FreeBSD and Linux are probably irrelevant.
(Note: some vendors have developed specialty hardware for directly tapping T1, OC3, ATM, and other kinds of wide-area network connections. If this sort of surveillance work is what you need to do, such equipment can be tremendously useful. In many applications, however, it's possible to get around the need to tap these physical layers by setting up monitoring or mirror ports on a managed switch.)
If you intend to record most or all of the traffic moving over your network, you need to spend as much time thinking about your disk subsystem as your processor and Ethernet card. Last year Sandstorm spent several months comparing IDE drives with the UDMA100 interface to SCSI LVD-160 drives. We also explored a variety of RAID systems. The conclusion: today's IDE drives are significantly faster than SCSI drives costing two or three times more per gigabyte stored.
This is not the result we were expecting, and it goes directly against the conventional wisdom that says SCSI is inherently better than IDE. Nevertheless, it does seem to be the ugly truth, at least for straightforward read/write tests in a single-user environment. Although we saw the highest performance with a hardware-based RAID 5 system manufactured by Advanced Computer & Network Corporation, we saw nearly the same performance with a RAID 5 system based on the 3Ware Escalade 7000 RAID controller.
Long-term storage of captured data is another problem entirely. Although you can build a terabyte RAID system for less than $2,000, backing this system up will set you back $4,000 for the AIT II tape drive and $120 for each 100GB cartridge. Absent extraordinary requirements, most users will elect not to back up their capture disks, and instead archive specific capture runs to CD-R or DVD-RAM drives.