My little webhosting company is growing faster than expected, and I’m still using the same approach to hardware as I was in 2000 when I started it. I started asking for advice from experienced sysadmins, and here are some notes from my first conversation with a guy who ran a huge webhosting center for a few years.
(Note: I’m not going to clean up these notes too much. Mostly just posting here for my own reference, and maybe it’s helpful to someone else out there, as-is.)
Where you run into problems:
Not so much the storage. It’s the TCP/IP stack.
Building up and tearing down connections for 400 domains.
Lots of context-switching and overhead. Even adding more RAM won’t help.
Huge companies (Yahoo) end up moving TCP/IP processing out of the kernel and into dedicated hardware. Redbank is one vendor of that kinda thing. But those are extremely expensive.
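To make the connection-churn point concrete, here’s a rough Python sketch (mine, not from the conversation) that just opens and closes TCP connections in a loop against a local listener, so you can see how much time goes to setup/teardown alone. The connection count and port are arbitrary.

```python
# Rough sketch: measure the cost of building up and tearing down TCP
# connections -- no data is transferred, so everything measured is
# TCP/IP stack and syscall overhead. N and the port are assumptions.
import socket
import threading
import time

HOST, PORT, N = "127.0.0.1", 8099, 10_000

def server():
    # Minimal accept loop: accept each connection and immediately close it.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind((HOST, PORT))
        srv.listen(128)
        for _ in range(N):
            conn, _addr = srv.accept()
            conn.close()

threading.Thread(target=server, daemon=True).start()
time.sleep(0.5)  # give the listener a moment to come up

start = time.perf_counter()
for _ in range(N):
    # Each iteration is a full connect/close cycle.
    s = socket.create_connection((HOST, PORT))
    s.close()
elapsed = time.perf_counter() - start
print(f"{N} connect/close cycles in {elapsed:.2f}s "
      f"({N / elapsed:,.0f} connections/sec)")
```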
I would say SATA drives are the way to go.
Instead of slapping them into one box and doing RAID there, look into iSCSI (SCSI over Ethernet). It’s like Fibre Channel, but Fibre Channel is too expensive; iSCSI is the answer to that, doing the same thing with a little less performance.
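I haven’t set this up myself yet, but here’s roughly what attaching an iSCSI disk looks like from a Linux initiator with the open-iscsi tools installed, driven from Python. The portal address and target name are made up.

```python
# Minimal sketch of attaching an iSCSI disk from a storage box elsewhere
# in the rack, assuming a Linux initiator with the open-iscsi tools
# (iscsiadm) installed. The portal address and target IQN are made up.
import subprocess

PORTAL = "192.168.1.50"                    # hypothetical storage box
TARGET = "iqn.2006-01.com.example:web-1"   # hypothetical target name

# Ask the storage box what targets it publishes (SendTargets discovery).
subprocess.run(["iscsiadm", "-m", "discovery", "-t", "sendtargets",
                "-p", PORTAL], check=True)

# Log in to one target; the kernel then presents it as an ordinary
# block device you can partition and mount like a local SATA drive.
subprocess.run(["iscsiadm", "-m", "node", "-T", TARGET,
                "-p", PORTAL, "--login"], check=True)
```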
Files dished up by the webserver are read-only. When writing, you don’t need much performance.
Commercially, look at VMware. It virtualizes your box so you can run multiple instances on one machine.
The open-source alternative to VMware is Xen.
Boxes with 2-4 CPUs and 32G of RAM: 16-20 instances of FreeBSD/Linux on them.
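Rough back-of-the-envelope math on how you get to numbers like that (the per-guest RAM and hypervisor overhead figures below are my own guesses, not his):

```python
# Back-of-the-envelope arithmetic for how many guests fit on one box.
# The per-guest RAM figure and hypervisor overhead are assumptions.
TOTAL_RAM_GB = 32
HYPERVISOR_OVERHEAD_GB = 2    # assumed reserve for the host/hypervisor
RAM_PER_GUEST_GB = 1.5        # assumed allocation per FreeBSD/Linux guest

guests = int((TOTAL_RAM_GB - HYPERVISOR_OVERHEAD_GB) // RAM_PER_GUEST_GB)
print(f"~{guests} guests on a {TOTAL_RAM_GB} GB box")  # -> ~20 guests
```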
The advantage is the TCP/IP stack.
Also lets you virtualize your storage.
If you were to do something like iSCSI: it’s not a fileserver, it’s disk serving. You can add things like new disks without rebooting FreeBSD.
Myself, I’ve only used VMware, the commercial version ($6000 or something for the 2-CPU version).
Example: if you have a 2-CPU box and you upgrade to a 4-CPU box, you won’t see that much improvement, since it comes down to the TCP/IP stack.
Don’t think centralized file serving, think centralized disk serving.
If you have 12 SATA disks in a RAID, you can chop that up into many small virtual disks: /dev/wda0, /dev/wda1, /dev/wda2 at 500 megs each.
One client says they want 2 gigs for their website, so we publish another virtual disk that’s 2 gigs, map it to web-1, and carve it up however you like. Now you can easily publish more pieces of the array.
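Here’s a toy Python sketch of that bookkeeping: one big array, carved into virtual disks of different sizes and mapped to web servers. The array size, disk names, and host names are just examples.

```python
# Sketch of "centralized disk serving": carve one big array into virtual
# disks of different sizes and track which web server each is published to.
ARRAY_SIZE_MB = 12 * 250_000   # e.g. 12 SATA disks of ~250 GB each (assumed)

allocations = []               # (host, name, size_mb)
free_mb = ARRAY_SIZE_MB

def publish(host, name, size_mb):
    """Hand a slice of the array to a host as its own virtual disk."""
    global free_mb
    if size_mb > free_mb:
        raise ValueError("array is full")
    allocations.append((host, name, size_mb))
    free_mb -= size_mb

# Default: lots of small 500 MB virtual disks for ordinary sites.
for i in range(3):
    publish("web-1", f"vdisk{i}", 500)

# One client wants 2 gigs, so publish a bigger virtual disk to web-1.
publish("web-1", "vdisk-bigclient", 2_000)

for host, name, size in allocations:
    print(f"{host}: {name} ({size} MB)")
print(f"free: {free_mb:,} MB of {ARRAY_SIZE_MB:,} MB")
```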
There are performance reasons not to just make those SATA disks one big partition. I can take my SATA array and stripe it with RAID cards in the server.
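A quick sketch of what striping buys you: logical blocks get spread round-robin across all the disks, so a big sequential read keeps every spindle busy instead of hammering one drive. The disk count and stripe size below are assumptions.

```python
# Sketch of RAID-0 style striping: map a logical block number to
# (disk index, block offset on that disk). Parameters are assumptions.
NUM_DISKS = 12
STRIPE_SIZE_BLOCKS = 128       # blocks per stripe unit (assumed)

def locate(logical_block):
    """Return (disk, offset) for a logical block under round-robin striping."""
    stripe_unit = logical_block // STRIPE_SIZE_BLOCKS
    disk = stripe_unit % NUM_DISKS
    row = stripe_unit // NUM_DISKS
    offset = row * STRIPE_SIZE_BLOCKS + logical_block % STRIPE_SIZE_BLOCKS
    return disk, offset

# A sequential read spanning 12 stripe units touches all 12 disks once.
disks_touched = {locate(b)[0] for b in range(NUM_DISKS * STRIPE_SIZE_BLOCKS)}
print(sorted(disks_touched))   # -> [0, 1, ..., 11]
```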
Do something totally external : a different device sitting in the rack with a bunch of disks.
We may end up with 12 disks sitting in a server, while a server right next to it has more disk space that we wish we could use.
Coming up with centralized disk space is the biggest bang-per-buck for flexibility.
Dell blade server: 8 blades, each blade had 8 VMware instances. This one blade box had 32 servers.
VMware makes it easier to publish disks to different instances.
One of the things you’d do with VMware/Xen: you make your vanilla build, a snapshot of your OS install. You copy that file and call it FreeBSD-gold1 or whatever, change a couple of things in the startup file, and clone it.
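Something like this, sketched in Python (all the file names and the config format are hypothetical; the real mechanics depend on whether you’re on VMware or Xen):

```python
# Sketch of the "gold image" workflow: copy the vanilla image and tweak a
# couple of per-instance values in its startup/config file before booting
# the clone. Paths, names, and config keys here are hypothetical.
import shutil
from pathlib import Path

GOLD_IMAGE = Path("FreeBSD-gold1.img")     # hypothetical vanilla build
GOLD_CONFIG = Path("FreeBSD-gold1.conf")   # hypothetical startup file

# Create placeholder "gold" files so the sketch runs end to end.
GOLD_IMAGE.write_bytes(b"\0" * 1024)
GOLD_CONFIG.write_text("HOSTNAME=gold\nIPADDR=0.0.0.0\n")

def clone_instance(name, hostname, ip_address):
    """Copy the gold image and rewrite the instance-specific settings."""
    image = Path(f"{name}.img")
    config = Path(f"{name}.conf")
    shutil.copyfile(GOLD_IMAGE, image)      # the clone's disk
    text = GOLD_CONFIG.read_text()
    text = text.replace("HOSTNAME=gold", f"HOSTNAME={hostname}")
    text = text.replace("IPADDR=0.0.0.0", f"IPADDR={ip_address}")
    config.write_text(text)
    return image, config

clone_instance("web-7", "web-7.example.com", "10.0.0.57")
```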
Boxes that do external storage: I haven’t done anything with iSCSI myself, but look at “Left Hand Network”.
SANs didn’t exist when he was doing this. SANs now come in two forms: (1) Fibre Channel, (2) iSCSI.
First place to look is to centralize your disks.