Now that I’ve gotten the introduction out of the way, I can get to the meat of why I’m blogging here in the first place. This is a long one, but I think you’ll find it’s worth it.
I’m not writing Puppet because I think I’m right or whatever; I’m writing it out of desperation, because no one else is even trying. Not only are people not trying to make better tools than we have available today, they’re not even using the crappy ones we do have available, which is just sad. Imagine if the computing world had just refused to write any code until C (or better yet, Ruby) showed up; where would we be now?
I’m less interested in why sysadmins don’t use the existing tools, though, and more interested in why they don’t publish their own. Most of the rest of the technical world seems to have figured out how to solve their problems using code and then how to turn that code into a self-sustaining project, either open-source or commercial. Sure, this stuff was complicated fifteen years ago, but it’s pretty straightforward now; and yet, instant messaging tools have larger development communities than sysadmin tools, and the majority of sysadmins spend their days toiling with a bunch of little one-off scripts that no one else will ever see or use and that the next sysadmin will gratefully /dev/null as soon as possible.
I’ve heard all of the standard excuses — we don’t have enough time, we can’t risk it, we spend all day doing computers and don’t want to do it at night, my company won’t let me, etc. Every software project that has ever evolved out of an internal project has exactly these same excuses, and yet they have somehow succeeded. Why have so few sysadmin tools evolved this way? Why are sysadmins so willing to believe their excuses?
Even then, okay, you can’t publish a tool in your current environment. Where are all of the sysadmin entrepreneurs? Anyone can tell you this space is miserable, the pain is chin-deep, the tools are horrible, there are no standards or best practices to speak of, and almost everyone is willing to spend tons of cash to make things even a little bit better. Why are there no other open source management tools with companies backing them? It’s pretty damn easy to create a product and make a decent living consulting off it; why do so few people try it?
I submitted a panel to OSCON on exactly this topic, but I don’t expect them to accept it because I don’t have a list of experts to sit with me. I’m pretty sure I could find a few people to talk about it in front of an audience, if O’Reilly will give me the chance, but who are the luminaries in the sysadmin field? If I knew who they were, I’d invite them, but, um, we’re short on luminaries.
Kathy Sierra recently posted an article about how you should want employees who are passionate about what they do rather than the company they work for (incidentally, I completely agree, and I have the pink slips to prove it); one question in her test of how passionate you are about your field is whether you can list some of the key people in your field (Kathy specifies two, but my wife’s a scientist and I’d be shocked if she couldn’t name at least 10 of the key people in her field).
Unfortunately, I’m afraid that might mean that most sysadmins are technically not passionate about their work, because the field is so disorganized that there aren’t a couple of key people to be listed. I expect the best that most people could do is name book authors like AEleen Frisch or Tom Limoncelli; I think books are important, but neither of their books comes even close to tackling the complexity of what real system administration looks like. Heck, Tom did a short presentation at the Configuration Management Workshop in 2005 explaining how he has successfully avoided any real automation throughout his career. He’s got the best sysadmin book out there today, but he doesn’t use any automation. I like the guy a lot, but that pretty much disqualifies him as a key person in what I think of as my profession, and I doubt that most sysadmins would say that his book has the solutions to the real problems they face today.
Imagine the best books on development only covering how to write code, and skipping over all the hard bits, like design or testing. Now imagine there only being about four books on development. Ouch. Welcome to my world. I’ve said it many times: The state of system administration is pitiful. The tools are horrible, there’s no community to speak of, best practice is essentially non-existent, there are basically no tools with open and active development communities, there are almost no startups trying to solve these problems, and no one really seems to care. Heck, there aren’t really even any sysadmin bloggers; we’ve got 234,486 people blathering on about Web 2.0, and they’re all depending on one guy with SSH and a for loop to build and maintain their network because no one’s even talking about system administration. Technorati shows 14,573 results for ‘sysadmin’ (many of which seem to be military rather than technical), vs. 184,207 for ‘web2.0’.
I’m really hoping Puppet and the tools I’m developing around it can make some significant strides on the technical front, and I’m doing everything I can to make strides elsewhere (all of my talks on Puppet spend time on these problems, and a lot of my informal conversations focus on these issues rather than Puppet itself). I need help, though.
I need sysadmins to actually try out those better tools. No, I’m not that fond of cfengine any more, but it’s ten times better than that tool you wrote for yourself that no one else will ever see, if only because there’s a community of people who have the same problems you’ll have. Sure, I’m biased because I’m the author, but I think you’ll find Puppet even better than cfengine. And if you don’t, try Bcfg2, or write your own and publish it. Next, start contributing bug reports and patches and documentation to whatever tool you’re using. Most of all, I need you to understand that we have to develop a community in order to push the field forward.
I need people who consume sysadmin services to stop accepting their excuses. Yes, they actually can get a server built for you in less than an hour; the last production server I built took eight minutes. Of course, they can’t do that manually, but who builds servers manually? (If your sysadmin is using CDs to build servers, s/he should be fired. No ifs, ands or buts. That hasn’t been the state of the art since before CDs were invented.) Yes, they can build ten of them just as easily as they can build one, as long as they’re using decent tools. Yes, they can audit all the systems real quick, just to make sure things look good. Yes, they can deploy another data center with copies of all the servers. Yes, they can upgrade that application everywhere in just a few minutes.
Of course, they can’t do any of these if they’re not using good tools, but developers couldn’t do anything they accomplish if they didn’t have good tools. The longer you let your sysadmin make excuses about how s/he is too busy ssh’ing to machines to learn how to use automation, the more money it costs you, in both man-hours and service quality.
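To make the "ten as easily as one" and "audit all the systems" claims a bit more concrete, here is a toy sketch in Python. It is not Puppet, cfengine, or Bcfg2 code, and it touches no real machines: the `Host` class, the `converge` and `audit` functions, the host names, and the package names are all made up for illustration. The only point it tries to show is that once the desired state is written down as data, applying it to ten hosts is the same loop as applying it to one, and an audit is just a comparison against that same data.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """A hypothetical in-memory stand-in for a managed server."""
    name: str
    packages: set = field(default_factory=set)


def converge(host, desired_packages):
    """Bring one host to the desired state; return the list of changes made."""
    missing = sorted(set(desired_packages) - host.packages)
    host.packages.update(missing)
    return missing


def audit(hosts, desired_packages):
    """Return {host name: missing packages} for every host out of compliance."""
    report = {}
    for host in hosts:
        missing = sorted(set(desired_packages) - host.packages)
        if missing:
            report[host.name] = missing
    return report


# The desired state is plain data (package names are illustrative).
DESIRED = {"openssh-server", "ntp"}

# Ten hosts are no harder to describe than one.
fleet = [Host(f"web{i:02d}") for i in range(1, 11)]
fleet[0].packages.add("ntp")  # one host has drifted from the others

before = audit(fleet, DESIRED)   # every host is missing something
for host in fleet:
    converge(host, DESIRED)      # same call, whether 1 host or 10
after = audit(fleet, DESIRED)    # nothing left to fix
```

Running `converge` a second time on any host returns an empty list, since the host already matches the description. That repeatability is the property that makes this kind of tool safe to run everywhere, which a one-off for loop over SSH usually does not give you.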
Unfortunately, we also need companies to empower those sysadmins. Every tool I’ve ever installed for a company has required painful buy-in fights. Fortunately, Darwin can help solve this problem for us: Use good system tools and you’re a more competitive company; use no tools, and you’re filing for bankruptcy. Covad is the only nationwide DSL company to survive the first bubble (that’s right, first bubble; I’m talking to you, all you Web 2.0 types), and it’s because they invested tons of money in an automated infrastructure. They invested while they were flush, and when the times got lean they had great service with low overhead.
Google’s famous for their search algorithm, but they’re almost more famous for their infrastructure. If Google had a better search algorithm but had stupidly decided to maintain all their machines manually, or using mediocre tools, there’s no way they’d be where they are today. Their sites would be slow, they couldn’t deploy new services, and their overhead would be so high they’d have to charge a lot more just to break even. Google realizes that an automated system infrastructure is critical to their competitiveness (which is why they never open source code from their infrastructure, only from their applications). Is your company so special that you’re immune from problems that Google spends so much money and time (and secrecy) on?