Today I attended two more presentations that focused on Amazon’s EC2 and S3 services. I also spoke with the Amazon Web Services team at length yesterday to find out more about their offerings and how I might be able to use them. After asking many more questions and sleeping on it, I have a few more thoughts to share about how to use these promising services.

(Before you continue reading, I suggest that you read my previous post on Amazon’s Web Service offerings.)

My first concern is the reliability of the Elastic Compute Cloud (EC2) service: not so much the overall reliability, but the reliability of any individual instance (server) in the cloud. Everyone knows that computers fail at the most inopportune moments, and EC2 is no different.

To that end, you need to treat any instance in the EC2 cloud as able to fail at any given moment. And when it fails, all the data on the hard drive of that instance is gone. So if you keep data on an instance, you need to write it back to S3 to protect yourself from data loss.
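Here is a minimal sketch of that "write it back to S3" habit. It is not Amazon's API: `ObjectStore`, `InMemoryStore` and `sync_to_store` are names I made up for illustration, and the in-memory store simply stands in for whatever S3 client you would really use. The idea is to periodically push every new or changed file off the instance.

```python
from pathlib import Path
from typing import Dict, List, Protocol


class ObjectStore(Protocol):
    """Hypothetical stand-in for an S3 client; not Amazon's actual API."""

    def put(self, key: str, data: bytes) -> None: ...


class InMemoryStore:
    """Toy store for illustration; a real one would upload to S3."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self.objects[key] = data


def sync_to_store(root: Path, store: ObjectStore, synced: Dict[str, float]) -> List[str]:
    """Upload every file under root that is new or modified since the last pass.

    `synced` maps each key to the modification time that was last uploaded.
    Returns the keys uploaded in this pass.
    """
    uploaded = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        key = path.relative_to(root).as_posix()
        mtime = path.stat().st_mtime
        if synced.get(key) != mtime:
            store.put(key, path.read_bytes())
            synced[key] = mtime
            uploaded.append(key)
    return uploaded
```

Anything written between two passes is still lost if the instance dies, so how often you run this decides how much data you are willing to lose.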

Along the same lines, Amazon has no provision for fail-over. If a server fails you may not know right away and, worse, it may take some time (possibly minutes) to bring up another instance to pick up where the crashed instance left off. RightScale, a company that provides value-added services for Amazon’s EC2 offering, plans to add this feature in the future. But I suspect that a fail-over solution will likely be based on some sort of hot-spare concept, which will increase your overall cost. As usual, redundancy is going to cost you.
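Since nobody tells you when an instance dies, you have to notice it yourself. One common approach (my own sketch, not something Amazon or RightScale provides) is to have each instance report a heartbeat and to treat any instance that has been silent too long as failed. The class name, the timeout and the instance IDs below are all hypothetical.

```python
import time
from typing import Dict, List, Optional


class HeartbeatMonitor:
    """Tracks the last heartbeat per instance and flags the silent ones."""

    def __init__(self, timeout_seconds: float) -> None:
        self.timeout_seconds = timeout_seconds
        self.last_seen: Dict[str, float] = {}

    def record(self, instance_id: str, now: Optional[float] = None) -> None:
        """Note that an instance just checked in."""
        self.last_seen[instance_id] = time.monotonic() if now is None else now

    def failed_instances(self, now: Optional[float] = None) -> List[str]:
        """Return instances whose last heartbeat is older than the timeout."""
        current = time.monotonic() if now is None else now
        return sorted(
            instance_id
            for instance_id, seen in self.last_seen.items()
            if current - seen > self.timeout_seconds
        )


monitor = HeartbeatMonitor(timeout_seconds=30)
monitor.record("web-1", now=0)
monitor.record("web-2", now=25)
stale = monitor.failed_instances(now=40)  # only web-1 has been silent for over 30s
```

Detection is only half of fail-over. Something still has to start a replacement instance and restore its data, and that is where the minutes go.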

The next thought concerns bandwidth costs. Don MacAskill talked at length about the cost of serving content from S3 in his presentation “Set Amazon’s Servers on Fire, Not Your Own”. If you’re currently purchasing a lot of bandwidth from a provider, you may already be getting a good deal; less so if you purchase only small amounts. SmugMug buys bandwidth in gigabit blocks, and at those economies of scale bandwidth is cheap. Amazon S3’s $0.20/GB transfer cost can’t compete with buying bandwidth by the gigabit.
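To put rough numbers on that comparison, here is a back-of-the-envelope sketch. The only figure from the talk is the $0.20/GB S3 price. The 30-day month, decimal gigabytes and the utilization parameter are my assumptions, and I don't know what SmugMug actually pays for a gigabit block.

```python
S3_PRICE_PER_GB = 0.20  # dollars per GB transferred, the S3 price quoted above


def gb_per_month(link_gbps: float, utilization: float, days: int = 30) -> float:
    """Gigabytes moved in a month over a link at the given average utilization.

    Assumes decimal units (1 GB = 8 gigabits) and a 30-day month by default.
    """
    seconds = days * 24 * 60 * 60
    gigabits = link_gbps * utilization * seconds
    return gigabits / 8


def s3_transfer_cost(gb_transferred: float) -> float:
    """What the same volume would cost at the S3 transfer price."""
    return gb_transferred * S3_PRICE_PER_GB


def break_even_block_price(link_gbps: float, utilization: float) -> float:
    """Monthly price below which a fixed bandwidth block beats S3 transfer."""
    return s3_transfer_cost(gb_per_month(link_gbps, utilization))


# A fully used 1 Gbps link moves 324,000 GB in 30 days, which would cost
# about $64,800 at $0.20/GB.
full_link = break_even_block_price(link_gbps=1.0, utilization=1.0)
```

The flip side is utilization: at 10% average use the same link only moves 32,400 GB, so the break-even price drops to about $6,480 a month. That is why buying in bulk only pays off if you actually fill the pipe.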

SmugMug solved this problem by building a tiered cache system where the most requested images reside on their local servers (where the bandwidth is cheap) and less frequently requested images are served via the S3 service. While it takes more work to deploy such a system, it presents a model that is cheaper to operate than using only local servers or only S3.
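I don't know how SmugMug decides which images count as "most requested", so the sketch below swaps in a plain least-recently-used (LRU) policy as an approximation. `TieredImageCache` and `fetch_from_remote` are hypothetical names. The callback stands in for a read from S3, and the small local tier stands in for the local servers.

```python
from collections import OrderedDict
from typing import Callable


class TieredImageCache:
    """Small local tier in front of a larger, more expensive remote tier."""

    def __init__(self, capacity: int, fetch_from_remote: Callable[[str], bytes]) -> None:
        self.capacity = capacity
        self.fetch_from_remote = fetch_from_remote
        self.local = OrderedDict()  # key -> image bytes, oldest use first
        self.local_hits = 0
        self.remote_fetches = 0

    def get(self, key: str) -> bytes:
        if key in self.local:
            # Served from the cheap local tier; mark as most recently used.
            self.local.move_to_end(key)
            self.local_hits += 1
            return self.local[key]
        # Miss: pay for a remote read, then keep a local copy.
        data = self.fetch_from_remote(key)
        self.remote_fetches += 1
        self.local[key] = data
        if len(self.local) > self.capacity:
            self.local.popitem(last=False)  # evict the least recently used image
        return data
```

The two counters are there because the ratio of local hits to remote fetches is what determines how much of your traffic is billed at the S3 rate.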

Don presented numbers on how much money SmugMug saved last year by using S3. Over the period he looked at, SmugMug grew from 64M images to 140M images. Normally that would cost between $40K and $100K a month in hard drives alone. He estimates that about $922K would’ve been spent in that time, but only $230K was actually spent. That saved SmugMug $692K during that period!

Not only that, but he says that the IRS would’ve also taxed SmugMug to the tune of $295K related to the purchase of the hard drives. While he says this is not a straight-out savings, it did free up his cash flow for other, more important things. Overall, these numbers are far from trivial: for a small organization like SmugMug, saving more than half a million dollars is a big deal. That of course raises the question of how Amazon can do this so cheaply and still make ends meet.

Amazon also does not offer a load-balancing service that people can use to direct traffic to the instances that are running a service. Right now, Amazon customers who need a load balancer have to create their own server image to carry out that task and run it on a separate instance.
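To give a feel for what a do-it-yourself balancer has to decide, here is a minimal round-robin sketch that skips instances marked as down. It only shows the selection logic. A real balancer also has to proxy the traffic and run its own health checks, and the class name and addresses here are made up.

```python
from typing import List


class RoundRobinBalancer:
    """Hands out backends in rotation, skipping any marked unhealthy."""

    def __init__(self, backends: List[str]) -> None:
        if not backends:
            raise ValueError("at least one backend is required")
        self.backends = list(backends)
        self.healthy = set(backends)
        self._next = 0  # index of the next backend to try

    def mark_down(self, backend: str) -> None:
        self.healthy.discard(backend)

    def mark_up(self, backend: str) -> None:
        if backend in self.backends:
            self.healthy.add(backend)

    def pick(self) -> str:
        """Return the next healthy backend, or raise if none are left."""
        for _ in range(len(self.backends)):
            candidate = self.backends[self._next]
            self._next = (self._next + 1) % len(self.backends)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy backends available")
```

Keep in mind that the balancer instance is itself a single point of failure, which brings you straight back to the fail-over problem above.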

Finally, each instance has a modest hardware configuration: the equivalent of a 1.7GHz Xeon CPU, 1.75GB of RAM and 160GB of local disk. 1.75GB of RAM isn’t enough to run a serious database server. Database servers are RAM hungry; the more the better. I hope that Amazon will offer more RAM as an option for a premium instance.

So, while Amazon’s offerings are simple and inexpensive, care needs to go into designing solutions that use their offerings. But as SmugMug and a number of other beta testers have shown, it is possible to live with the limitations and engineer a solution accordingly.