Cloud versus cloud: A guided tour of Amazon, Google, AppNexus, and GoGrid
Cloud computing offerings differ in depth, breadth, style, and fine print; beneath the heady metaphor lurk familiar pitfalls, complex pricing, and many questions
If you think it's hard to work through the legal rules when a server is in one state and a user is in another, imagine working out the right answer when your virtual server could migrate within a cloud that might encompass datacenters spread out across the globe. Amazon's terms, for instance, prohibit you from posting content that might be "discriminatory based on race, sex, religion, nationality, disability, sexual orientation, or age." It sounds like Amazon is worried that part of the cloud might touch down in a municipality that forbids things like this.
It almost seems scary to mention this fact, but New York is insisting that Amazon charge sales taxes because Amazon pays a commission to Web sites that do business in the state. What does this mean for applications hosted by Amazon? Do you owe sales tax if your application touches down in a part of the cloud that's in New York? Do you owe income tax?
I wanted to make some allusion to Schrödinger's cat and imply that we can't know where the computation occurs in the cloud, but then I slowly realized that this is far from true. Cloud servers have log files too, and these log files can produce insanely detailed analyses of who might owe which taxes. Major league athletes already hire tax attorneys to compute their share of income earned in each stadium, and some people are suggesting that Web companies aren't paying enough to support the local fire trucks and orphanages. Say good-bye to allusions to Joni Mitchell; it's time to start invoking Warren Zevon's "Lawyers, Guns, and Money."
Crashing the cloud metaphor
The legal worries are just part of the details that aren't so certain. One of the biggest dangers is reading too much into the cloud metaphor. While it's largely true that these services are very flexible ways to build up a network of machines, they are far from perfect. What happens if a server or a hard disk crashes in the middle of an operation? Often the same thing that happens when a generic server kicks the bucket: Your data might disappear and then it might not.
An instance of a machine from Amazon's EC2 looks just like a normal machine because after you strip away the hype, it is just another version of Linux running on a chip that probably speaks x86 machine code and writes data to a spinning platter. If you write something to a good old file in the Unix file system, the cloud metaphor won't protect it. It will stay there only until the machine dies. If you shut down the server to save some cash when traffic is low, that's the same thing as dying. That means you can't really scale up and down without a savvy plan for migrating data.
In other words, MySQL in the cloud works just like it does on a generic server. Everything could be lost in a poof unless you start up several instances and mirror them with each other. The magic of the cloud metaphor can't remove this fundamental rule.
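To make the "savvy plan for migrating data" a little more concrete, here is a minimal Python sketch of the step that has to happen before an instance is shut down: copy everything on the local disk somewhere that outlives the machine, and verify the copy. The function names are hypothetical, and `durable_dir` is only a stand-in for whatever persistent store you actually use; a real setup would call that store's own client library rather than copy files between directories.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def snapshot_before_shutdown(local_dir, durable_dir):
    """Copy every file under local_dir into durable_dir and verify each copy.

    Assumption: durable_dir stands in for a store that survives the
    instance. Returns a dict mapping relative path to checksum.
    """
    local_dir, durable_dir = Path(local_dir), Path(durable_dir)
    manifest = {}
    for src in sorted(local_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(local_dir)
        dst = durable_dir / rel
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
        checksum = sha256_of(dst)
        # Refuse to report success unless the copy matches the original.
        if checksum != sha256_of(src):
            raise OSError(f"copy of {rel} failed verification")
        manifest[rel.as_posix()] = checksum
    return manifest
```

Only after this returns cleanly is it safe to let the instance die. The manifest gives the replacement machine something to check its restored files against.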
If you want something to survive a crash, you've got to put it into the cloud's data stores. These are great services, but they're not cheap. One friend of mine used to back up his disks to Amazon's S3 until he started getting bills for more than $200 a month. He bought a hard disk and kept it on his desk.
The price is higher because the service level is higher. Amazon wants people to be able to trust the data store, and that means providing a level of service that would make a bank happy. Sharing data across servers takes time and careful coding. Google cautions users to be careful writing to its data store because it can be expensive. If you're someone who likes to keep lots of log files just in case, you'll probably pay much more to store them in the cloud than you would in a regular file. Alas, Google doesn't have regular files.
One of the trickier details is trying to understand the prices. GoGrid, for instance, likes to say that its Intel Xeon servers are more powerful than its competitors'. Google doesn't even sell server time per se; it just bills you for CPU megacycles, a squirrelly metric. Amazon EC2 has regular-sized machines and bigger ones that are a bit more expensive. When their costs fall, the companies often lower their prices. But they also raise them when a service turns out to be more expensive to provide than they thought. This complexity will have you scratching your head for a long time because it's hard to know what things will end up costing. That box from Sun may not scale up and down, but the bill isn't going to change with every hit on your Web site.
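One way to tame the head scratching is to push the same workload through each billing model and compare the totals. The Python sketch below does that for a per-server-hour model and a per-megacycle model. Every rate and workload figure in it is an invented placeholder, not a price quoted by Amazon, Google, or GoGrid, and the function names are hypothetical.

```python
import math


def hourly_bill(requests, requests_per_server_hour, price_per_server_hour):
    """Bill for a provider that charges by the server hour.

    Partial hours are rounded up, on the assumption that a started
    hour is billed in full.
    """
    server_hours = math.ceil(requests / requests_per_server_hour)
    return server_hours * price_per_server_hour


def megacycle_bill(requests, megacycles_per_request, price_per_megacycle):
    """Bill for a provider that meters CPU megacycles instead of machines."""
    return requests * megacycles_per_request * price_per_megacycle


if __name__ == "__main__":
    # All numbers below are made up purely for illustration.
    monthly_requests = 1_000_000
    print("per-hour model:     ", hourly_bill(monthly_requests, 36_000, 0.10))
    print("per-megacycle model:", megacycle_bill(monthly_requests, 50, 1e-7))
```

The arithmetic is trivial. The hard part is the inputs, since you rarely know how many megacycles a request burns, or how many requests one server can absorb in an hour, until the application is already running.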
Best and worst
After working through these systems, I tried to imagine the best and worst applications for these clouds. One of the best fits might be some kind of reservation system for weekend events like concerts. While there might be a small amount of load at any time, the crunch would come each Friday afternoon when people realize they have no weekend plans. The cloud's ability to spin up more servers to handle this demand would fit this perfectly. The service might also take real reservations and sell tickets in advance, a service that would demand the higher qualities of service offered by the shared data stores.
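The Friday crunch can be put in rough numbers. This Python sketch compares the server hours consumed when capacity follows the load hour by hour against the server hours consumed by a fixed fleet sized for the peak. The load figures and per-server capacity are hypothetical, and it assumes servers can be added or dropped at each hour boundary with no startup delay, which real clouds only approximate.

```python
import math


def servers_needed(requests_per_hour, capacity_per_server, min_servers=1):
    """Smallest number of servers that can absorb one hour's load."""
    return max(min_servers, math.ceil(requests_per_hour / capacity_per_server))


def elastic_server_hours(hourly_load, capacity_per_server):
    """Server hours used when the fleet is resized every hour."""
    return sum(servers_needed(load, capacity_per_server) for load in hourly_load)


def fixed_server_hours(hourly_load, capacity_per_server):
    """Server hours used when the fleet is sized for the busiest hour all day."""
    peak = max(servers_needed(load, capacity_per_server) for load in hourly_load)
    return peak * len(hourly_load)


if __name__ == "__main__":
    # Hypothetical Friday: 20 quiet hours, then a 4-hour rush.
    friday = [100] * 20 + [5000] * 4
    print("elastic:", elastic_server_hours(friday, 1000))
    print("fixed:  ", fixed_server_hours(friday, 1000))
```

The wider the gap between the quiet hours and the rush, the better the elastic fleet looks. A site with flat traffic around the clock would see almost no difference.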
The worst possible application might be something like RedSoxYankeesTrashTalk.com or any Web site filled with an endless stream of mostly forgettable comments trolling for reactions from the rival fans. While there might be a slight peak around game time, I've found that sites like this keep rolling along even late at night during the off-season. And such a site would certainly attract First Amendment proponents who would look for ways to write a single sentence that could zing all seven of Amazon's protected targets of discrimination.
Furthermore, there would be no reason to pay for high-quality storage because I'm sure that even the participants wouldn't notice if their comments disappeared by mistake. For fun, read Amazon's terms on getting your data back after they shut you down. While I would probably write the same thing if it were my cloud, there are plenty of examples of applications that are better off on their own.
These examples aren't perfect, of course, but neither is cloud computing. After a few weeks of building up some machines and hearing from people who've used the services, I'm pleasantly confused and filled with curious and optimistic questions. Will these clouds be large enough to handle the Internet equivalent of the Thanksgiving weekend traffic jams? Will the cloud teams be able to find a way to offer simple options that are priced correctly for the serious and not-so-serious data wrangler? Will they ever find an adequate meter for computation time?
I suspect the only people who know the answers to these questions today are living in the real clouds where they went after a life ministering to the IBM mainframes. If we could get those guys back here today, we might be able to get this cloud thing up and running smoothly. We just have to convince Intel to build a chip that understands IBM 360 binaries.
Copyright © 2008 IDG Communications, Inc.