Inside Amazon Web Services
From storage to payment, the king of clouds is dangling an array of low-cost services. We take a close look at the tools for IT and developers.
An Amazon Machine Image (AMI) consists of an operating system and whatever applications you want preloaded when the virtual machine is started. Currently, only Linux is available as an EC2 instance's OS, though this is hardly a limitation. There are quite a few distributions in Amazon's catalog of prebuilt AMIs. Perusing the list, I found ready-to-use AMIs for Ubuntu, OpenSolaris, CentOS, Fedora, and many others – all told, more than 100 AMIs ready to go. You can build your own AMI using a free Amazon-provided SDK, but the process is lengthy. It is far easier to select a prebuilt AMI from the catalog and customize it as necessary. Many of the available AMIs already include software for specific applications, so you may well find one that has much of what you need.
Simple Storage Service (S3). Amazon's Simple Storage Service (S3) is effectively a large disk drive in the ether. Strictly speaking, that's 90 percent of everything you need to know about it. It has no directories and no file names – just a big place where you can store and fetch unstructured data in gobs as small as 1 byte or as big as 5GB.
What I call a "gob," S3 calls an "object," and in place of "directory," S3 says "bucket." So when you store a 200KB JPEG on S3, you're putting a 200KB object in a bucket. A given AWS account can own up to 100 buckets. A bucket can hold an unlimited number of objects, and it can be configured to reside either in the United States or Europe. Presumably, the choice offers users little more than a comforting feeling of locality, because a bucket is reachable from anywhere on the Internet that Amazon is accessible. Cost differences between the two are tiny; a bucket in Europe will run you something like two-thousandths of a cent more per 1,000 requests than one in the United States.
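To make the account, bucket, and object relationship concrete, here is a toy in-memory model in Python. It is a sketch only and does not talk to S3: the names `Account`, `Bucket`, `create_bucket`, `put`, and `get` are hypothetical, and the limits are simply the figures quoted in this article.

```python
from dataclasses import dataclass, field

MAX_BUCKETS_PER_ACCOUNT = 100       # per-account bucket limit quoted in the article
MAX_OBJECT_BYTES = 5 * 1024 ** 3    # 5GB object ceiling quoted in the article
LOCATIONS = ("US", "EU")            # the two bucket locations the article mentions


@dataclass
class Bucket:
    name: str
    location: str
    objects: dict = field(default_factory=dict)  # object name -> raw bytes

    def put(self, name: str, data: bytes) -> None:
        """Store an object, enforcing the size range quoted in the article."""
        if not 1 <= len(data) <= MAX_OBJECT_BYTES:
            raise ValueError("object must be between 1 byte and 5GB")
        self.objects[name] = data

    def get(self, name: str) -> bytes:
        """Fetch an object's bytes by name."""
        return self.objects[name]


@dataclass
class Account:
    buckets: dict = field(default_factory=dict)  # bucket name -> Bucket

    def create_bucket(self, name: str, location: str = "US") -> Bucket:
        """Create a bucket in one of the known locations, up to the account limit."""
        if location not in LOCATIONS:
            raise ValueError(f"unknown location: {location}")
        if name in self.buckets:
            raise ValueError(f"bucket already exists: {name}")
        if len(self.buckets) >= MAX_BUCKETS_PER_ACCOUNT:
            raise RuntimeError("account already owns 100 buckets")
        bucket = Bucket(name, location)
        self.buckets[name] = bucket
        return bucket


account = Account()
photos = account.create_bucket("photos", location="EU")
photos.put("cat.jpg", b"\x00" * (200 * 1024))   # a 200KB object in a bucket
print(photos.location, len(photos.get("cat.jpg")))
```

The model keeps the bucket count on the account and the size check on the bucket, which mirrors where the article places each limit.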
Digging a bit deeper, you can think of an object as a three-in-one entity: key, value, and metadata. The key is the object's name, value is its content, and metadata is an array of key/value pairs carrying information about the object. (Access permissions are also associated with an object, but are treated as separate from object storage.) Names have limits of their own: a bucket's name can be between 3 and 255 characters, and it must not confuse URL parsing. Thus, a bucket named "192.168.12.12" is a bad idea.
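The three-in-one view of an object can be sketched as a small Python data structure. This is an illustration under assumptions, not Amazon's API: `StoredObject`, `store`, `describe`, and the `permissions` table are hypothetical names, and a plain dictionary stands in for a bucket.

```python
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    key: str                                       # the object's name
    value: bytes                                   # the object's content
    metadata: dict = field(default_factory=dict)   # key/value pairs about the object


# Permissions live outside the object itself, mirroring the article's note
# that they are treated as separate from object storage.
permissions = {}   # object key -> set of (hypothetical) reader names


def store(bucket: dict, obj: StoredObject, readers=()) -> None:
    """Place an object in the bucket and record who may read it."""
    bucket[obj.key] = obj
    permissions[obj.key] = set(readers)


def describe(bucket: dict, key: str) -> dict:
    """Return information about an object without returning its content."""
    obj = bucket[key]
    return {"key": obj.key, "size": len(obj.value), **obj.metadata}


bucket = {}
store(bucket,
      StoredObject("cat.jpg", b"\xff\xd8\xff", {"content-type": "image/jpeg"}),
      readers=["alice"])
print(describe(bucket, "cat.jpg"))
# {'key': 'cat.jpg', 'size': 3, 'content-type': 'image/jpeg'}
```

Keeping metadata alongside the value, but permissions in their own table, is the design point: you can ask what an object is without fetching it, and change who can see it without rewriting it.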
Whereas the architecture of S3 is effectively a flat file system, S3's APIs permit a clever programmer to build apparent subdirectories within a bucket. The hierarchies have to be encoded in the object names, which is less than ideal; however, it's an artifact that code could simply mask. So, if you want one directory of animals and another of vegetables, you might have object keys such as "animal-cat," "animal-dog," "vegetable-beet," and "vegetable-carrot." Using the prefix parameter of the List operation, you can restrict retrieved object keys to only animals or only vegetables. More complicated data structures should be kept in Amazon's SimpleDB.
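The prefix trick is easy to simulate locally. The Python sketch below filters a flat list of keys the way a prefix-restricted listing would, then shows how code could mask the encoding by presenting the keys as directories. Both function names (`list_keys`, `as_directories`) are hypothetical, and the "-" separator is just the one used in the example keys; nothing here calls S3.

```python
def list_keys(keys, prefix=""):
    """Return the keys that start with `prefix`, in sorted order."""
    return sorted(k for k in keys if k.startswith(prefix))


def as_directories(keys, delimiter="-"):
    """Group flat keys into apparent directories by splitting on the first delimiter."""
    tree = {}
    for key in sorted(keys):
        directory, _, name = key.partition(delimiter)
        tree.setdefault(directory, []).append(name)
    return tree


keys = ["vegetable-carrot", "animal-dog", "vegetable-beet", "animal-cat"]

print(list_keys(keys, prefix="animal-"))
# ['animal-cat', 'animal-dog']

print(as_directories(keys))
# {'animal': ['cat', 'dog'], 'vegetable': ['beet', 'carrot']}
```

The bucket itself stays flat throughout; the hierarchy exists only in how the names are written and read.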
Amazon Simple Database Service (SimpleDB). While Amazon S3 is designed for large, unstructured blocks of data, SimpleDB is built for complex, structured data. As with the other services, the name says it all. SimpleDB implements a database that sits behind a lightweight, easily mastered query language that nonetheless supports most of the database operations (searching, fetching, inserting, and deleting) you'll likely need. In keeping SimpleDB simple, Amazon has followed the principle that the best APIs are those with minimal entry points: I count seven for SimpleDB.
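As a rough picture of the four operations named above, here is a toy in-memory stand-in written in Python. It is not SimpleDB's API or its query language: the class `ToyDomain` and its methods are hypothetical, and "search" here means nothing more than exact matching on attribute values.

```python
class ToyDomain:
    """A dictionary of named items, each holding attribute/value pairs."""

    def __init__(self):
        self.items = {}   # item name -> {attribute: value}

    def insert(self, item_name, **attributes):
        """Add an item, or merge new attributes into an existing one."""
        self.items.setdefault(item_name, {}).update(attributes)

    def fetch(self, item_name):
        """Return a copy of an item's attributes (empty if the item is absent)."""
        return dict(self.items.get(item_name, {}))

    def delete(self, item_name):
        """Remove an item; deleting a missing item is a no-op."""
        self.items.pop(item_name, None)

    def search(self, **criteria):
        """Return sorted names of items whose attributes match every criterion."""
        return sorted(
            name for name, attrs in self.items.items()
            if all(attrs.get(attr) == wanted for attr, wanted in criteria.items())
        )


pets = ToyDomain()
pets.insert("rex", species="dog", color="brown")
pets.insert("tom", species="cat", color="gray")
pets.insert("fido", species="dog", color="black")

print(pets.search(species="dog"))   # ['fido', 'rex']
pets.delete("rex")
print(pets.search(species="dog"))   # ['fido']
print(pets.fetch("tom"))            # {'species': 'cat', 'color': 'gray'}
```

The contrast with S3 is the point: an S3 object is an opaque value you fetch by name, whereas here the attributes themselves are what you search on.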