NoSQL databases break all the old rules
Amazon SimpleDB, CouchDB, Google App Engine, and Persevere may have a better way of storing data for your Web appFollow @peterwayner
SimpleDBcomes with two levels of hierarchy on top of the piles of data pairs. The top level is the "domain" and the second level is the "item." After you choose the domain and item names, you pour in the pairs. SimpleDB's comparatively feature-rich API includes the ability to sort the data and even count the number of items that match the query. You can even write queries that exclude values that don't start with a certain string. This may not sound like much to someone who uses SQL Server or Oracle, but some of these low-rent databases can't even sort the data in the result set.
SimpleDBis meant to be used with Amazon's Simple Storage Service (S3), because each of the values in the pairs is limited to 1,024 bytes. That's enough for many strings, but it's not enough for many content engines. So you store a pointer to the data in S3. There are a few libraries like an extension of the Java Persistence Architecture that straddle the two clouds and handle this pointer juggling for you.
There are other limitations that can lead you to start doing JOIN-like things with multiple calls. Each query can only run 5 seconds. The answer can only hold 250 items. Each item can have only 250 pairs. Some people half joke about concatenating multiple values with keys like "description1," "description2," and "description3." There are many simple work-arounds for the limitations, but they start to make you wonder whether SimpleDB is supposed to make your life easier or harder.
Amazon is beginning to rewrite the APIs to push for more and better authentication. Come September 2009, calls to the SimpleDB (and a few other services) will run through SSL, providing both security and authentication. Amazon is also enhancing the signature mechanism to use more sophisticated hashing algorithms that pack together more of the request. This is just one of the ways that Amazon is slowly rolling out small improvements.
The company is also creating more libraries that make it simpler to use the service. There are dozens of packages that work with all of the major languages and some of the minor ones. The documentation is extensive. It's usually possible to start up and begin storing your data in little time.
The price is now easier to handle. There's a "free tier" of service that lets you burn up to 25 hours of computation time per month -- enough, Amazon estimates, to run a basic logging tool that processes less than 2 million requests a month. Plus, Amazon recently slashed the price for storage from $1.50 to 25 cents per gigabyte. The company appears committed to keeping the charges transparent so users will have the right incentives to structure their consumption.
Amazon has one of the more advanced terms of service. There are plenty of clauses that work through some of the problems you might encounter, and several caught my untrained eye. For instance, Amazon claims, "We may delete, without liability of any kind, any of your Amazon SimpleDB Content that has not been accessed in the previous 6 months." This may be perfectly acceptable for the people who are taking the system out for a spin with test data and not paying for it, but the phrasing suggests a bit of the omnipotence that Amazon probably feels it needs to keep its datacenter running.