NoSQL databases break all the old rules
Amazon SimpleDB, CouchDB, Google App Engine, and Persevere may have a better way of storing data for your Web app
But that doesn't mean that I'm not thinking of using them for one of my upcoming projects. They are solid data stores and so tightly integrated with AJAX that they make development very easy. Most Web sites don't need all of the functions of MySQL or Oracle, and JOIN-free schemas are still pretty useful for many common data structures, including one-to-many and one-to-one relationships. Even many-to-one relationships are feasible until something needs to be changed. Given that database administrators often denormalize their tables to speed them up, you might say that these non-relational tools just save them a step.
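To make the trade-off concrete, here is a minimal sketch in Python of what a JOIN-free, denormalized layout might look like. The order documents, field names, and helper functions are hypothetical and not taken from any of the four products; plain dictionaries stand in for the data store.

```python
# Hypothetical order documents, as a JOIN-free store might hold them.
# Line items (one-to-many) are embedded in the order; the customer record
# (many-to-one) is copied into every order that refers to it.
orders = {
    "order-1": {
        "customer": {"id": "c7", "name": "Ada", "city": "Boston"},
        "items": [{"sku": "A1", "qty": 2}, {"sku": "B4", "qty": 1}],
    },
    "order-2": {
        "customer": {"id": "c7", "name": "Ada", "city": "Boston"},
        "items": [{"sku": "C9", "qty": 5}],
    },
}


def total_quantity(order):
    """Reading a one-to-many relationship needs no JOIN: one document has it all."""
    return sum(item["qty"] for item in order["items"])


def move_customer(store, customer_id, new_city):
    """Changing shared data means rewriting every document that holds a copy."""
    touched = 0
    for doc in store.values():
        if doc["customer"]["id"] == customer_id:
            doc["customer"]["city"] = new_city
            touched += 1
    return touched
```

Reading an order is a single lookup, which is the easy case. The cost shows up in `move_customer`: because the customer's details are duplicated, one logical change touches every order that carries a copy, which is the point at which many-to-one relationships stop being convenient.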
One of the trickier questions is whether to use a cloud or build your own cluster of machines. Both Google and Amazon offer multimachine promises that CouchDB and Persevere can't match. You've got to push the buttons yourself with CouchDB. The Persevere team talks about scaling in the future. But it can be hard to guess how good the promises of Amazon and Google might be. What happens if Amazon or Google loses a disk? What if they lose a rack? They still don't make explicit promises and their terms of service explicitly disclaim any real responsibility.
Amazon's terms, for instance, repeat this sentiment a number of times: "We are not responsible for any unauthorized access to, alteration of, or the deletion, destruction, damage, loss or failure to store any of, Your Content (as defined in Section 10.2), your Applications, or other data which you submit or use in connection with your account or the Services."
I can't say I blame Amazon or Google, because who knows who is ultimately responsible for a lost transaction? It could be any programmer in the stack, and it would be practically impossible to decide who trashed something. But it would be nice to have more information. Is the data in SimpleDB stored on a RAID array? Is a copy kept in another geographic area unlikely to be hit by the same earthquake, hurricane, or wildfire? The online backup community is starting to offer these kinds of details, but the clouds have not been so forthcoming.
All of these considerations make it clear to me that these are still toy databases that are best suited for applications that can survive a total loss of data. They're noble experiments that do a good job of making the limitations of scale apparent to programmers by forcing them to work with a data model that does a better job of matching the hardware. They are fun, fast, and so reasonable in price that you can forget about writing big checks and concentrate on figuring out how to work around the lack of JOINs.