Amazon SimpleDB, CouchDB, Google App Engine, and Persevere may have a better way of storing data for your Web app
These terms have become more focused over the years. Google now promises to give you 90 days to get your data off of the servers if it decides to cancel your account -- something it can do "for any reason." Many of the changes I've noticed over that time seem to be focused on DMCA issues, which tie everyone in knots up and down the chain.
It's an interesting question what would happen if you decide to leave Google or Google asks you to leave. Google distributes a nice development tool that makes it easy for you to test your applications on your local machine. There's no technical reason why you couldn't host your service on your own server with these tools, except you would lose some of the cloud-like features. The data store included for testing wouldn't replicate itself automatically, but it seems to do everything else on my local machine. As always, there are some legal questions because "license is for the sole purpose of enabling you to use and enjoy the benefit of the Service."
There's no reason why you need to work with a cloud to enjoy these new services. CouchDB is one of the many open source projects that build a simple database for storing key-value pairs. The project, written in Erlang, is supported under the aegis of the Apache Software Foundation. You can install it on any server by downloading the source files and compiling them. Then there are no more charges except paying for the server.
There are still some limitations to this project. While the front page of the project calls it "a distributed, fault-tolerant, and schema-free document-oriented database," you won't get the distribution and fault-tolerance without some manual intervention. The nice AJAX interface to CouchDB includes a form that you can fill out to replicate the database. It's not automatic yet.
This approach isn't as limiting as it might seem. As I was working with these databases, I soon began to see how anyone could layer on a security model at the client with the judicious use of some encryption. Empowering the client reduces the need for much security work at the server, something I wrote about in Translucent Databases.
Persevere is not like some of the other databases in this space that seem proud of the fact that they're "schema-free." It lets you add as much schema as you want to bring structure to your pairs. Instead of calling the top level of the hierarchy a domain (SimpleDB) or a document (CouchDB), Persevere calls them objects and even lets you create subclasses of the objects. If you want to enforce rules, you can insist that certain fields be filled with certain types, but there's no recommendation. The schema rules are optional.
The roots in the Dojo team are apparent because Dojo comes with a class, JsonRestStore, that can connect with Persevere and a number of other databases. including CouchDB. (Dojo 1.2 will also connect with Amazon's S3 but not SimpleDB and Google's Feed and Search APIs but not App Engine, at least out of the box.) The "Store" is sophisticated and has some surprising facilities. When I was playing with it initially, I hadn't given the clients the permissions to store data directly. The tool stored the data locally as if I were offline and had no connection with the database. When I granted the correct permissions later, the changes streamed through as if I had reconnected.
I encountered a number of small glitches that were probably due more to my lack of experience than to underlying bugs. Some things just started working correctly when I figured out exactly what to do. It's not so much Persevere itself you need to master, but the AJAX frameworks you're using in front of it. The documentation from Dojo is better than most AJAX frameworks, but it will take some time for Dojo to catch up with the underlying complexity that's hidden by the smooth surface of Persevere.
Cloud or cluster
After playing with these databases, I can understand why some people will keep using the word "toys" to describe them. They do very little, and their newness limits your options. There were a number of times when I realized that a fairly standard feature from the SQL world would make life simpler. Many of the standard SQL-based tools, like the reporting engines, can't connect with these oddities. There are a great many things that can be done with MySQL or Oracle out of the box.
But that doesn't mean that I'm not thinking of using them for one of my upcoming projects. They are solid data stores and so tightly integrated with AJAX that they make development very easy. Most Web sites don't need all of the functions of a MySQL or Oracle, and JOIN-free schemas are still pretty useful for many common data structures, including one-to-many and one-to-one relationships. Even many-to-one relationships are feasible until something needs to be changed. Given that database administrators are often denormalizing the tables to speed them up, you might say that these non-relational tools just save them a step.
One of the trickier questions is whether to use a cloud or build your own cluster of machines. Both Google and Amazon offer multimachine promises that CouchDB and Persevere can't match. You've got to push the buttons yourself with CouchDB. The Persevere team talks about scaling in the future. But it can be hard to guess how good the promises of Amazon and Google might be. What happens if Amazon or Google loses a disk? What if they lose a rack? They still don't make explicit promises and their terms of service explicitly disclaim any real responsibility.
Amazon's terms, for instance, repeat this sentiment a number of times: "We are not responsible for any unauthorized access to, alteration of, or the deletion, destruction, damage, loss or failure to store any of, Your Content (as defined in Section 10.2), your Applications, or other data which you submit or use in connection with your account or the Services."
I can't say I blame Amazon or Google because who knows who is ultimately responsible for a lost transaction? It could be any programmer in the stack, and it would be practically impossible to decide who trashed something. But it would be nice to have more information. Is the data in a SimpleDB stored in a RAID disk? Is a copy kept in another geographic area unlikely to be hit by the same earthquake, hurricane, or wildfire? The online backup community is starting to offer these kinds of details, but the clouds have not been so forthcoming.
All of these considerations make it clear to me that these are still toy databases that are best suited for applications that can survive a total loss of data. They're noble experiments that do a good job of making the limitations of scale apparent to programmers by forcing them to work with a data model that does a better job of matching the hardware. They are fun, fast, and so reasonable in price that you can forget about writing big checks and concentrate on figuring out how to work around the lack of JOINs.
Overall Score (100%)
|Google App Engine||8.0||9.0||8.0||8.0||8.0|
This weekend's Windows 10 upgrade has users angry, and it's unclear if the ploy will continue
Here’s the best of the best for Windows 10. Sometimes good things come in free packages
Speaking at the O'Reilly Fluent conference, Eich also endorsed the Service Workers mobile app...
Making the switch to Salesforce’s ecosystem can prove lucrative for biz-savvy programmers
A chat with IT consultant and former InfoWorlder Bob Lewis leads to a couple of entertaining...
Secure messaging apps are all the rage, and developers can now develop their own open source Wire...