10 things never to do with a relational database

The data explosion demands new solutions, yet the hoary old RDBMS still rules. Here's where you really shouldn't use it

5. Users/groups and ACLs: To some degree, LDAP was the original NoSQL database. LDAP was designed for users, groups, and ACLs, and it can fit the problem like a glove. Sadly, many people have LDAP hangovers from when the technology was newer and companies did terrible and monstrous things with it. Some companies have also built such a bureaucracy around it that many developers have had no choice but to cheat by creating a database table. This defeats the purpose of centralized user access control. The "users" and "roles" tables should go away in any enterprise environment.

6. Log analysis: If you need a good demonstration of this, turn on the log analysis features of an RDBMS-backed monitoring tool such as RHQ/JBossON for a small cluster of servers. Set the log level and log capture to anything more verbose than ERROR, then do something more complex, and life will be very bad. This kind of somewhat unstructured data analysis is exactly what MapReduce à la Hadoop and languages like Pig are for. It's unfortunate that the major monitoring tools are RDBMS-specific -- they really don't need transactions, and low latency is job No. 1.
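To make the MapReduce idea concrete, here is a minimal single-process sketch in plain Python. It is not Hadoop or Pig code. The "host level message" line layout, the sample lines, and every function name are assumptions made up for illustration. It only shows the shape of the job: map each line to a key, group by key, then reduce each group to a count.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical log lines; the "host level message" layout is an assumption.
LOG_LINES = [
    "web01 INFO request served in 12ms",
    "web01 ERROR upstream timed out",
    "web02 DEBUG cache miss for key user:42",
    "web02 INFO request served in 9ms",
    "web01 INFO request served in 15ms",
]


def map_line(line):
    """Map step: emit ((host, level), 1) for each parseable line."""
    parts = line.split(maxsplit=2)
    if len(parts) < 2:
        return []  # skip malformed lines instead of failing the whole job
    host, level = parts[0], parts[1]
    return [((host, level), 1)]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(grouped):
    """Reduce step: sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def count_levels(lines):
    """Run map, shuffle, and reduce over an iterable of log lines."""
    mapped = chain.from_iterable(map_line(line) for line in lines)
    return reduce_counts(shuffle(mapped))


if __name__ == "__main__":
    for (host, level), n in sorted(count_levels(LOG_LINES).items()):
        print(host, level, n)
```

Nothing here needs a transaction or a schema. Each line is handled independently, which is why this kind of work spreads across many machines so easily.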

7. Media repository: It may be OK to store your metadata (though it probably would be better in a document database like Couchbase 2.0 or MongoDB), but BLOBs in an RDBMS are still a pain after all of these years. You're better off using some kind of distributed storage or clustered file system for your images and other binaries. Sadly, many CMS engines still shove everything into an RDBMS.
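As a rough sketch of that split, the binary can go to a file tree keyed by its content hash while the metadata becomes a small document that points at it. The function name `store_media`, the directory layout, and the metadata fields below are all hypothetical. A real deployment would write to a distributed or clustered file system and save the document in a document database rather than just returning it.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def store_media(root, data, metadata):
    """Write bytes to a content-addressed path and return a metadata document.

    The layout <root>/<first two hex chars>/<sha256> is an assumption chosen
    to keep any one directory from growing too large.
    """
    root = Path(root)
    digest = hashlib.sha256(data).hexdigest()
    blob_path = root / digest[:2] / digest
    blob_path.parent.mkdir(parents=True, exist_ok=True)
    if not blob_path.exists():  # identical content is stored only once
        blob_path.write_bytes(data)
    doc = dict(metadata)
    doc.update(
        {
            "sha256": digest,
            "size": len(data),
            "path": blob_path.relative_to(root).as_posix(),
        }
    )
    return doc


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        doc = store_media(tmp, b"fake image bytes", {"title": "Logo", "type": "image/png"})
        # The document is what would be saved in the document database.
        print(json.dumps(doc, indent=2))
```

The database then only ever handles a few hundred bytes of metadata per item, and the binaries can be served, cached, or replicated by tools built for files.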

8. Email: I know this firsthand. After running a project that attempted to integrate email and an RDBMS, I discovered what many others already knew: Email really is moderately unstructured data with metadata that is best stored another way. We optimized the RDBMS as much as possible, doing crazy things for BLOBs and more. Ultimately, email is about metadata, search, and content, none of which lends itself to relational algebra, and you really don't need transactions here. The file system is fine, and metadata would be better off in a document database.

9. Classified ads: This is a high-scale workload, with lots of users in and out and mostly short-and-sweet content. Ask Craigslist, which uses the document database MongoDB. There's search, there's metadata, there's short-and-sweet content. Eventual consistency would be good enough here. For these kinds of documents, the best thing a database can do is get out of your way.

10. Time-series/forecasting: This is the most general of the 10, but it takes many forms, from commodities to quants and sunspots to weather. The issues surrounding time in relational databases are the stuff of legend. Sure, it has been done, and sure, after years of hacking around it, for the last decade or so we have had temporal fields and functions that are merely deficient rather than woefully inadequate in most RDBMS implementations. That said, if time is your subject, then a MapReduce-friendly column family store like Cassandra may be a better solution. DataStax has specifically targeted its Cassandra distribution to support time-series data, as have other vendors.
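One way to see why a column family store suits time-series data is to model the usual wide-row layout: one partition per series and time bucket, with samples kept sorted by timestamp inside it. The toy class below is an in-memory illustration only, not Cassandra or DataStax code. The class name, the one-day bucket size, and the method names are assumptions.

```python
import bisect
from collections import defaultdict
from datetime import datetime, timezone


class TimeSeriesStore:
    """Toy wide-row model: one partition per (series, UTC day), sorted by time."""

    def __init__(self):
        self._partitions = defaultdict(list)

    @staticmethod
    def _bucket(ts):
        # Bucket by UTC calendar day; expects timezone-aware datetimes.
        return ts.astimezone(timezone.utc).strftime("%Y-%m-%d")

    def insert(self, series, ts, value):
        """Place the sample in its day partition, keeping timestamp order."""
        bisect.insort(self._partitions[(series, self._bucket(ts))], (ts, value))

    def query_day(self, series, day, start, end):
        """Return samples with start <= ts < end from one day's partition."""
        rows = self._partitions.get((series, day), [])
        lo = bisect.bisect_left(rows, (start,))
        hi = bisect.bisect_left(rows, (end,))
        return rows[lo:hi]


if __name__ == "__main__":
    store = TimeSeriesStore()
    for hour, temp in [(3, 12.5), (1, 10.0), (2, 11.0)]:  # arrives out of order
        store.insert("temp", datetime(2013, 1, 1, hour, tzinfo=timezone.utc), temp)
    start = datetime(2013, 1, 1, 1, tzinfo=timezone.utc)
    end = datetime(2013, 1, 1, 3, tzinfo=timezone.utc)
    for ts, value in store.query_day("temp", "2013-01-01", start, end):
        print(ts.isoformat(), value)
```

A range query touches a single partition and reads one contiguous slice, with no joins and no per-row index lookups. That is the access pattern time-series workloads lean on most.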

Can you use the RDBMS for some or many of these? Sure -- I have and people continue to. However, is it a good fit? Not really. I expect the cranky old men to disagree, but tradition alone is not a good reason to stick with the old way of doing things.

This article, "10 things never to do with a relational database," was originally published at InfoWorld.com. Read more of Andrew C. Oliver's Strategic Developer blog, and keep up on the latest developments in application development at InfoWorld.com. For the latest business technology news, follow InfoWorld.com on Twitter.