How to screw up your MongoDB schema design

Mongo has recently become a thing. Fortunately, some RDBMS skills port over easily -- but schema design isn't one of them.

Tonight I'm going to a sold-out MongoDB event hosted by my company at our local startup incubator. Already, 250 people have signed up. That's a pretty good crowd for Durham, NC, but not surprising considering the focus is MongoDB schema design.

Schema design with MongoDB is almost too easy, but if you've been flattening things out into tables for 20 years, it may seem hard. If you create Mongo entities that are a 1:1 port of equivalent items in your RDBMS, you'll be sorely disappointed by MongoDB's performance, consistency, and so on.


On the Web, you'll find plenty of negative opinions about the quality of MongoDB. You'll also come across rave reviews and glowing recommendations. I'm willing to bet that the difference between thumbs-up and thumbs-down in 99 percent of those cases is whether the schema was or was not appropriate for a document database.

Identifying documents is the key

If you've used Hibernate or another JPA or OR-Mapping tool, then you're familiar with the concept of a "dependent object" or "composite component."

The classic example is a street address that often consists of two lines, a city, a state or province, and a postal code. In your database table, the address may just be a handful of columns, but because your object-oriented system needs to validate those simple types, you have a type hierarchy and an address class separate from your entity. However, entries frequently have more than one or two addresses, so you may one day break the address out into a separate table.

We don't truly care about duplicates (meaning two people with the same address might result in rows of the same address in the table that differ only in their key) -- we only care that we can add your beach house to your personnel record so that we can find you. For a person, addresses, phone numbers, and IM accounts are all examples of things that do not usually require another document and are embedded in the parent object in MongoDB. In the event a person is deleted, you'd delete the addresses, phone numbers, and IM accounts with them or cascade the delete. You wouldn't have those other items without the parent object.
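As a rough sketch of what that embedding looks like, the snippet below models a person as a single document using plain Python dicts. Every field name and value here (`addresses`, `phones`, `im_accounts`, the `jdoe` key) is a made-up illustration, and the `people` dict only stands in for a collection; nothing in it talks to a real MongoDB server.

```python
# Hypothetical person document: all field names and values are illustrative.
person = {
    "_id": "jdoe",
    "name": "Jane Doe",
    # Addresses live inside the person; two people sharing an address
    # simply carry their own copies of it.
    "addresses": [
        {"label": "home", "line1": "123 Main St", "line2": "Apt 4",
         "city": "Durham", "state": "NC", "postal_code": "00000"},
        {"label": "beach house", "line1": "9 Shore Rd", "line2": "",
         "city": "Wilmington", "state": "NC", "postal_code": "00000"},
    ],
    "phones": [{"label": "mobile", "number": "555-0100"}],
    "im_accounts": [{"service": "example-chat", "handle": "jdoe"}],
}

# A plain dict keyed by _id stands in for a collection.
people = {person["_id"]: person}


def find_addresses(collection, person_id):
    """One lookup returns the person and everything embedded in it."""
    return collection[person_id]["addresses"]


def delete_person(collection, person_id):
    """Removing the parent removes its embedded items; no cascade needed."""
    return collection.pop(person_id, None)
```

Because the addresses, phone numbers, and IM accounts have no life of their own outside the person, deleting the one document is the whole "cascade."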

In addition to cascading deletes, other key signs that your objects belong in the same document include a foreign key that is part of a primary key, most 1:1 relationships, and nearly any case where you would never read a "child" table row without first reading the "parent" table row. There are exceptions, but this is a good place to start. In short, the document model probably matches your object model more closely than your average RDBMS schema does.
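To make those signs concrete, here is a minimal sketch, assuming two hypothetical relational tables: a person table and an address table whose primary key is `(person_id, seq)`, so the foreign key is part of the primary key. The helper `embed_children` and all row data are invented for illustration; it only shows how such child rows could be folded into their parent documents.

```python
from collections import defaultdict

# Hypothetical rows as they might come out of two RDBMS tables.
person_rows = [
    {"person_id": 1, "name": "Jane Doe"},
    {"person_id": 2, "name": "John Roe"},
]
# The address primary key is (person_id, seq): the foreign key is part of it.
address_rows = [
    {"person_id": 1, "seq": 1, "city": "Durham", "state": "NC"},
    {"person_id": 1, "seq": 2, "city": "Wilmington", "state": "NC"},
    {"person_id": 2, "seq": 1, "city": "Raleigh", "state": "NC"},
]


def embed_children(parents, children, key, field):
    """Fold child rows into their parent rows under the given field name."""
    grouped = defaultdict(list)
    for row in children:
        # Drop the foreign key: inside the parent it is redundant.
        child = {k: v for k, v in row.items() if k != key}
        grouped[row[key]].append(child)
    # Parents without children get an empty list.
    return [{**parent, field: grouped.get(parent[key], [])} for parent in parents]


documents = embed_children(person_rows, address_rows, "person_id", "addresses")
```

Each resulting document carries its own `addresses` list, so reading a person never requires a second lookup against a child table.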
