My fans know that my standard test for everything from PaaS (platform as a service) to alternative programming languages is something I call Granny's Addressbook, a simple CRUD application. As it turns out, Granny's Addressbook is also an excellent test for graph databases, the black sheep of the NoSQL quadrumvirate (see "Which freaking database should I use?"). To add functionality to the Addressbook, we've used the leading graph database: Neo4j. Thanks to Phil Rhodes for his contributions to this article and for writing my code for me so that I had more time to cozy up to a nice glass of Lagavulin, neat. -- Andy Oliver
My kids' paternal grandmother -- I call her "Mom" -- has a problem that is not solved by her killer app, Granny's Addressbook. Frequently, she asks me what gift to get the kids for the all-too-frequent stream of birthdays, Hanukkah, Christmas, Kwanzaa, Arbor Day, and whatever holidays. Frequently, I inform her that the children are already spoiled and I've gotten them anything they need and if not, there is a good reason that I don't want them to have it. Those reasons vary from "it's flammable" to "it's a weapon if you hold it right" to "it makes annoying noises that I don't want to hear" to the more common "I don't feel like figuring out how to assemble it and by the time it gets assembled the pieces will have been lost."
In other words, Grandma has a data problem. If she were to map out what all of the kids' friends or other children close to their age got for their birthday, she could easily figure out what to buy them without asking me.
As you recall, Granny's Addressbook is a sample app I've used to demonstrate various technologies. It is essentially a one-table CRUD app that stores addresses in a database. It has no search or security and it doesn't really need to scale. The GUI looks like this:
In this case, the gift-giving problem has landed Granny in a complex data dilemma. She needs to know the kids' ages and their relationships to other people. She also needs to know what each kid got for his or her last birthday. We will represent this with three rather naive fields: First is "Friends," which is a multiselect list box. The second is "Birthday." The third is "Last Present," representing gifts from their most recent birthday. So long as those fields are populated, we'll populate another field called "Suggested Presents," which is a list of gift ideas. The new GUI looks like this:
We essentially want to recommend a gift for a person based on what each friend, and each friend of a friend, got for their last birthday. We also want to order the gifts by the distance of the relationship.
For example: Tommy is friends with Billy and Bryan. Bryan is friends with Kelly and Jamie. Jamie is friends with Steven. Billy is friends with Keith. We can look at the "distance of the relationships" thusly:
Tommy = 0 (the origin)
Billy = 1
Bryan = 1
Kelly = 2
Jamie = 2
Keith = 2
Steven = 3
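If you want to see where those numbers come from, here is a minimal sketch in plain Java that computes them with a breadth-first search. It is not code from Granny's Addressbook: the class name FriendDistances and its methods are made up for illustration, and friendships are assumed to run in both directions.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Granny's Addressbook.
class FriendDistances {

    // Records a friendship in both directions (assumption: friendship is mutual).
    static void addFriendship(Map<String, List<String>> graph, String a, String b) {
        graph.computeIfAbsent(a, k -> new ArrayList<>()).add(b);
        graph.computeIfAbsent(b, k -> new ArrayList<>()).add(a);
    }

    // Builds the Tommy/Billy/Bryan example from the text.
    static Map<String, List<String>> exampleGraph() {
        Map<String, List<String>> graph = new LinkedHashMap<>();
        addFriendship(graph, "Tommy", "Billy");
        addFriendship(graph, "Tommy", "Bryan");
        addFriendship(graph, "Bryan", "Kelly");
        addFriendship(graph, "Bryan", "Jamie");
        addFriendship(graph, "Jamie", "Steven");
        addFriendship(graph, "Billy", "Keith");
        return graph;
    }

    // Breadth-first search: maps every reachable person to the number of
    // friendships between them and the origin.
    static Map<String, Integer> distancesFrom(Map<String, List<String>> graph, String origin) {
        Map<String, Integer> distance = new LinkedHashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        distance.put(origin, 0);
        queue.add(origin);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            for (String friend : graph.getOrDefault(current, Collections.emptyList())) {
                if (!distance.containsKey(friend)) {
                    // First time we reach this person, so this is the shortest route.
                    distance.put(friend, distance.get(current) + 1);
                    queue.add(friend);
                }
            }
        }
        return distance;
    }

    public static void main(String[] args) {
        distancesFrom(exampleGraph(), "Tommy")
            .forEach((name, hops) -> System.out.println(name + " = " + hops));
    }
}
```

Running main prints one "name = distance" line per person, with the same distances as the list above.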
Doing this in a relational database structure is pretty painful. In the JPA/SQL version of granny, I need two tables and multiple trips to the database just to walk the graph of relationships. I also have to do pretty much all the ordering and logic myself.
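To give a feel for the "ordering and logic" the application ends up owning, here is a rough sketch of that last step in plain Java. It is not the findSuggestions code from the JPA version. The class GiftSuggestions, the suggest method, and every gift in the sample data are invented for illustration; the distances are the ones from the Tommy example, and in the relational version they would first have to be gathered through those multiple trips to the database.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration only; not the article's findSuggestions method.
class GiftSuggestions {

    // Returns the last presents of everyone except the origin (distance 0),
    // closest relationships first, with duplicate gifts listed once.
    static List<String> suggest(Map<String, Integer> distance, Map<String, String> lastPresent) {
        List<Map.Entry<String, Integer>> people = new ArrayList<>(distance.entrySet());
        // List.sort is stable, so people at the same distance keep their input order.
        people.sort(Map.Entry.comparingByValue());

        Set<String> gifts = new LinkedHashSet<>();
        for (Map.Entry<String, Integer> person : people) {
            String gift = lastPresent.get(person.getKey());
            if (person.getValue() > 0 && gift != null) {
                gifts.add(gift);
            }
        }
        return new ArrayList<>(gifts);
    }

    public static void main(String[] args) {
        // Distances taken from the Tommy example in the text.
        Map<String, Integer> distance = new LinkedHashMap<>();
        distance.put("Tommy", 0);
        distance.put("Billy", 1);
        distance.put("Bryan", 1);
        distance.put("Kelly", 2);
        distance.put("Jamie", 2);
        distance.put("Keith", 2);
        distance.put("Steven", 3);

        // Made-up presents, purely for the demo.
        Map<String, String> lastPresent = new LinkedHashMap<>();
        lastPresent.put("Billy", "bicycle");
        lastPresent.put("Bryan", "building blocks");
        lastPresent.put("Kelly", "kite");
        lastPresent.put("Jamie", "bicycle");
        lastPresent.put("Keith", "puzzle");
        lastPresent.put("Steven", "soccer ball");

        System.out.println(suggest(distance, lastPresent));
    }
}
```

None of this is hard, but it is all code I have to write, test, and keep in sync with the tables.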
In other words, this is a classic graph database problem. Relationships matter as much as, if not more than, the data itself. Neo4j is the most popular graph database on the market these days. While graph databases are part of the NoSQL movement, they really solve different problems than, say, Couchbase or MongoDB. We aren't necessarily concerned with handling massive scale or doing analytics across terabytes of big data a la Hadoop's HBase. In fact, most graph databases are transactional, and the reason they are NoSQL is that SQL is simply inadequate to express the problems, as you can see in the amount of code it took in the findSuggestions method of the JPA/SQL version.
For the Granny4j version using Neo4j, the main query comes down to this:
// select friends and friends of friends, order by depth of the relationship
// {userNode} is a query parameter holding the node ID of the person we are shopping for
String findFriendsQuery = "start person=node({userNode}) MATCH p = (person)-[:FRIEND*1..2]-(friend) return distinct p order by length(p)";
As you can see there is a lot less code -- and it does the job. It's also more efficient. Check out all the code for Granny4j.
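To make the [:FRIEND*1..2] part of that query concrete, here is a plain-Java sketch of what a one-to-two-hop match ordered by path length amounts to on the Tommy example. This is an illustration of the idea, not how Neo4j evaluates Cypher. The class FriendPaths and its methods are hypothetical, and the sketch simplifies by never letting a path revisit a person.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of a variable-length match; not Neo4j code.
class FriendPaths {

    // Records a friendship in both directions, like the undirected -[:FRIEND]- pattern.
    static void addFriendship(Map<String, List<String>> graph, String a, String b) {
        graph.computeIfAbsent(a, k -> new ArrayList<>()).add(b);
        graph.computeIfAbsent(b, k -> new ArrayList<>()).add(a);
    }

    // Builds the Tommy/Billy/Bryan example from the text.
    static Map<String, List<String>> exampleGraph() {
        Map<String, List<String>> graph = new LinkedHashMap<>();
        addFriendship(graph, "Tommy", "Billy");
        addFriendship(graph, "Tommy", "Bryan");
        addFriendship(graph, "Bryan", "Kelly");
        addFriendship(graph, "Bryan", "Jamie");
        addFriendship(graph, "Jamie", "Steven");
        addFriendship(graph, "Billy", "Keith");
        return graph;
    }

    // Collects every path of 1..maxHops friendships starting at origin,
    // then orders them shortest first (the "order by length(p)" part).
    static List<List<String>> pathsFrom(Map<String, List<String>> graph, String origin, int maxHops) {
        List<List<String>> paths = new ArrayList<>();
        List<String> current = new ArrayList<>();
        current.add(origin);
        extend(graph, current, maxHops, paths);
        paths.sort((a, b) -> Integer.compare(a.size(), b.size()));
        return paths;
    }

    // Depth-limited walk: records each extended path, then tries to extend it further.
    private static void extend(Map<String, List<String>> graph, List<String> current,
                               int hopsLeft, List<List<String>> out) {
        if (hopsLeft == 0) {
            return;
        }
        String last = current.get(current.size() - 1);
        for (String friend : graph.getOrDefault(last, Collections.emptyList())) {
            if (current.contains(friend)) {
                continue; // simplification: never revisit someone already on the path
            }
            current.add(friend);
            out.add(new ArrayList<>(current));
            extend(graph, current, hopsLeft - 1, out);
            current.remove(current.size() - 1);
        }
    }

    public static void main(String[] args) {
        for (List<String> path : pathsFrom(exampleGraph(), "Tommy", 2)) {
            System.out.println(String.join(" -> ", path));
        }
    }
}
```

With a limit of two hops, Tommy's results stop at Kelly, Jamie, and Keith. Steven, at distance 3, only appears if the upper bound is raised to 3.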
Why is this important? Theoretically, you can hire offshore developers for as little as $15 an hour who know SQL -- meaning the technology and people who know it are commoditized. Neo4j presumably requires more expensive expertise that is in shorter supply. Nonetheless, there's always a correlation between lines of code and the number of bugs. We can decrease downtime and errors by decreasing the number of bugs per line, but it's an expensive process, and ultimately, it's easier to decrease the number of lines of code.
There's also a big efficiency issue. Even on my laptop, the unit test for GrannyJPA takes considerably longer than the one for Granny4j. If you consider this at the kind of scale that a major retailer would require and take into account the law of diminishing returns, there's a real performance and scalability issue.
The biggest objections to introducing a new structured storage technology are usually related to the experience with the technology within the organization or to the "single source of record." While the latter concern is indeed a problem when combining many types of NoSQL databases with an existing SQL database, it wouldn't be a problem with Neo4j. Like most graph databases, Neo4j is transactional. As for the former consideration, that exists with any new or, in this case, different technology.
Personally, I'd rather be moving forward at a deliberate pace and finding new efficiencies than standing in place because it's what I've always done. Moreover, graph database technology isn't that much younger than the RDBMS.
As I've mentioned in the past, it all comes down to data structures. By using the RDBMS for everything over the last few decades, the industry has done the equivalent of using a list for every data structure. You wouldn't use only one data structure for every type of data in memory, so why would you do that just because you're storing the data?
Sadly, my mom refuses to use any of the fancy tools I've developed. She's just stopped asking me what to get the kids and asks my wife instead.
This article, "What are graph databases good for? Here's a killer app," was originally published at InfoWorld.com.