InfoWorld review: Databases primed for social networks
Neo4j, Cassandra, and FluidDB represent a breed of databases that swiftly search social networking data
Neo4j's real power lies in its ability to solve problems that demand repeated probing throughout the network. You can bundle up a query in a traversal object that will scan through multiple connected nodes to find the answer. In effect, it does the equivalent of fetching one row from a traditional database, then using that information to find the next row, again and again. By contrast, a traditional database would require a separate query for each step through the search, driving traffic through the roof.
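To make the idea of a traversal concrete, here is a minimal sketch in plain Python. It does not use Neo4j's API; the `FRIENDS` graph, the `traverse` function, and the `lookups` counter are all hypothetical names invented for illustration. The counter shows how many node fetches a single traversal folds together, each of which would be a separate query if every hop went to the database on its own.

```python
from collections import deque

# Hypothetical toy graph: each key is a node, each value lists its neighbors.
FRIENDS = {
    "alice": ["bob", "carol"],
    "bob": ["alice", "dave"],
    "carol": ["alice", "dave"],
    "dave": ["bob", "carol", "erin"],
    "erin": ["dave"],
}


def traverse(graph, start, max_depth):
    """Breadth-first walk; returns ({node: depth}, number of node fetches)."""
    depths = {start: 0}
    queue = deque([start])
    lookups = 0
    while queue:
        node = queue.popleft()
        if depths[node] == max_depth:
            continue  # do not expand nodes at the depth limit
        lookups += 1  # stands in for one row fetch in a relational store
        for neighbor in graph[node]:
            if neighbor not in depths:
                depths[neighbor] = depths[node] + 1
                queue.append(neighbor)
    return depths, lookups


reached, lookups = traverse(FRIENDS, "alice", 2)
```

Even in this five-node example, one traversal call hides three fetches; on a real social network the count grows quickly with depth.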
Searching algorithms aren't news to anyone who's taken basic computer science courses. There are even a number of libraries, such as JGraphT, that implement many of the classic graph algorithms in Java. The beauty of Neo4j is that it turns these data structures into a database by adding persistence, transactions, and caching. You just keep dumping those nodes in, and Neo4j will find a way to store them on disk so that they can be found after the power failure.
In building a few projects, I found Neo4j's performance to be quite good in cases involving deep searching through the networks. The project promises results that are a thousand times faster than a relational database, and that claim seems entirely plausible for intensive problems such as searching everything in a big network.
It's pretty easy to bump up against the limits of Neo4j today. Implementing a project requires some forethought, much like the design work that goes into planning a schema for a relational database. The challenge lies in the fact that searches start from the nodes, not from the relationships between them, and this confused me for a bit. I wanted to skip looking through all these nodes and zoom in on only the ones bound by a relatively rare relationship. The trick is to create, then grab, extra nodes that represent the different types of relationships out there.
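The extra-node trick can be sketched as follows. This is a toy model in plain Python, not Neo4j code: the `TinyGraph` class, the `type:` naming scheme, and the `HAS_MEMBER` relationship are assumptions made up for this example. Each time a relationship is stored, the graph also links a per-type node to the source, so a search for a rare relationship can begin at that one node instead of scanning every node.

```python
from collections import defaultdict


class TinyGraph:
    """Hypothetical in-memory graph that keeps one extra node per relationship type."""

    def __init__(self):
        # node -> set of (relationship type, target node)
        self.edges = defaultdict(set)

    def relate(self, source, rel_type, target):
        self.edges[source].add((rel_type, target))
        # Also link the per-type node to the source so it can be found later.
        self.edges["type:" + rel_type].add(("HAS_MEMBER", source))

    def sources_of(self, rel_type):
        """Start at the per-type node and return the nodes it points to."""
        return sorted(node for _, node in self.edges["type:" + rel_type])


g = TinyGraph()
g.relate("alice", "FRIEND", "bob")
g.relate("bob", "FRIEND", "carol")
g.relate("carol", "MENTOR", "dave")

mentors = g.sources_of("MENTOR")
friends = g.sources_of("FRIEND")
```

The cost is extra bookkeeping on every write, which is the kind of forethought the design work has to account for.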
Moreover, searching for a particular node with a particular attribute is better handled with Lucene, which now comes bolted onto the bigger distribution of the Neo4j project. If you want to ask for the wives of all of Bob's male friends, you would first use Lucene to search for Bob, then turn to the Neo4j part of the API to search his social network. The Neo4j project is starting to expand, however, with the addition of new algorithms and data structures.
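The two-step pattern in the Bob example looks roughly like this. The sketch uses plain Python dictionaries in place of both Lucene and Neo4j, so `PEOPLE`, `RELATIONSHIPS`, `NAME_INDEX`, and `wives_of_male_friends` are hypothetical stand-ins rather than either library's API, and the sample people are invented.

```python
# Hypothetical node store: node id -> properties.
PEOPLE = {
    1: {"name": "Bob", "gender": "male"},
    2: {"name": "Carl", "gender": "male"},
    3: {"name": "Dana", "gender": "female"},
    4: {"name": "Eve", "gender": "female"},
    5: {"name": "Frank", "gender": "male"},
}

# Hypothetical relationship store: (node id, relationship type) -> target ids.
RELATIONSHIPS = {
    (1, "FRIEND"): [2, 3, 5],
    (2, "WIFE"): [4],
}

# Stand-in for the text index: attribute value -> node id.
NAME_INDEX = {person["name"]: node_id for node_id, person in PEOPLE.items()}


def wives_of_male_friends(name):
    start = NAME_INDEX[name]  # step 1: index lookup finds the starting node
    wives = []
    # Step 2: walk the graph outward from that node.
    for friend in RELATIONSHIPS.get((start, "FRIEND"), []):
        if PEOPLE[friend]["gender"] != "male":
            continue
        for wife in RELATIONSHIPS.get((friend, "WIFE"), []):
            wives.append(PEOPLE[wife]["name"])
    return wives


result = wives_of_male_friends("Bob")
```

The index answers "which node is Bob?" and the graph answers "who is connected to him?", which is why the two tools are paired.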
Neo4j is beginning to attract all of the necessary extras to build production tools out of it. Some nice subprojects, add-ons, and tools have appeared in fertile open source projects. Ruby and Scala bindings offer REST-ful interfaces. An Eclipse plug-in, Neoclipse, draws the graphs in Eclipse so that you can debug them. There are tools to suck in SQL databases and others to back up the database.
The project's documentation is a mix of excellent pages and thin sections. There is a fair amount of discussion devoted to optimizing the performance of the system, which shows that the group is serious about using the tool in real applications where caching and transaction costs matter.
Neo4j comes with one of two licenses: the AGPL (among the strictest open source licenses) or a commercial license from Neo Technology.