Review: Neo4j supercharges graph analytics

When it comes to tracking relationships, Neo4j is faster, more flexible, and more scalable than relational databases

Neo4j performance and scalability

I can't benchmark Neo4j in a meaningful way as a reviewer, but the company provided several metrics based on its own tests and on customer experience. For example, Neo4j Inc. compared the performance of the Union-Find and PageRank algorithms in Neo4j and Apache Spark GraphX. The data set contained 1.47 billion relationships and 41.65 million nodes extracted from Twitter. Using clusters of 128 CPUs, Neo4j outperformed GraphX by roughly a factor of two on Union-Find and roughly a factor of four on PageRank.
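For readers who haven't met Union-Find, it groups nodes into connected components by merging the sets on either end of each relationship. The following is a minimal single-machine sketch in Python, not the code Neo4j or GraphX ran in the benchmark; the class and function names are my own, and the three-edge graph stands in for the Twitter data set.

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple


class UnionFind:
    """Disjoint-set forest with path compression and union by size."""

    def __init__(self) -> None:
        self.parent: Dict[Hashable, Hashable] = {}
        self.size: Dict[Hashable, int] = {}

    def find(self, x: Hashable) -> Hashable:
        # Unseen nodes start out as their own single-member set.
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1
            return x
        # Walk up to the root of x's set.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path at the root.
        while self.parent[x] != root:
            next_node = self.parent[x]
            self.parent[x] = root
            x = next_node
        return root

    def union(self, a: Hashable, b: Hashable) -> None:
        root_a, root_b = self.find(a), self.find(b)
        if root_a == root_b:
            return
        # Union by size: attach the smaller tree under the larger one.
        if self.size[root_a] < self.size[root_b]:
            root_a, root_b = root_b, root_a
        self.parent[root_b] = root_a
        self.size[root_a] += self.size[root_b]


def connected_components(
    edges: Iterable[Tuple[Hashable, Hashable]]
) -> List[Set[Hashable]]:
    """Merge the endpoints of every edge, then group nodes by root."""
    uf = UnionFind()
    for a, b in edges:
        uf.union(a, b)
    groups: Dict[Hashable, Set[Hashable]] = {}
    for node in list(uf.parent):
        groups.setdefault(uf.find(node), set()).add(node)
    return list(groups.values())


if __name__ == "__main__":
    # Hypothetical "follows" relationships between five accounts.
    follows = [("a", "b"), ("b", "c"), ("d", "e")]
    for component in connected_components(follows):
        print(sorted(component))
```

Each relationship is processed once, which is why the algorithm is a common yardstick for how fast an engine can stream through a large graph.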

In a customer deployment, Neo4j replaced an Oracle RAC cluster to calculate optimum room pricing for Marriott Hotels and demonstrated 10 times the transaction rate on half the hardware. The Neo4j system at Marriott can perform 300 million pricing operations per day.

Every node in a Neo4j high availability cluster contains the database and a cluster management component, and the cluster can be accessed through a load balancer. The full graph is replicated to each instance of the cluster, and the read capacity of each HA cluster increases linearly with the number of server instances. Neo4j can commit tens of thousands of writes per second while maintaining fully ACID transactions.

In a Neo4j causal cluster, a new Neo4j Enterprise feature, a core cluster of read-write servers is combined with one or more asynchronously updated clusters of read replicas. Applications get causal consistency, meaning that a client is guaranteed to read at least its own writes, even when hardware and networks fail. The read replicas in a causal cluster may be geographically distributed to improve query performance for users near the replicas.
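The "read your own writes" guarantee is easier to picture with a toy model. The Python sketch below is a simplified illustration, not Neo4j's implementation. It assumes a single core server with an ordered write log, one lagging replica, and a client that carries a token recording its last write. Neo4j's drivers call such a token a bookmark, but the classes and methods here are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


class Core:
    """Toy read-write server: every write is appended to an ordered log."""

    def __init__(self) -> None:
        self.log: List[Tuple[str, str]] = []

    def write(self, key: str, value: str) -> int:
        self.log.append((key, value))
        # The bookmark is simply the log position after this write.
        return len(self.log)


class Replica:
    """Toy read replica that applies the core's log asynchronously."""

    def __init__(self, core: Core) -> None:
        self.core = core
        self.applied = 0  # number of log entries applied so far
        self.data: Dict[str, str] = {}

    def pull(self, upto: Optional[int] = None) -> None:
        """Apply log entries up to position `upto` (default: all of them)."""
        available = len(self.core.log)
        target = available if upto is None else min(upto, available)
        for key, value in self.core.log[self.applied:target]:
            self.data[key] = value
        self.applied = max(self.applied, target)

    def read(self, key: str, bookmark: int = 0) -> Optional[str]:
        # Causal read: catch up to the client's bookmark before answering.
        if self.applied < bookmark:
            self.pull(bookmark)
        return self.data.get(key)


if __name__ == "__main__":
    core = Core()
    replica = Replica(core)
    bookmark = core.write("room:101", "199 USD")
    # Without a bookmark the lagging replica may not have seen the write.
    print(replica.read("room:101"))
    # With the bookmark the replica catches up first.
    print(replica.read("room:101", bookmark=bookmark))
```

In this sketch the replica catches up before answering. A real system could just as well make the client wait or route the read elsewhere.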

Neo4j use cases

Neo4j has been used successfully for fraud detection, real-time recommendations, master data management, and network and IT operations. It has also been used for investigative journalism, to analyze both the Panama Papers and the Paradise Papers.

In the fraud detection area, a graph database can quickly reveal abnormal situations, such as a single IP address using many credit card numbers belonging to multiple people. For e-commerce, it helps a great deal if such fraud detection can be done in real time.
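To make that pattern concrete, here is a small Python sketch that flags IP addresses linked to many cards held by several different people. It is an in-memory illustration of the relationship being searched for, not a Neo4j query. The record layout, the thresholds, and the sample data are all assumptions of mine.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical record layout: (ip_address, card_number, cardholder).
Transaction = Tuple[str, str, str]


def suspicious_ips(
    transactions: Iterable[Transaction],
    min_cards: int = 3,    # assumed threshold, tune for real data
    min_holders: int = 2,  # assumed threshold, tune for real data
) -> List[str]:
    """Return IPs tied to many distinct cards owned by multiple people."""
    cards_by_ip: Dict[str, Set[str]] = defaultdict(set)
    holders_by_ip: Dict[str, Set[str]] = defaultdict(set)
    for ip, card, holder in transactions:
        cards_by_ip[ip].add(card)
        holders_by_ip[ip].add(holder)
    return sorted(
        ip
        for ip in cards_by_ip
        if len(cards_by_ip[ip]) >= min_cards
        and len(holders_by_ip[ip]) >= min_holders
    )


if __name__ == "__main__":
    sample = [
        ("10.0.0.7", "4111-0001", "Ann"),
        ("10.0.0.7", "4111-0002", "Bob"),
        ("10.0.0.7", "4111-0003", "Cid"),
        ("10.0.0.9", "4111-0004", "Dee"),
        ("10.0.0.9", "4111-0004", "Dee"),
    ]
    print(suspicious_ips(sample))
```

In a graph database the IP address, cards, and cardholders would be nodes, and the same check would follow the relationships outward from each IP node.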

Neo4j has been used by Walmart to suggest products to customers based on their preferences, in real time. eBay has used Neo4j for a real-time courier/package routing solution, and reported it “to be literally thousands of times faster than our prior MySQL solution, with queries that require 10 to 100 times less code.”

Powerful connections

Neo4j is both the original graph database and the continued leader in the graph database market. After working with it and looking at some of its case studies, I can see why it continues to attract both open source users and paid enterprise customers.

In its latest Enterprise incarnation, Neo4j has scalability and survivability that rival CockroachDB’s, although that isn’t true of the open source version of Neo4j, which doesn’t cluster. As a native graph database with ad hoc properties, Neo4j can explicitly express relationships between entities and capture a variety of information for different nodes without creating sparse rows or a multitude of join tables. That makes Neo4j vastly more efficient than SQL or NoSQL databases for tasks that look at networks of related items, such as fraud detection.

One of the human costs of replacing a SQL database with Neo4j is education: learning the Cypher query language, the two libraries, and graph database design. (A similar statement could be made about most NoSQL databases.) While there is quite a bit of carry-over from SQL to Cypher, especially in WHERE clauses, the Cypher MATCH statement is quite different from a SQL SELECT statement because it acts on graph patterns rather than tables.
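The difference shows up in a small example. The Python sketch below answers a "friends of friends" question in two ways. The first is a SQL self-join run through the standard-library sqlite3 module. The second is a direct walk over an adjacency map, which is closer in spirit to how a graph pattern is matched. The table, names, and data are hypothetical. The Cypher in the comment is illustrative only and is not executed, and its `Person` label and `KNOWS` relationship type are assumptions.

```python
import sqlite3
from typing import Dict, List, Set, Tuple

# Hypothetical "knows" relationships.
KNOWS: List[Tuple[str, str]] = [
    ("Ann", "Bob"),
    ("Bob", "Cid"),
    ("Bob", "Dee"),
    ("Cid", "Dee"),
]

# Roughly equivalent Cypher pattern (illustrative, not run here):
#   MATCH (:Person {name: $start})-[:KNOWS]->()-[:KNOWS]->(fof)
#   RETURN DISTINCT fof.name ORDER BY fof.name


def friends_of_friends_sql(start: str) -> List[str]:
    """Relational version: one self-join per hop."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE knows (src TEXT, dst TEXT)")
    conn.executemany("INSERT INTO knows VALUES (?, ?)", KNOWS)
    rows = conn.execute(
        """
        SELECT DISTINCT k2.dst
        FROM knows AS k1
        JOIN knows AS k2 ON k2.src = k1.dst
        WHERE k1.src = ?
        ORDER BY k2.dst
        """,
        (start,),
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


def friends_of_friends_graph(start: str) -> List[str]:
    """Graph version: follow relationships outward from the start node."""
    adjacency: Dict[str, List[str]] = {}
    for src, dst in KNOWS:
        adjacency.setdefault(src, []).append(dst)
    found: Set[str] = set()
    for friend in adjacency.get(start, []):
        for friend_of_friend in adjacency.get(friend, []):
            found.add(friend_of_friend)
    return sorted(found)


if __name__ == "__main__":
    print(friends_of_friends_sql("Ann"))
    print(friends_of_friends_graph("Ann"))
```

Each extra hop adds another join to the SQL version, while the pattern version simply gets one segment longer.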

Whether Neo4j will be an improvement on your existing relational system will depend very much on how “graph-y” your problems and data sets are. If your table rows tend to be sparsely populated, and your queries tend to involve heavily nested joins, then a graph database is probably right for you—and Neo4j would be an excellent choice.

Cost: Community Edition: Free open source. Enterprise Edition: Free for development and startups. Per-machine Enterprise subscription licensing has two tiers, four cores and 24 cores; expandable bundles typically start with three-machine clusters and include support services.

Platform: Windows, MacOS, Linux (Debian and Red Hat), and Docker.

At a Glance

Neo4j is vastly more efficient than SQL or NoSQL databases for tasks that look at networks of related items, but the graph model and Cypher query language will require learning.

Pros

  • Native graph storage and native graph engine
  • Supports ACID properties
  • Has cluster support and runtime failover
  • Better performance than relational databases for “graph-y” applications
  • Open source version available

Cons

  • Cypher query language is not exactly SQL and takes some learning
  • Graph database design is different from relational database design
  • Open source engine does not support clustering

Copyright © 2018 IDG Communications, Inc.
