Review: Google Cloud Spanner takes SQL to NoSQL scale

Google Cloud Spanner achieves horizontal scalability, strong consistency, and five nines availability as a service, at a price

1 2 Page 2
Page 2 of 2

Quizlet considered and ruled out Amazon Aurora and Google Cloud SQL, which are limited to a single master node, just like Quizlet’s existing architecture. It considered XtraDB, Vitess, and CitusDB, which introduce architectural changes to enable clustering for MySQL and Postgres. It considered switching to a horizontally scalable NoSQL database such as Cassandra, HBase, MongoDB, DynamoDB, or Bigtable. It even considered scaling vertically with large, expensive, enterprise-grade hardware, and rolling its own scaling technology. It also considered CockroachDB, but hesitated because, without the use of atomic clocks and GPS time, CockroachDB can only guarantee serializability, not linearizability.

Bakkum found Spanner’s lack of SQL DML statements (e.g. INSERT and UPDATE) annoying, but not a deal-breaker. He tested a synthetic query workload that matched Quizlet’s production loads on Spanner, and found that Cloud Spanner queries have higher latency at low throughputs (3,000 queries per second) compared with a virtual machine running MySQL. However, Spanner’s scalability means that a high-capacity cluster can easily handle workloads that stretch Quizlet’s MySQL infrastructure. At 9,000 queries per second Spanner keeps chugging at the same latency while MySQL’s latency has jumped way up. In a variational study, Bakkum got up to 33,200 queries per second with 30 Spanner nodes. He also discovered a few quirks in the way Spanner does index traversal and query optimization with multiple indexes.

Looking at Cloud Spanner Demo

Google created a demo for the Cloud Spanner launch (see the Keynote demo video and the screenshot above) that simulates a ticket selling app. The authors kindly supplied the source code (lots of shell scripts, Go language and Python programs, YAML configurations, Docker files, and some SQL) for me to examine (and run if I wish) but not to publish, since the code has not yet been released publicly and was designed for live demonstrations only. The database schema with row counts is shown in the screenshot below.

google cloud spanner demo schema IDG

This screenshot shows the SQL database schema for the ticket selling demo, with the row counts for selected tables. Note that the Ticket and TicketInfo tables each have 3 billion rows.

The code I received was designed to run the three pieces of the demo (the data load generator, the back end, and the buy-bots) in Kubernetes clusters locally or on Google Compute Engine, for development purposes. For the demo at scale, all three parts were run in Google Container Engine.

I found the code less different from normal embedded SQL than I expected, despite the idiosyncratic way Spanner requires you to write INSERTs, the need to specify the index to use in your SELECT statement (e.g. FORCE_INDEX=SeatingCategoryById), and the need to use at least 10-second-stale reads to enable fast local reads in different regions when fetching lists of events. It’s certainly not as different as data design and code for Google Cloud BigTable, which after all is a column-based NoSQL database, but it still took me awhile to adjust my thinking.

Going through the official Getting Started documentation for Cloud Spanner (in my case, in the Go language) was almost as informative as reading the “real” code from the launch demo. I highly recommend picking whichever of the six tutorials best fits your projects and skill set and going through all the steps.

Google Cloud Spanner do’s and don’ts

As we’ve seen, Cloud Spanner is very much as Google has represented it: a managed, global, relational database with ACID properties and strong consistency across regions. It automatically does horizontal row-range sharding, which gets around the scaling limit of most relational databases, and if you specify that you want it, it co-locates and interleaves child tables with parent tables for performance.

Cloud Spanner is not what you want as the storage layer for a small website or the back end of a game. It is what you want if you have a massive OLTP database that must be consistent at all times, such as a global commerce system. Cloud Spanner’s SQL dialect is a little different from what you may know from working with, for example, MySQL, but it is clearly SQL and does SELECT statements with JOINs as you would expect. Prepare to spend some time revising your database schema if you migrate from a conventional relational database to Spanner.

Don’t bother with Cloud Spanner if you need horizontal scalability but can get by with eventual consistency. Use a NoSQL database instead. Don’t bother with Cloud Spanner if you need strong consistency but don’t need horizontal scalability. Almost any SQL database will do. Don’t bother with Cloud Spanner if you need to host your database in your own datacenters or you need open pource (consider CockroachDB). Do investigate Cloud Spanner, and consider running a proof of concept, if you have mission-critical OLTP needs that require you to get as close as you can to beating the CAP Theorem.

—-

Cost: Google Cloud Spanner is billed at $0.90 per node per hour (except in asia-northeast1 region, where the rate is $1.17 per node per hour) plus $0.30 per GB per month ($0.39 per GB per month in asia-northeast1). Cross-region and Internet egress rates also apply. Network ingress is free. 

InfoWorld Scorecard
Performance (25%)
Scalability (20%)
Management (25%)
Availability (20%)
Value (10%)
Overall Score (100%)
Google Cloud Spanner 9 10 10 10 9 9.7
At a Glance
  • Google Cloud Spanner is a fully managed relational database service that offers strong (and external) consistency as well as horizontal scalability for mission-critical online transaction processing (OLTP).

    Pros

    • Strongly consistent OLTP across regions
    • Horizontally scalable for writes as well as reads
    • SQL query engine with JOINs
    • Fully managed database as a service

    Cons

    • SQL dialect and data types are not fully compatible with MySQL, Postgres, SQL Server, or Oracle
    • Only available in Google Cloud
    • Not open source
    • More expensive than other database options

Copyright © 2017 IDG Communications, Inc.

1 2 Page 2
Page 2 of 2