Google's Cloud Spanner melds transactional consistency, NoSQL scale

The research behind the horizontally scalable, SQL-compatible database has spawned imitators, but Google's private network is the real secret sauce

Earlier this year, Google offered a peek at Cloud Spanner, an automanaged database service that melds features from both conventional relational systems and NoSQL technologies.

Today, Google announced Cloud Spanner will be available to the general public later this month. It will compete not only with rival cloud databases but also with up-and-coming open source projects that address scale and reliability issues by using Google's own ideas.

The best of both worlds

Google presents Cloud Spanner as a happy medium between two common database needs that often prove incompatible. A database can be highly scalable and distributed (the NoSQL approach), or it can be transactionally consistent (the conventional database approach). Cloud Spanner aims to be both.

As laid out in a 2012 research paper, one key to accomplishing this is a time synchronization mechanism for actions that need to be kept consistent between nodes, such as globally consistent read operations, which people expect from a transactional database.

This sync mechanism takes into account the potential differences between timestamps provided by different machines in the cluster and can "wait out" the differences if they are too large. But the system also tries to keep uncertainty to a minimum by drawing on multiple time sources to increase clock accuracy. As a result, it's easier to get operations spread across multiple nodes (for example, MapReduce) to agree on when something happened and to deliver consistent results.
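The "wait out" idea can be illustrated with a minimal Python sketch. It assumes a clock that reports an interval (earliest, latest) rather than a single instant, with a fixed uncertainty bound. The names `UncertainClock`, `TimeInterval`, and `commit_with_wait` are hypothetical and are not part of any Google API; the sketch only shows the general technique of picking a timestamp and then waiting until every clock in the system must have passed it.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class TimeInterval:
    """The true time is assumed to lie somewhere in [earliest, latest]."""
    earliest: float
    latest: float


class UncertainClock:
    """Hypothetical clock with a fixed uncertainty bound (epsilon, in seconds)."""

    def __init__(self, epsilon_s: float):
        self.epsilon_s = epsilon_s

    def now(self) -> TimeInterval:
        t = time.monotonic()
        return TimeInterval(t - self.epsilon_s, t + self.epsilon_s)


def commit_with_wait(clock: UncertainClock) -> float:
    """Pick a commit timestamp, then wait until it is definitely in the past."""
    # Use the latest possible "now" as the commit timestamp.
    commit_ts = clock.now().latest
    # Wait out the uncertainty: only return once even the earliest
    # possible "now" is later than the chosen timestamp.
    while clock.now().earliest <= commit_ts:
        time.sleep(clock.epsilon_s / 10)
    return commit_ts


if __name__ == "__main__":
    clock = UncertainClock(epsilon_s=0.005)
    started = time.monotonic()
    ts = commit_with_wait(clock)
    waited_ms = (time.monotonic() - started) * 1000
    print(f"commit timestamp {ts:.6f}, waited about {waited_ms:.1f} ms")
```

The wait is roughly twice the uncertainty bound, which is why a tighter bound (from multiple, more accurate time sources) translates directly into less waiting.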

In a white paper published earlier this year, Google talked about another key element: how Cloud Spanner leverages Google's own network. Of the three characteristics most desired from a distributed system (consistency, availability, and tolerance for splits between nodes), Cloud Spanner tries to deliver all three by making slight but often undetectable sacrifices to availability, aided by the fact that the service runs on Google's own highly redundant network.

A little more scale, a little less SQL

The actual database Google has created from this technology strongly resembles other cloud-hosted transactional databases, but with some potentially irksome differences.

First, Cloud Spanner is advertised as having support for ANSI 2011 SQL queries. The documentation shows this is true for SELECT queries; they support all the familiar SQL syntax, including JOIN and GROUP BY. But INSERT and UPDATE commands are not available; according to a blog post at Quizlet, which used Cloud Spanner in beta, you need to use "RPCs for mutating rows given their primary key" instead. Some of this is made easier through Cloud Spanner's language and interface support, as it provides libraries for Go, Java/JDBC, Node.js, and Python, as well as support for REST calls.

Cloud Spanner's other touted advantages are scale and availability. The database autoscales based on demand, with pricing based on the number of nodes in use, storage needed on those nodes, and outbound bandwidth consumed. Right now the size of a database influences the number of nodes required to deploy it; every 2TB of database storage requires at least one node to support it.
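As a rough illustration of that sizing rule, the following Python sketch computes the minimum node count implied by storage alone, assuming the 2TB-per-node figure stated above. The function name `min_nodes_for_storage` is hypothetical, and the sketch ignores throughput, which may call for more nodes than storage does.

```python
import math

# Storage one node can support, per the figure cited in the article.
TB_PER_NODE = 2.0


def min_nodes_for_storage(storage_tb: float) -> int:
    """Return the smallest node count that covers the given storage in TB."""
    if storage_tb < 0:
        raise ValueError("storage_tb must be non-negative")
    # Even an empty database is assumed to need one node to be deployed.
    return max(1, math.ceil(storage_tb / TB_PER_NODE))


if __name__ == "__main__":
    for size_tb in (0.5, 2, 5, 11):
        print(f"{size_tb} TB needs at least {min_nodes_for_storage(size_tb)} node(s)")
```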

Imitation and flattery

Cloud Spanner's promises are echoes of features in other database products, although Google is clearly hoping to compete broadly by offering a better amalgamation of features in one place.

Take autoscaling, for instance. Ex-Microsoftie Bob Muglia served up Snowflake as a cloud data-warehouse system that didn't need to be tweaked or tuned. There, Google can almost certainly compete on pricing, as it has its own infrastructure, whereas Snowflake is implemented on Amazon.

Speaking of Amazon, it has a few products that could be competition. Aurora, for instance, is Amazon's hosted version of MySQL, and it beats Google's MySQL offering for high-end work. It also has the advantage of being familiar and widely supported; there's barely a database developer who hasn't touched MySQL at some point. But again, Google's hope is that Cloud Spanner will compete by offering better scale across the board, including for write operations and not only reads.

Then there's CockroachDB, which is approaching its first full 1.0 version. This open source database project is an implementation of the ideas in Google's Spanner paper, in much the same way Google's paper on MapReduce inspired Hadoop.

Where Google wants to stand out, though, is in the execution. That explains the white paper professing how it isn't only the time-synchronization functions that make Cloud Spanner special, but also Google's tight control over the networking between nodes. It might be possible for another cloud to implement that through a CockroachDB-based service, but Google's counting on first-mover advantage, and all the major back-end resources it can work with, to make an impression.


Serdar Yegulalp is a senior writer at InfoWorld, focused on machine learning, containerization, devops, the Python ecosystem, and periodic reviews.