Review: Google Bigtable scales with ease

If your data runs to hundreds of terabytes or more, look to Cloud Bigtable for high performance, ease of use, and effortless scaling without downtime

Editor's Choice

When Google announced a beta test of Cloud Bigtable in May 2015, the new database as a service drew lots of interest from people who had been using HBase or Cassandra. This was not surprising. Now that Cloud Bigtable has become generally available, it should gain even more attention from people who would like to collect and analyze extremely large data sets without having to build, run, and sweat the details of scaling out their own enormous database clusters.

Cloud Bigtable is a public, highly scalable, column-oriented NoSQL database as a service that uses the very same code as Google’s internal version, which Google invented in the early 2000s and published a paper about in 2006. Bigtable was and is the underlying database for many Google services, including Search, Analytics, Maps, and Gmail.

Bigtable inspired several open source NoSQL databases, including Apache HBase, Apache Cassandra, and Apache Accumulo. HBase was designed as an implementation of Bigtable based on the paper and became the primary database for Hadoop. Cassandra was born at Facebook using ideas from Bigtable and the key-value store Amazon Dynamo. Accumulo is a sorted, distributed key-value store with cell-based access control that started out as the NSA’s secure take on Bigtable.

While HBase had its moment in the sun, its market share now isn’t as large as most in the industry expected a few years ago. As Matt Asay explained earlier this year, “its narrow utility and inherent complexity have hobbled its popularity and allowed other databases to claim the big data crown.” And as Rick Grehan explained in depth in 2014, HBase has too many moving parts and is too hard to set up and tune for mere mortals.

While Cassandra is a bit more popular, has a SQL-like query language, and is easier to get up and running than HBase, it is still complicated and has a significant learning curve. Accumulo is more of a niche database, primarily seeing service for government applications.

How Bigtable works

Cloud Bigtable uses a highly scalable, sparsely populated table structure, where each table is a sorted key/value map. A Bigtable row describes a single entity and is indexed by a single row key; a column contains individual values for each row. Column families group related columns. Each row/column intersection can contain multiple cells at different timestamps, and cells without data take no space.
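To make the data model concrete, here is a minimal sketch using the HBase-compatible Java client discussed later in this review. The project, instance, table, column family, and column names are all hypothetical, and the BigtableConfiguration.connect helper is assumed to come from the bigtable-hbase client library; treat it as an illustration of the sparse row, column family, and timestamped-cell model rather than production code.

```java
import com.google.cloud.bigtable.hbase.BigtableConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class DataModelSketch {
  public static void main(String[] args) throws Exception {
    // Hypothetical project, instance, and table names.
    try (Connection connection = BigtableConfiguration.connect("my-project", "my-instance");
         Table table = connection.getTable(TableName.valueOf("user-events"))) {

      // One row per entity, indexed by a single row key.
      Put put = new Put(Bytes.toBytes("user#42"));

      // Columns live inside column families; only the cells you actually write take space.
      put.addColumn(Bytes.toBytes("profile"), Bytes.toBytes("name"), Bytes.toBytes("Alice"));

      // Each row/column intersection can hold multiple cells at different timestamps.
      long ts = System.currentTimeMillis();
      put.addColumn(Bytes.toBytes("activity"), Bytes.toBytes("last_login"), ts, Bytes.toBytes("web"));

      table.put(put); // a single-row mutation is applied atomically

      // Reading a row back by its key is the efficient access path.
      Result row = table.get(new Get(Bytes.toBytes("user#42")));
      System.out.println(Bytes.toString(row.getValue(Bytes.toBytes("profile"), Bytes.toBytes("name"))));
    }
  }
}
```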

Client requests go through a front-end server before they are sent to a Cloud Bigtable node. Each node handles a subset of the requests to the cluster. Adding nodes increases the number of simultaneous requests that the cluster can handle and increases the maximum cluster throughput.

A Cloud Bigtable table is automatically sharded into blocks of contiguous rows, known as tablets, to help balance the query workload. Tablets are stored in SSTable files on Google’s Colossus file system. An SSTable provides a persistent, ordered, immutable map from byte-string keys to byte-string values, and each tablet belongs to one node. Writes are recorded in a log in addition to being written to SSTable files. Each node has pointers to a set of tablets but holds no rows of data itself.

[Figure: Google Cloud Bigtable architecture]

The Bigtable architecture allows multiple clients to access a front-end server pool, which in turn addresses the nodes in a Cloud Bigtable cluster. These in turn read and write SSTables on the Colossus file system.

Cloud Bigtable features

Cloud Bigtable delivers very high performance under high load, even compared to other NoSQL services. Part of that flows from the inherently efficient design, and part of that comes from the fast, scalable infrastructure. Along with high performance, Bigtable exhibits very low latency. I’ll come back to this shortly and explore the specifics.

Cloud Bigtable uses redundant internal storage, which it automatically scales as your data grows. This is much more convenient than setting up a local database, establishing replicas and failovers, and allocating physical disks. In addition, Cloud Bigtable encrypts data both in-flight and at rest, and it allows you to control access at the Cloud Platform project level. It doesn’t, however, allow for fine-grained control over data access.

You can scale Cloud Bigtable up or down by adding or subtracting nodes during operation without the need for a restart. Bigtable typically requires about 20 minutes to take full advantage of a new node in the cluster. You can access Bigtable programmatically using an RPC-based API or through an HBase-compatible interface.

With respect to performance, Cloud Bigtable is designed to handle very large amounts of data (terabytes or petabytes) over a relatively long period of time (hours or days). It’s not designed to work with small amounts of data in short bursts. According to Google, “If you test Cloud Bigtable for 30 seconds with a few GB of data, you won’t get an accurate picture of its performance.”

For SSD clusters, Google claims that typical steady state performance is 10,000 queries per second per node for both reads and writes, with 6ms latency. Google claims that scans on SSD clusters deliver 220MB per second per node. HDD clusters are slower, especially for reads.

Google offers a checklist of a half-dozen reasons to investigate if your performance doesn’t reach this level. The first item on the list: “The table’s schema is not designed correctly.”

Bigtable schema quirks

Designing a schema for Bigtable is quite different from designing a schema for a SQL database. Forget relational algebra and third normal form: They don’t apply.

To start with, each table has only one index, the row key. According to Google, the most efficient Cloud Bigtable queries use the row key, a row prefix, or a row range to retrieve the data. Other types of queries trigger a full table scan -- and scans are slow when tables run to petabytes of data.

To get around the lack of secondary indices, you can include multiple identifiers in your row key. Because the row key is limited to 4KB and short row keys perform better than long row keys, you may want to hash each identifier and concatenate the hashes to make up the key.
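Here is a sketch of that approach, assuming a hypothetical table keyed by a customer identifier and a device identifier: each identifier is hashed, the short digests are concatenated into the row key, and a prefix scan on the leading digest retrieves every row for that customer without a full table scan.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class RowKeySketch {
  // Hash one identifier and keep a short hex prefix so the full key stays well under 4KB.
  static String shortHash(String id) throws Exception {
    byte[] digest = MessageDigest.getInstance("SHA-256")
        .digest(id.getBytes(StandardCharsets.UTF_8));
    StringBuilder hex = new StringBuilder();
    for (int i = 0; i < 8; i++) {
      hex.append(String.format("%02x", digest[i]));
    }
    return hex.toString();
  }

  // Compose the row key from several identifiers; the leading one determines the scan prefix.
  static String rowKey(String customerId, String deviceId) throws Exception {
    return shortHash(customerId) + "#" + shortHash(deviceId);
  }

  // Retrieve every row for a customer with a row-prefix scan instead of a full table scan.
  static void scanCustomer(Table table, String customerId) throws Exception {
    Scan scan = new Scan().setRowPrefixFilter(Bytes.toBytes(shortHash(customerId) + "#"));
    try (ResultScanner scanner = table.getScanner(scan)) {
      for (Result row : scanner) {
        System.out.println(Bytes.toString(row.getRow()));
      }
    }
  }
}
```

One trade-off of hashing: you give up meaningful range scans over the natural ordering of the original identifiers, so hash only the components you never need to range over.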

As for rows, you have to keep in mind that row operations are atomic, and rows are read into memory. For instance, you can have a situation where you write two rows, one write succeeds, and one write fails. If those two rows were a related debit and credit, you’ve thrown the books out of balance -- and your application should be smart enough to recover from the failure.

That problem might be a result of trying to use a transactional schema design on a database that doesn’t have multirow transactions. Ideally, all information about a given entity should be in one row.
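One way to apply that advice, sketched below with hypothetical table and column names, is to make the transfer itself the entity: record both the debit and the credit legs in a single row, so the one write either lands in full or fails in full.

```java
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class SingleRowTransfer {
  // Record both legs of a transfer in one row so the write succeeds or fails as a unit.
  static void recordTransfer(Table table, String transferId,
                             String fromAccount, String toAccount, long cents) throws Exception {
    Put put = new Put(Bytes.toBytes("transfer#" + transferId));
    put.addColumn(Bytes.toBytes("legs"), Bytes.toBytes("debit_account"), Bytes.toBytes(fromAccount));
    put.addColumn(Bytes.toBytes("legs"), Bytes.toBytes("credit_account"), Bytes.toBytes(toAccount));
    put.addColumn(Bytes.toBytes("legs"), Bytes.toBytes("amount_cents"), Bytes.toBytes(cents));

    // A single-row mutation is atomic; writing the legs as two separate rows would not be.
    table.put(put);
  }
}
```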

Atomic row operations also cut the other way. Suppose your rows are 100MB each (the maximum size recommended) and you read 25 rows, but your instance has only 2GB of RAM available: You will trigger an out-of-memory error. You may have to make trade-offs when designing the amount of data held in a single row.

If you think about the most famous application of Bigtable, the web index, you might imagine that the row key for the site table would be the domain name. That’s close, but probably wrong. The row key is more likely the reversed domain name. That’s because a reversed domain name, such as com.company-name.subdomain-name, keeps all of the information about the subdomains of company-name.com in the same row range, which is efficient. A nonreversed domain name is not only inefficient for reads, but also defeats the content compression algorithm, which works best if identical values are near each other, either in the same row or in adjoining rows.
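The reversal itself is nothing Bigtable-specific, just a key-design convention; a quick sketch:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class ReverseDomain {
  // Turn "subdomain-name.company-name.com" into "com.company-name.subdomain-name"
  // so that all rows for a registered domain fall into one contiguous key range.
  static String reverseDomain(String domain) {
    List<String> labels = Arrays.asList(domain.split("\\."));
    Collections.reverse(labels);
    return String.join(".", labels);
  }

  public static void main(String[] args) {
    System.out.println(reverseDomain("maps.google.com")); // com.google.maps
  }
}
```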

[Screenshot: Creating a Cloud Bigtable instance in the console]

Creating an instance is one of the functions you can easily perform in the Bigtable console. Here we are creating a three-node cluster (the minimum number), which should provide 30,000 queries per second once it has warmed up. Other actions you can perform in the console include adding and removing nodes and deleting the cluster.

Bigtable console, command line, and APIs

You can use the Bigtable console for basic management tasks, such as creating and scaling clusters, as shown in the figure above. The Bigtable console includes a Google Cloud Shell, which you can see running an HBase shell in the figure below.

You can also install and use the GCP command line and the HBase shell on your own machine. You can use the GCP command line for management tasks, and the HBase shell for data reads, writes, queries, and scans.

To make use of Bigtable, you’ll most likely want to use it from a program. If your code is in Java, you have the choice of using the HBase client library or gRPC. There are also Go and Python client libraries, a Go CLI, a Dataflow connector, and gRPC implementations for C, C++, Node.js, Python, Ruby, Objective-C, PHP, C#, and Go. There are third-party client libraries for .NET and Scala.

[Screenshot: Running an HBase shell inside Google Cloud Shell]

Here we are running the Bigtable Quick Start exercise in an HBase shell that we set up in a Google Cloud Shell. If you work with your Bigtable instance this way, you don’t have to bother with credentials. You can also install an HBase shell on your own machine; in that case, you get credentials for your instance using the GCP command line.

Several Google and open source products integrate with Bigtable. On the Google side, you can integrate with Cloud Dataflow, a Java batch and streaming processing service; and Cloud Dataproc, a cloud offering of Spark, Hadoop, and related Apache projects. If you want to work with time series, you can use Spotify’s Heroic monitoring system and time-series database, or OpenTSDB.

The Cloud Bigtable quickstart, shown in the figure above, teaches you how to create a cluster and use the Google Cloud Shell to load an HBase shell, connect to your instance, create a table, write some data, scan a table, and delete the table. This is only the most basic functionality of Bigtable, but it’s worth going through the exercise before you dive into designing schemas and programming against Bigtable.

Showcase deployments

Google is justifiably proud of two third-party Bigtable deployments from the beta-test period. One, Heroic, is Spotify’s production monitoring system, which has been migrated to Bigtable from Apache Cassandra. The other is Fidelity National Information Services’ bid for the SEC Consolidated Audit Trail (CAT) project, which was able to achieve 34 million reads per second and 23 million writes per second on Cloud Bigtable as part of its market data processing pipeline. That amounts to processing 25 billion stock market events per hour.

All in all, Cloud Bigtable is a convenient, high-performance alternative to hosting your own HBase or Cassandra installation on-premises to handle terabytes or petabytes of data. Cloud Bigtable works especially well if your other software is hosted on the Google Cloud Platform in the same availability zone as your Bigtable cluster.

What about cost? It’s 65 cents per node per hour (minimum three nodes), plus 17 cents per gigabyte per month (SSD), plus network egress. That doesn’t sound like much, but at petabyte scales that will add up.

For giggles, let’s calculate the monthly cost for 30 Bigtable nodes (probably the low end for performance purposes) with SSD storage and a petabyte of data, accessed from an application running in the same GCP availability zone so that there are no egress costs.

  • 30 nodes x 65 cents per node per hour x 730 hours per month = $14,235
  • 17 cents per gigabyte x 1 million gigabytes per petabyte = $170,000
  • Total monthly cost: $184,235, plus the cost to run the application
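If you want to plug in your own node counts and storage sizes, the arithmetic reduces to a couple of constants taken from the prices quoted above (network egress is ignored, as in this example):

```java
public class BigtableCostEstimate {
  // List prices quoted in this review: $0.65 per node-hour, $0.17 per GB-month of SSD storage.
  static final double NODE_HOUR_USD = 0.65;
  static final double SSD_GB_MONTH_USD = 0.17;

  static double monthlyCost(int nodes, double storedGigabytes) {
    double hoursPerMonth = 730; // the figure used in the example above
    return nodes * NODE_HOUR_USD * hoursPerMonth + storedGigabytes * SSD_GB_MONTH_USD;
  }

  public static void main(String[] args) {
    // 30 nodes and 1PB (1 million GB) of SSD storage, matching the worked example: $184,235
    System.out.printf("$%,.0f%n", monthlyCost(30, 1_000_000));
  }
}
```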

For comparison, plain storage of a petabyte in Amazon S3 costs $30,000 per month -- but you’d still have to process the data, so you’d be looking at one of Amazon’s managed database services. As an example, the cost of a petabyte in Amazon Redshift Dense Storage nodes is $83,333 per month, but that’s for slower hard disk storage. Redshift Dense Compute nodes use SSDs, comparable to the Bigtable SSD clusters we’ve been discussing, but they can scale only to hundreds of terabytes, not a full petabyte. You might also consider Vertica or Greenplum as a service for a petabyte of data.

What would it cost to build and operate a data center for a big Cassandra or Accumulo installation capable of holding petabytes of data? That will depend on your location, the kinds of servers and disks you buy, the cost of power and labor, the amount of storage you’ll ultimately need, and the costs to build and cool the facility. I guess the extreme case would be the NSA’s facility in Utah, which has been estimated to handle 12,000 petabytes (12 exabytes) and have an annual maintenance bill of $20 million -- never mind the estimated $1 billion worth of hardware or the construction cost.

Because every case is different, my advice is to evaluate your personalized options both technically and financially. Cloud Bigtable may be your best cloud database option at petabyte scale, but you won’t know unless you design a schema and run a test installation at terabyte scale for a few days to see how it works for your application.

InfoWorld Scorecard: Google Cloud Bigtable
  • Performance (25%): 9
  • Scalability (20%): 10
  • Management (25%): 10
  • Availability (20%): 10
  • Value (10%): 9
  • Overall Score: 9.7
At a Glance
  • Google Cloud Bigtable is a massively scalable, fully managed, high-performance NoSQL database service for large analytical and operational workloads.

    Pros

    • Scales to petabytes of data
    • Delivers consistent single-digit millisecond latency and high throughput
    • Lets you add and remove nodes without incurring downtime
    • Fully managed and self-balancing
    • Provides an HBase-compatible API

    Cons

    • Not portable outside of Google Cloud, although compatible with HBase
    • Needs to be used from the same availability zone for best performance

Copyright © 2016 IDG Communications, Inc.