The method behind Google's machine learning madness

When Google open-sourced its TensorFlow and SyntaxNet machine learning technology, it gave away its most precious intellectual property. Here's why that was smart

First there was TensorFlow, Google’s machine learning framework. Then there was SyntaxNet, a neural network framework Google released to help developers build applications that understand human language. What comes next is anyone’s guess, but one thing is clear: Google is aggressively open-sourcing the smarts behind some of its most promising AI technology.
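
To make concrete just how much was given away (this toy example is mine, not Google’s), here is roughly what the open-sourced smarts look like in practice: a few lines of TensorFlow, written against the framework’s current Python API, fitting a trivial linear model by gradient descent.

import tensorflow as tf

# Toy linear regression: learn y = 2x from four points by gradient descent.
W = tf.Variable([0.0])
b = tf.Variable([0.0])
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)

xs = tf.constant([1.0, 2.0, 3.0, 4.0])
ys = tf.constant([2.0, 4.0, 6.0, 8.0])

for _ in range(200):
    with tf.GradientTape() as tape:
        loss = tf.reduce_mean(tf.square(W * xs + b - ys))
    grads = tape.gradient(loss, [W, b])
    optimizer.apply_gradients(zip(grads, [W, b]))

print(W.numpy(), b.numpy())  # converges toward [2.0] and [0.0]

Anyone can download the framework and run code like this for free. What can’t be downloaded are Google’s data and the infrastructure the company runs it on.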

Yet even as it gives that technology away, Google is apparently betting that “artificial intelligence will be its secret sauce,” as Larry Dignan details. That “sauce” permeates a bevy of newly announced Google products like Google Home, but it’s anything but secret.

Indeed, as Google, Facebook, and other Web giants keep teaching us, software’s magic no longer derives from ones and zeroes, but rather from the services delivered through those ones and zeroes. In short, having Google’s code is not the same as being Google.

What makes Google … Google?

Speaking of Google Home (Google’s answer to Amazon Echo), Dignan suggests that “Google Home's secret sauce is really Google.”

Though the context suggests Dignan means all the other Google services and data that combine to deliver the Google Home experience, there’s more to it. Buried in that statement is the key to why Google can open-source its artificial intelligence code and still make a mint selling services around it.

Google’s code isn’t Google. Running that code as only Google can, is.

Craigslist’s Jeremy Zawodny essentially made this point to me in a heated discussion 10 years back at OSCON. I had challenged Zawodny (then at Yahoo) for not contributing more code as open source. Though the initial response from Yahoo and from Google’s Chris DiBona was that “no one would understand the code” even if it were opened, a more compelling argument came from Zawodny later (captured here):

Yahoo’s applications are tightly bound together, making it difficult to open one piece without giving away information about how the remainder is written, or making it useless because knowing 1/10th of the application wouldn’t be helpful (because of all the unknown code).

In short, getting some of Yahoo’s (or Google’s) code doesn’t magically turn you into Yahoo or Google. Even if you were able to get the entire code base, it wouldn’t transform you into a Web giant, because you’d still need to be able to run it like them, and more important, you wouldn’t have all the data that gives life to the code.

So why is Google bothering to open-source this machine learning code?

Build with Google, run on Google

One smart answer comes from David Mytton: Google wants to “standardize machine learning on a single framework and API,” then supplement it “with a service that can [manage] it all for you more efficiently and with less operational overhead.” Standardize on Google’s smarts, then operate them at Google scale:

The [TensorFlow/Tensor Processing Units] announcement demonstrates that Google has optimized dedicated hardware designed specifically to “deliver an order of magnitude better-optimized performance per watt for machine learning.” This is a true differentiator because nobody else offers such a service on their cloud platform and with the increasing focus on AI and machine learning, positioning Google Cloud as the best place to run machine learning workloads is important.

Some developers will cavil at this inherent control Google retains over the code, but that’s a bit churlish. After all, no one is forcing developers to run their code on Google’s cloud. Even if they don’t, Google wins if developers rally to its machine learning projects and help to improve them.
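
Mytton’s argument is easier to see in a hedged sketch. Using today’s TensorFlow APIs rather than the 2016-era ones, and with make_model as a purely illustrative name, the same model definition trains anywhere by default; only the commented-out lines, which swap in a TPU distribution strategy, require Google’s cloud.

import tensorflow as tf

def make_model():
    # Illustrative two-layer network; the point is the strategy, not the model.
    return tf.keras.Sequential([
        tf.keras.Input(shape=(10,)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(1),
    ])

# Local development: the default strategy runs wherever TensorFlow runs.
strategy = tf.distribute.get_strategy()

# On Google Cloud, the same code would instead resolve a TPU cluster, e.g.:
#   resolver = tf.distribute.cluster_resolver.TPUClusterResolver()
#   tf.config.experimental_connect_to_cluster(resolver)
#   tf.tpu.experimental.initialize_tpu_system(resolver)
#   strategy = tf.distribute.TPUStrategy(resolver)

with strategy.scope():
    model = make_model()
    model.compile(optimizer="adam", loss="mse")

That swap is the pull toward Google Cloud: the framework travels, the accelerators don’t.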

The model for this is Kubernetes, an open source container orchestration system patterned after Google’s internal cluster management tools. In 2015, Google gave control of Kubernetes to the newly formed Cloud Native Computing Foundation, joined by IBM, AT&T, Huawei, and a range of others. Google may employ lots of incredibly smart engineers, but it doesn’t employ all of them. By open-sourcing its code, Google gets access to the industry’s brightest minds without giving up its crown jewels.

Those crown jewels, as mentioned, have little to do with the software itself and everything to do with running that software at scale while amassing and putting to use mountains of data. It’s why we can correctly say both that Google is open-sourcing its machine learning code and that machine learning will be its secret sauce.

The two aren’t contradictory. They’re complementary.
