Nvidia GPU Cloud bundles frameworks and tools for AI app dev

Nvidia's software stack for running machine learning is built to use local resources, the Nvidia DGX-1 GPU system, or GPUs in the cloud

Over the past couple of years, every major cloud vendor added GPU resources as standard issue: Amazon, Microsoft, Google, IBM. In almost every case, those GPUs come from Nvidia, the premier supplier of GPUs for high-performance computing and of the de facto software standards for it.

Yesterday, Nvidia unveiled the Nvidia GPU Cloud, a new plan for leveraging GPUs wherever they may reside. The details are still vague, but it's clearly been built to aid in developing the apps now most closely associated with GPU acceleration: machine learning and artificial intelligence.

A guided deep learning experience

The few details currently known about Nvidia GPU Cloud come mostly from the company's press release. The phrasing throughout indicates that GPU Cloud amounts to an end-to-end software stack for deep learning. It features many common frameworks for GPU-accelerated machine learning—Caffe, Caffe2, CNTK, MXNet, TensorFlow, Theano and Torch—along with Nvidia-specific tools for deep learning, including support for running the above in Docker containers.

There's been a growing need for applications that provide complete workflows for machine learning, so that data ingestion, normalization, model training, and prediction generation are all handled through a single consistent pipeline. Nvidia GPU Cloud sounds like at least partial fulfillment of that ideal, "a simple application that guides people through deep learning workflow projects across all system types," as the company puts it in its press notes.
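To make the idea of a single consistent pipeline concrete, here is a minimal sketch in plain Python. It is not Nvidia GPU Cloud code, and nothing in it comes from Nvidia's press release; the function names (`ingest`, `fit_scaler`, `scale`, `train`, `run_pipeline`) and the sample data are hypothetical, and a one-variable linear regression stands in for a real deep learning model. It shows only how the four stages named above can be chained behind one entry point.

```python
import csv
import io


def ingest(csv_text):
    """Ingestion: parse CSV text with 'x' and 'y' columns into float pairs."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(float(row["x"]), float(row["y"])) for row in reader]


def fit_scaler(values):
    """Normalization, step 1: record the minimum and range for min-max scaling."""
    lo, hi = min(values), max(values)
    return lo, (hi - lo) or 1.0  # avoid a zero range if all values are equal


def scale(value, scaler):
    """Normalization, step 2: map a raw value using the recorded scaler."""
    lo, span = scaler
    return (value - lo) / span


def train(xs, ys):
    """Training: ordinary least squares for y = slope * x + intercept.

    Assumes the x values are not all identical.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov_xy / var_x
    return slope, mean_y - slope * mean_x


def run_pipeline(csv_text):
    """Chain ingestion, normalization, and training; return a predict function."""
    rows = ingest(csv_text)
    scaler = fit_scaler([x for x, _ in rows])
    xs = [scale(x, scaler) for x, _ in rows]
    ys = [y for _, y in rows]
    slope, intercept = train(xs, ys)

    def predict(raw_x):
        # Prediction reuses the scaler fitted on the training data.
        return slope * scale(raw_x, scaler) + intercept

    return predict


# Hypothetical sample data following y = 2x + 1.
SAMPLE = "x,y\n0,1\n10,21\n20,41\n"
predict = run_pipeline(SAMPLE)
print(predict(30))
```

The point of the design is that the prediction step reuses the scaler fitted during training, so every stage sees data in the same form. That consistency is what a hand-assembled stack tends to lose.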

Stack it up here, stack it up there

Another of the few explicitly stated goals for GPU Cloud is to provide a consistent software stack for GPU resources, wherever they may be found, locally or remotely.

GPU Cloud can use GPUs attached to a single PC, but it also can use Nvidia's DGX-1 supercomputing appliance. Those who shell out the $149,000 for that Tesla V100-powered behemoth won't have to cobble together their own software stack to make it more than a 960-teraflop space heater. Instead, they can use GPU Cloud as a sort of deep learning OS for the device.

Nvidia also hints at how GPU Cloud can run in the cloud and use GPU resources there. No details have been provided yet about cloud integration, but it seems inevitable that it would include any of the major commodity clouds with GPU support. Such clouds could be used for native data storage as well as GPUs, so data could be gathered, stored, and processed all in the same domain.

It's not clear what varieties of elasticity will be possible with GPU Cloud, but it looks like horizontal scaling will be possible with the same kind of hardware. "Users can start with a single GPU on a PC and add more compute resources on demand with a DGX system or through the cloud," according to the press release. What remains to be seen is how readily local resources can be scaled out to a cloud or vice versa.

Another question is to what extent Nvidia will provide its own cloud infrastructure. GPUs provided by an existing commodity cloud are convenient: they're a known quantity, and they're close to the data in question. But they're provided on the terms of the commodity cloud that supplies them. It's possible Nvidia has its own GPU-as-a-service cloud in mind, one that would offer not only the most cutting-edge hardware as soon as it became available, but also configurations that competitors would have a hard time matching.

Serdar Yegulalp is a senior writer at InfoWorld, focused on machine learning, containerization, devops, the Python ecosystem, and periodic reviews.