LLVM-powered Pocl puts parallel processing on multiple hardware platforms

Open source implementation of OpenCL automatically deploys code across numerous platforms, speeding machine learning and other jobs

LLVM, the open source compiler framework that powers everything from Mozilla’s Rust language to Apple’s Swift, emerges in yet another significant role: an enabler of code deployment systems that target multiple classes of hardware for speeding up jobs like machine learning.

To write code that can run on CPUs, GPUs, ASICs, and FPGAs—hugely useful for machine learning apps—it’s best to use the likes of OpenCL, which allows a program to be written once, then automatically deployed across different types of hardware.
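For a sense of what that write-once promise looks like, here is a minimal, illustrative OpenCL C kernel (the vadd name is ours, not from any project’s sources). The same source compiles for any OpenCL device, whether it’s a CPU, a GPU, or something more exotic:

    /* Minimal OpenCL C kernel: each work-item adds one pair of
       elements. The runtime decides how work-items map onto the
       target device's cores, threads, or vector lanes. */
    __kernel void vadd(__global const float *a,
                       __global const float *b,
                       __global float *c)
    {
        size_t i = get_global_id(0);  /* this work-item's index */
        c[i] = a[i] + b[i];
    }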

Pocl, an implementation of OpenCL recently revamped to version 0.14, uses the LLVM compiler framework to do the targeting. With Pocl, OpenCL code can be automatically deployed to any hardware platform with LLVM back-end support.
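Because Pocl sits behind the standard OpenCL API, host code doesn’t change at all. A sketch like the following, using only standard OpenCL 1.x calls and nothing Pocl-specific, lists whatever devices the installed implementation exposes (link with -lOpenCL):

    /* Illustrative sketch: enumerate the platforms and devices the
       installed OpenCL implementation (Pocl, for instance) exposes. */
    #include <stdio.h>
    #include <CL/cl.h>

    int main(void)
    {
        cl_platform_id platforms[8];
        cl_uint num_platforms = 0;
        clGetPlatformIDs(8, platforms, &num_platforms);

        for (cl_uint p = 0; p < num_platforms; p++) {
            char name[256];
            clGetPlatformInfo(platforms[p], CL_PLATFORM_NAME,
                              sizeof(name), name, NULL);
            printf("Platform: %s\n", name);

            cl_device_id devices[8];
            cl_uint num_devices = 0;
            clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_ALL,
                           8, devices, &num_devices);
            for (cl_uint d = 0; d < num_devices; d++) {
                char dev_name[256];
                clGetDeviceInfo(devices[d], CL_DEVICE_NAME,
                                sizeof(dev_name), dev_name, NULL);
                printf("  Device: %s\n", dev_name);
            }
        }
        return 0;
    }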

Pocl employs LLVM’s Clang front end to parse kernels written in OpenCL C, the C-based kernel language defined by the OpenCL standard. Version 0.14 works with both LLVM 3.9 and the recently released LLVM 4.0. It also offers a new binary format for OpenCL executables, so they can run on hosts that don’t have a compiler available.
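Pocl’s on-disk binary format is its own, but the portable mechanism for capturing a compiled program and reloading it elsewhere is part of the standard OpenCL API. The sketch below assumes a program already built for a single device; error checking is omitted for brevity:

    /* Sketch: save a built program's device binary, then reload it
       on a host that has no compiler. Assumes the program was built
       for exactly one device. */
    #include <stdlib.h>
    #include <CL/cl.h>

    unsigned char *save_binary(cl_program program, size_t *size)
    {
        clGetProgramInfo(program, CL_PROGRAM_BINARY_SIZES,
                         sizeof(*size), size, NULL);
        unsigned char *binary = malloc(*size);
        clGetProgramInfo(program, CL_PROGRAM_BINARIES,
                         sizeof(binary), &binary, NULL);
        return binary;  /* write these bytes to disk and ship them */
    }

    cl_program load_binary(cl_context context, cl_device_id device,
                           const unsigned char *binary, size_t size)
    {
        cl_int status, err;
        return clCreateProgramWithBinary(context, 1, &device, &size,
                                         &binary, &status, &err);
    }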

Not only can Pocl automatically target multiple processor architectures and hardware types, but it also uses LLVM to “[improve] performance portability of OpenCL programs with the kernel compiler and the task runtime, reducing the need for target-dependent manual optimizations,” according to the release notes for version 0.14.

Other projects also generate OpenCL code automatically tailored to multiple hardware targets. The Lift project, written in Scala, uses a specially tailored intermediate language (IL) that allows OpenCL abstractions to be mapped readily to the behavior of the target hardware. LLVM itself works on the same principle: it compiles source code into an intermediate representation (IR), which is then lowered to machine code for the target platform. A similar project, Futhark, generates GPU-specific code.
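To make the IR idea concrete, compiling even a trivial C function with Clang’s -S -emit-llvm flags stops at the intermediate representation, which any LLVM back end can then lower to machine code. The IR in the comment below is approximate; exact output varies by LLVM version:

    /* add.c: compile to LLVM IR with
           clang -S -emit-llvm add.c -o add.ll
       The emitted IR looks roughly like:

           define i32 @add(i32 %a, i32 %b) {
             %sum = add nsw i32 %a, %b
             ret i32 %sum
           }

       Any LLVM back end (x86, ARM, and so on) can take it from there. */
    int add(int a, int b)
    {
        return a + b;
    }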

LLVM also serves as a code-generation system for other aspects of machine learning. The Weld project uses LLVM to generate code designed to speed up the various phases of a data analysis framework, so that code spends less time shuttling data back and forth between components in the framework and more time doing actual data processing.

The development of new kinds of hardware targets is likely to continue driving the need for code generation systems that can target multiple hardware types. Google’s Tensor Processing Unit, for instance, is a custom ASIC devoted to speeding a particular phase of a machine learning job. If hardware types continue to proliferate and become more specialized, having code for them generated automatically will save time and labor.

Copyright © 2017 IDG Communications, Inc.