3 little things in Linux 4.10 that will make a big difference

Here's what virtualized GPUs, better cache management, and better handling of writes will mean to your cloud provider and your Linux apps


Linux never sleeps. Linus Torvalds is already hard at work pulling together changes for the next version of the kernel (4.11). But with Linux 4.10 now out, three groups of changes are worth paying close attention to because they improve performance and enable feature sets that weren’t possible before on Linux.

Here’s a rundown of those changes to 4.10 and what they likely will mean for you, your cloud providers, and your Linux applications.

1. Virtualized GPUs

One class of hardware that’s always been difficult to emulate in virtual machines is GPUs. Typically, VMs provide their own custom video driver (slow), and graphics calls have to be translated (slow) back and forth between guest and host. The ideal solution would be to run the same graphics driver in a guest that you use on the host itself and have all the needed calls simply relayed back to the GPU.

There’s more here than being able to play, say, Battlefield 1 in a VM. Every resource provided by the GPU, including GPU-accelerated processing provided through libraries like CUDA, would be available to the VM as if it were running on regular, unvirtualized iron.

Intel introduced GVT-g, a mediated pass-through technology for its integrated GPUs, to enable exactly this, but only in Linux 4.10 does OS-level support finally show up. In addition to kernel-level support for the feature via KVM (KVMGT), Intel has contributed support for the Xen hypervisor and the QEMU emulator.

Direct GVT-g support inside the kernel means third-party products can leverage it with nothing more than the “vanilla” kernel. It’s akin to how Docker turned a collection of native Linux features into a hugely successful devops solution; a big part of that success was that those features were available in most modern incarnations of Linux.
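For the curious, here is a rough sketch of how a GVT-g vGPU is created through the kernel’s mediated-device (mdev) framework and handed to a QEMU/KVM guest. The PCI address, vGPU type name, and guest settings below are hypothetical and vary by hardware and kernel configuration:

```shell
# Sketch only: requires a host kernel with KVMGT support; paths and
# type names (i915-GVTg_*) differ per machine.
modprobe kvmgt

# List the vGPU types the integrated GPU exposes via the mdev framework.
ls /sys/bus/pci/devices/0000:00:02.0/mdev_supported_types/

# Create a vGPU instance by writing a UUID to the chosen type's "create" node.
GVT_UUID=$(uuidgen)
echo "$GVT_UUID" > \
  /sys/bus/pci/devices/0000:00:02.0/mdev_supported_types/i915-GVTg_V4_2/create

# Hand the vGPU to a guest as a vfio-pci device.
qemu-system-x86_64 -enable-kvm -m 2048 -drive file=guest.img \
  -device vfio-pci,sysfsdev=/sys/bus/pci/devices/0000:00:02.0/"$GVT_UUID"
```

Each vGPU type exposed under `mdev_supported_types` trades off resolution and video memory against how many instances the physical GPU can host at once.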

2. Better cache control technology

CPUs today are staggeringly fast. What’s slow is pulling data from main memory, so crucial data is cached close to the CPU. That strategy continues to this day, with ever-growing cache sizes. But at least some of the burden of cache management falls to the OS as well, and Linux 4.10 introduces some new techniques and tooling.

First is support for Intel Cache Allocation Technology (CAT), a feature available on some Haswell-generation and later processors. With CAT, space in the L3 cache (and, on newer chips, the L2 cache) can be apportioned and reserved for specific tasks, so a given application’s cached data isn’t flushed out by other applications. CAT also apparently provides some protection against cache-based timing attacks, no small consideration given that every nook and cranny of modern computing is being scrutinized as a possible attack vector.
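On the kernel side, this capability is exposed through a new resctrl filesystem. Here is a minimal sketch, assuming a CAT-capable processor and a kernel built with the resource-control support; the cache ID, bitmask, group name, and PID are all hypothetical and hardware-dependent:

```shell
# Mount the resource-control filesystem (requires CAT-capable hardware).
mount -t resctrl resctrl /sys/fs/resctrl

# Create a resource group for latency-sensitive work.
mkdir /sys/fs/resctrl/latency_sensitive

# The bitmask selects which L3 cache ways the group may use
# (hypothetical: 4 ways on cache ID 0).
echo "L3:0=0000f" > /sys/fs/resctrl/latency_sensitive/schemata

# Move a task into the group; it now allocates only within its reserved ways,
# and other tasks can no longer evict its cached data.
echo 1234 > /sys/fs/resctrl/latency_sensitive/tasks
```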

Hand in hand with support for this feature comes a new system tool, perf c2c. On systems with multiple sockets and nonuniform memory access (NUMA), threads running on different CPUs can degrade cache efficiency by modifying data that shares a cache line, a pattern known as false sharing. The perf c2c tool helps tease out such performance issues, although like CAT it relies on features provided specifically by Intel processors.
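A typical session looks something like the sketch below, run as root on supported Intel hardware:

```shell
# Record shared-cacheline activity system-wide for 10 seconds.
perf c2c record -a -- sleep 10

# The report highlights "HITM" events -- loads served from another core's
# modified cache line, the telltale signature of false sharing.
perf c2c report
```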

3. Writeback management

“Since the dawn of time, the way Linux synchronizes to disk the data written to memory by processes (aka. background writeback) has sucked,” writes KernelNewbies.org in its explanation of how Linux writeback management has been improved in 4.10. From now on, the I/O request queue is monitored for latency, and operations that cause longer latencies, heavy write operations in particular, can be throttled so other threads get a chance.
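The throttling target can be inspected and tuned per device through sysfs. A sketch, assuming a kernel built with the new writeback-throttling support and a device named sda (both assumptions):

```shell
# Show the current target latency (microseconds) for the device.
cat /sys/block/sda/queue/wbt_lat_usec

# Lower the target to throttle background writes more aggressively.
echo 75000 > /sys/block/sda/queue/wbt_lat_usec

# Writing 0 disables writeback throttling for the device.
echo 0 > /sys/block/sda/queue/wbt_lat_usec
```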

In roughly the same vein, an experimental feature, off by default, provides a RAID5 write-back cache, so writes across the disks of a RAID5 array can be batched together. Another experimental feature, hybrid block polling (also off by default), provides a new way to poll high-throughput devices. Polling helps performance, but done too often it makes things worse; the hybrid mechanism is designed to ensure polling improves latency without driving up CPU usage.
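Both features are controlled per device. The sketch below assumes hypothetical device names, an mdadm recent enough to support journal devices, and the sysfs knobs as they appeared around this release:

```shell
# RAID5 write-back cache: create an array with a fast journal device
# (here a hypothetical NVMe drive) to absorb and batch writes.
mdadm --create /dev/md0 --level=5 --raid-devices=3 \
  /dev/sdb /dev/sdc /dev/sdd --write-journal /dev/nvme0n1

# Switch the journal from the default write-through to write-back mode.
echo write-back > /sys/block/md0/md/journal_mode

# Hybrid block polling: -1 = classic polling, 0 = adaptive hybrid polling,
# >0 = sleep that many microseconds before polling for completion.
echo 0 > /sys/block/nvme1n1/queue/io_poll_delay
```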

This is all likely to have a major payoff with cloud computing instances that are optimized for heavy I/O. Amazon has several instance types in this class, and a kernel-level improvement that provides more balance between reads and writes, essentially for free, will be welcomed by them—and their industry rivals. This rising tide can’t help but lift all boats.

Copyright © 2017 IDG Communications, Inc.