Docker tutorial: Get started with Docker volumes

Learn the ins, outs, and limits of Docker's native technology for integrating containers with local file systems


Docker containers are meant to be immutable. The code and data they hold are not supposed to change. Immutability is useful when you want to be sure that the code running in production is the same code that passed QA testing; it’s not so useful when you need somewhere to write data and persist it across application lifetimes.

Most of the time, you can address the need for data persistence by using an external database. But sometimes an application in a container just needs to use a local file system, or something that looks like a local file system.

Enter Docker volumes, Docker’s native mechanism for dealing with local storage. A Docker volume is a convenient way to let containerized apps write and retrieve data through a local file system or a file-system-like interface. But Docker volumes are not a panacea for managing state. They need to be used wisely.

How Docker volumes work

Docker volumes are a way to map a file path inside a container, called a mount point, to a file system path or file-like object outside the container. Anything written to the Docker volume is stored externally, so it persists across the lifetimes of one or more containers. Plus, multiple containers can access the same volume at the same time, with some caveats.

Docker volumes use a volume driver to control how they interact with the storage layer behind them. By default, Docker’s built-in local driver stores volume data on the host’s file system, but it’s possible, and in some cases recommended, to use other storage systems.

One example of a volume driver is the Cloudstor plug-in, which comes preloaded with instances of Docker deployed on Docker for AWS and Docker for Azure. Cloudstor provides connectivity to file shares on those cloud services.

Another example, Blockbridge, provides direct access to iSCSI targets through the volume driver layer. Yet another, more ambitious example is REX-Ray, a storage engine that works with a variety of storage vendors and standards and can provide connectivity via Docker’s volume plug-in system or the more general Container Storage Interface (CSI) spec. Likewise, Rancher’s Convoy provides volume connectivity for a slew of different back ends, such as VFS/NFS shares and Amazon EBS.

Creating Docker volumes manually

The most basic way to create a volume is to pass the -v (or --volume) flag and a mount point when you start a container:

$ docker run -P --name websvc -v /websvcdata myorg/websvc python app.py

This creates an “anonymous” volume mounted at /websvcdata inside the container, with its data stored in a randomly named directory managed by the Docker daemon.

This would be a good way to create a quick-and-dirty dumping ground for data in the course of a given container session. But it’s not as useful for persisting state across container sessions, since the name of the volume isn’t known ahead of time, and the volume can’t be reused efficiently.

You can accomplish the same thing in a Dockerfile by including a VOLUME instruction that describes the location of a volume:

FROM ubuntu:latest
VOLUME /websvcdata

This is essentially identical to the above command-line instruction; it’s just codified in the Dockerfile for the container. But again, the volume name is randomly generated at creation time, so this approach is only useful for writing data that isn’t really meant to persist beyond a container’s lifetime.

Using the Docker volume API

For those using Docker 1.9 or later (and if you’re not, why not?), the volume API provides a way to create named volumes, as opposed to anonymous ones. Because a named volume can be referred to by name, it is easy to attach to one or more containers and to reuse.

$ docker volume create websvcdata

This creates a Docker volume named websvcdata. However, the volume isn’t mounted in any container yet. Here’s how to mount it:

$ docker run -P --name websvc -v websvcdata:/websvcdata myorg/websvc python app.py

This command is the same as the previous docker run example, but instead of creating an anonymous volume, it mounts the existing named volume websvcdata at /websvcdata in the container. You can run docker inspect on the container and read the “Mounts” section of the output to confirm that the mounts are as intended.

Note that you can’t create a named volume with a Dockerfile, because names for Docker volumes must be specified at runtime.

By passing options specific to the volume driver when you create a volume (with docker volume create --opt), you can control many aspects of the volume’s creation. With the local driver, for instance, you can choose what device or file system backs the volume (such as an NFS share or a temporary file system) and pass mount options to it. This way, you can place the volume on the best device for the particular use case.

A useful tip: If you mount an empty volume at a path in the image that already contains data, Docker copies that data into the volume when the container is created. (This applies to Docker volumes, not to bind mounts of host directories.) This is a handy way to pre-populate a volume with a set of data that you want to use as a starting point.

Sharing Docker volumes between containers

If you want more than one container to attach to the same Docker volume, all you have to do is create the volume and attach it to multiple containers:

$ docker run -ti --name instance1 -v DataVol1:/datavol1 ubuntu

$ docker run -ti --name instance2 --volumes-from instance1 ubuntu

$ docker run -ti --name instance3 --volumes-from instance1:ro ubuntu

This creates three containers, instance1 through instance3, and attaches DataVol1 to each of them. The --volumes-from flag takes the name of a container, not a volume, and mounts every volume that container uses. The instance3 container has DataVol1 mounted as read-only, as per the :ro after the source container’s name.

Be warned that Docker does not automatically mediate conflicts between containers that share the same volume. That’s up to your application. (More on this below.)

Removing Docker volumes

Volumes are not automatically removed from disk when a container is removed. This is by design, because you don’t want to remove a volume that could conceivably be used by another, as-yet-unused container in the future. That means volume unmounting and on-disk cleanup are your responsibility.

Docker API version 1.25 and later provide built-in tools for volume cleanup. The docker volume command has a subcommand, docker volume prune, that removes volumes not used by any container. (Starting with Docker Engine 23.0, it removes only anonymous volumes by default; add --all to include unused named volumes.) You can narrow the deletion with the --filter flag, for example by label, and docker rm -v removes a container along with the anonymous volumes associated with it.

The limits of Docker volumes

Docker volumes aren’t cure-alls for local persistence. Because of the way containers interact with local file systems, Docker volumes can create more problems than they solve.

The first thing to understand about Docker volumes is that Docker does not handle file locking in volumes used by multiple containers. That becomes the responsibility of whatever application you’re using. If you’re not confident that the application in question knows how to write to a shared file system, you could end up with file corruption in that volume. One possible solution is to use an object storage server (for instance, a project like MinIO) instead of the local file system.
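If the application doesn’t coordinate access itself, one lightweight option is an advisory lock file kept on the shared volume. The sketch below uses the flock utility from util-linux (present in the ubuntu image) to serialize a read-modify-write on a counter file. A temporary directory stands in for the shared volume, and the counter and lock file names are hypothetical. Advisory locks only protect writers that also take the lock; they coordinate reliably between containers on the same host using a local volume, while on network-backed volumes the behavior depends on the file system.

```shell
set -e

# Stand-in for a volume mounted into several containers (e.g. /datavol1).
SHARED_DIR=$(mktemp -d)
COUNTER="$SHARED_DIR/counter"
LOCK="$SHARED_DIR/counter.lock"
echo 0 > "$COUNTER"

# Read the counter, add one, write it back, all while holding an exclusive
# lock on $LOCK. Without the lock, concurrent writers could read the same
# value and lose updates.
increment() {
  flock "$LOCK" sh -c 'n=$(cat "$1"); echo $((n + 1)) > "$1"' _ "$COUNTER"
}

# Launch 20 concurrent writers and wait for all of them.
for i in $(seq 1 20); do
  increment &
done
wait

# prints "final count: 20"
echo "final count: $(cat "$COUNTER")"
```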

Another issue with Docker volumes is that they can make app portability more difficult. Every machine’s storage topology is different. If you create volumes in a way that makes assumptions about where various things are in the system, you’ll quickly find those assumptions may not hold when you deploy the same containers on a system you didn’t build yourself. This is less problematic if you’re using containers only on systems where you have rigorous control over the topology (e.g., an internal private cluster), but it can come back to bite you if you decide to re-architect things later.

Finally, avoid using volumes to store stateful data that is better handled through another native mechanism in Docker. Application secrets, for instance, should be handled by Docker’s own secrets system or a third-party product like HashiCorp’s Vault, and never by way of volumes or writable container image layers.

Copyright © 2018 IDG Communications, Inc.