First look: Chef’s Habitat puts automation in the app

By packaging configuration and runtime requirements with the app, Habitat decouples automation from the infrastructure

Deploying new software to production can be hard -- really hard. If you’re among the many businesses adopting new infrastructure and deployment technology today, you’re keenly aware of how difficult it can be. Even as you adopt modern devops tools to streamline development, test, deployment, and ongoing management, and to bring development and operations teams closer together, it often seems you're only creating new silos.

The fundamental problem is that all of the information critical to each application’s lifecycle -- build dependencies, runtime dependencies, configuration settings -- remains scattered across separate tools devoted to separate stages of that lifecycle. An open source project called Habitat, from the people who brought us Chef, promises a better way, one that cuts across all of these silos.

Habitat is a set of build and runtime tools that bring all of the information for configuring and running an application -- that is, all of the knowledge about how the application is supposed to behave -- into the application itself. Among other things, Habitat empowers developers to define configurations, lifecycle hooks, update strategies, clustering strategies, and service discovery for the applications they work on; all of this information travels with the application.

It’s an approach that effectively shifts what would normally be last-mile deployment problems to the beginning of the development lifecycle. It also decentralizes knowledge of how applications run. Applications essentially become self-aware, self-sustaining instances or communities of interdependent services, instead of blind followers of centralized sources of truth.

Habitat achieves this by providing a mechanism for defining application behavior, as well as providing a packaging strategy that runs the application along with a supervisor that coordinates that behavior. The supervisor process that is co-located with your service will join a ring of other supervisors and begin to gossip about the state of the services. Armed with the information you provided about how an application should behave at runtime, the supervisor can communicate the application’s needs to other supervisors and make changes to the running application based on new information coming in.

For example, if an application defined by Habitat relies on a database that is also deployed from a Habitat package, the supervisor of the application will gather information about the database from other supervisors and provide location details to the application when it comes online. 

A little background

To illustrate the challenges that Habitat is built to address, I’m going to describe a common situation encountered when adopting new infrastructure and deployment technology. I’ll use Docker as the example, but you could substitute the name of pretty much any modern devops tool.

Your organization decides it would be a good idea to adopt Docker. The technical team begins to wrangle the new tool, hitting stumbling blocks and learning the ins and outs of using it for development. Frustration ensues, which is typical of any new tool adoption, but eventually the team finds its stride and the overhead due to unfamiliarity recedes. With a new confidence, the team begins converting as many of its old development resources to use Docker as it can, and it updates projects to be easily run locally using Docker Compose. Everything looks good and the team begins working on deploying to production.

That’s when everything starts to fall apart. All of a sudden, they’re up against a mountain of issues they never had to deal with while using Docker in local development. Now they need private registries to store proprietary images. They need task schedulers to distribute the work efficiently, service discovery to account for the new deployment flexibility, and secrets management to safely pass credentials to containers. They need health checks for container-based services and a way to coordinate clustered services into common topologies. They need three more days in every week.

Some of these new challenges simply can’t be avoided, regardless of what new tool you choose. If you want to have a dynamic workload running across a pool of machines, you can’t avoid task scheduling. Likewise, if you want the flexibility to schedule services anywhere, you’re going to need service discovery so that dependent applications can find one another.

However, there are two big frustrations that could be eliminated with Habitat.

First, all of the information about how to run the application is scattered across the system. The application code lives in a version control system with a readme file describing how to build the software locally. Those build steps are duplicated on your CI server in order to build testing and production environments. Health-checking scripts might live with deployment assets in a completely different repository. To weave all of it together, you write a combination of shell scripts, Dockerfiles, Marathon application definitions, and configuration templates simply to describe how the application is supposed to behave.

Second, all of this new overhead is in addition to the existing infrastructure and the procedures used to support apps that couldn’t be transitioned to fit nicely into containers.

Habitat allows you to package an application together with all of the instructions for building, configuring, running, and updating it, and all of that information travels with the app. Further, Habitat can be used to package not only new, containerized applications, but your traditional applications as well.

Building a Habitat package

The folks building Habitat took a universal approach when they chose how to define application behaviors: Almost everything is driven by shell scripts. This is a great choice, considering so much application configuration and deployment is already managed by shell scripts. To begin working with Habitat for a particular application, you define a plan to build the application’s binaries. The plan is simply a shell script that expects a set of fields and functions, some of which are mandatory. The plan needs to know things like the application name and version, but also where to download the source code and what the build and runtime dependencies are.

[Screenshot: Habitat package] Package dependencies in Habitat are so fine-grained, you have to explicitly include Bash.

These application dependencies are references to other Habitat packages, and they are quite fine-grained. When Habitat packages up your application, it will include the most stripped-down runtime it can, and it assembles that runtime based on the dependencies you specify.

In my example, a Habitat plan for the Neo4j graph database, I have to specifically include Bash as a dependency because Neo4j uses a Bash shell to start the server. I also need the Java runtime, which has a set of dependencies of its own. Luckily, Habitat knows to pull those in as transitive dependencies, and I don’t have to worry about them. Now that these resources have been defined, the next step is to define how to build the package using a set of optional callback functions. These functions have default implementations, so if all you need to do is extract a TAR file and run ./configure && make && make install, you don’t have to override them.
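To make the fields and dependencies concrete, here is a minimal sketch of what the metadata portion of such a plan might look like. The origin name, version, download URL, and checksum are placeholders, and the dependency identifiers (core/bash, core/jre8) are assumptions about what the public package set provides, not values taken from my actual plan.

```bash
#!/usr/bin/env bash
# plan.sh (sketch): metadata only, relying on the default build callbacks.
# Every value below is illustrative; replace with your own.
pkg_origin=myorigin                      # hypothetical origin name
pkg_name=neo4j
pkg_version=0.0.0                        # placeholder version
pkg_source="https://example.com/${pkg_name}-${pkg_version}.tar.gz"  # placeholder URL
pkg_shasum=replace-with-real-sha256      # placeholder checksum

# Runtime dependencies: Bash must be listed explicitly because the
# Neo4j start script needs a shell; the Java runtime brings its own
# dependencies along transitively. Package names are assumptions.
pkg_deps=(core/bash core/jre8)

# Build-time-only dependencies (none needed for this sketch).
pkg_build_deps=()
```

Because a plan is just a shell script, the dependency lists are ordinary Bash arrays, which is why this sketch needs Bash rather than a plain POSIX shell.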

[Screenshot: Habitat overwriting defaults] Overriding default behavior in Habitat is as simple as writing a shell script.

In my example, I’m not actually pulling down source code and creating binaries. I’m simply pulling down prebuilt binaries and putting them in the right places. I happen to be packaging open source software, but this process could just as easily be applied to enterprise software for which you don’t have the source code. By packaging enterprise applications in the same way as internal or open source applications, Habitat makes management a breeze. You interact with everything through a single, uniform interface.
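Overriding the build callbacks for a prebuilt download might look roughly like the following. This is a sketch under assumptions, not my actual plan: it assumes the build exposes the unpacked download as $HAB_CACHE_SRC_PATH/$pkg_dirname and the install target as $pkg_prefix, so check those variable names against the plan documentation for your Habitat version.

```bash
#!/usr/bin/env bash
# Callback overrides (sketch) for a package that ships prebuilt binaries.

# Nothing to compile, so replace the default build step with a no-op.
do_build() {
  return 0
}

# Copy the unpacked files into the package's install prefix instead of
# running `make install`.
do_install() {
  mkdir -p "$pkg_prefix"
  cp -R "$HAB_CACHE_SRC_PATH/$pkg_dirname/." "$pkg_prefix/"
}
```

Any callback you do not define keeps its default behavior, so a plan like this only has to state where it differs from the usual configure-and-make flow.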

Defining runtime behavior in Habitat

The next step is defining the configuration and runtime behavior of an application. Runtime behavior is defined by a set of shell scripts called hooks. If you define them, your hooks will be used by the supervisor during key lifecycle events like initialization, starting the service, configuration updates, and health checks. These shell scripts can also include Handlebars template expressions. If you use template syntax, Habitat will process your scripts and insert values at runtime. Thus, Habitat provides a convenient mechanism for using runtime information in your scripts.
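As an illustration, the snippet below writes a minimal run hook into a plan directory. It is a sketch under assumptions: the hooks/run location and the {{pkg.path}} expression reflect my understanding of Habitat's conventions, the plan directory name is hypothetical, and "neo4j console" is used as the foreground start command. The heredoc delimiter is quoted so the shell leaves the Handlebars expression untouched for Habitat to fill in at runtime.

```bash
#!/bin/sh
# Write a minimal run hook (sketch). PLAN_DIR is a hypothetical
# location for the plan; override it to suit your layout.
plan_dir="${PLAN_DIR:-neo4j-plan}"
mkdir -p "$plan_dir/hooks"

cat > "$plan_dir/hooks/run" <<'EOF'
#!/bin/sh
# Merge stderr into stdout so all service output ends up in one stream.
exec 2>&1
# {{pkg.path}} is a template expression; Habitat replaces it with the
# package's install path before the supervisor runs this script.
exec {{pkg.path}}/bin/neo4j console
EOF
```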

If your service relies on configuration files, you define them in the package as well, optionally with Handlebars expressions. The values you can reference include metadata such as where other services live, the service’s own IP address on startup, or which member of the service’s cluster is the leader. They can also include arbitrary values passed in by the user to customize the application at runtime. You have to specify default values for any user-configurable setting in a TOML file (a simple key-value format, comparable in purpose to YAML), which Habitat then uses to provide documentation to any consumer of the package. This little detail means you will never again have to hunt around to figure out what properties can be changed.
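The pairing of a defaults file with a templated configuration file can be sketched as follows. The file name service.conf and its keys are hypothetical stand-ins rather than real Neo4j settings, and the {{cfg.…}} and {{sys.ip}} expressions reflect my understanding of Habitat's template data, so treat them as assumptions to verify.

```bash
#!/bin/sh
# Write a defaults file and a config template (sketch).
plan_dir="${PLAN_DIR:-neo4j-plan}"   # hypothetical plan directory
mkdir -p "$plan_dir/config"

# default.toml: every user-configurable value gets a default here.
cat > "$plan_dir/default.toml" <<'EOF'
# Port the service listens on; consumers of the package may override it.
http_port = 7474
EOF

# config/service.conf: hypothetical template rendered by Habitat.
# {{cfg.http_port}} resolves to the value above (or a user override);
# {{sys.ip}} resolves to the address of the host running the service.
cat > "$plan_dir/config/service.conf" <<'EOF'
listen_address = {{sys.ip}}
http_port = {{cfg.http_port}}
EOF
```

Keeping the defaults next to the template is what makes the package self-documenting: anything a template reads from cfg has to appear in default.toml.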

[Screenshot: Habitat configuration settings] Habitat puts all available configuration settings in one place, and they're well documented.

Running the application

After defining the build, you can build the package artifact by running a single command. Once you’ve built the artifact, you have some options for how to run it. You can run it locally from the command line with a single command for debugging, but you’ll likely want to export it or upload it to a depot, a central repository that stores uploaded artifacts for later download. The artifact has all of the application files, configuration, hooks, and dependencies you need to run the application, so it’s easy to export to different formats. You can currently export to Docker, Mesos, Application Container Image (ACI), and TAR files, then take the output and deploy it however you like. Once a package is in a depot, you can run it on any machine with Habitat installed, using the same command you ran to test locally. Wrap the command in a systemd unit and you’re set to go.
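The build, export, and upload steps can be sketched as a small wrapper script. The hab subcommands shown are assumptions based on my understanding of the CLI (confirm them with hab --help for your version), and the package identifier and artifact path are hypothetical. By default the script only prints the commands; set DRY_RUN=0 to execute them.

```bash
#!/bin/sh
# Print a command in dry-run mode (the default), otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Build the artifact, export it as a Docker image, and upload it to a
# depot. $1 is the package identifier, $2 the path to the built artifact.
build_and_publish() {
  run hab pkg build .
  run hab pkg export docker "$1"
  run hab pkg upload "$2"
}
```

For example, calling build_and_publish myorigin/neo4j results/example.hart in dry-run mode lists the three commands in order without touching the system.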

Before going into the specifics of starting a Habitat service, it should be made clear that the services need to run somewhere. Part of the flexibility of Habitat’s output formats is that they can be agnostic to the infrastructure, but they still depend on the infrastructure being there. As a result, adopting Habitat doesn’t mean you’ll be ditching infrastructure tools like Chef and Terraform. Those tools automate the infrastructure, while Habitat functions at a higher level in the stack. Habitat even plays nicely with Docker, despite Docker also being a packaging tool of sorts. That means you can package applications with Habitat while taking advantage of efficient container distribution from tools like DC/OS and Kubernetes. It’s the best of both worlds.

In practice, you’ll want to run the package with a few flags. Unless it’s the first application to start up, you’ll want to pass in the IP address of an application that is already running so that the new supervisor can join the existing ring. When the supervisor bootstraps, it will automatically pull down metadata about other Habitat applications, along with the existing configuration for the service, and update its own service with the salient details.

If the package is the first instance of an application or service, you may want to pass in some initial configuration. You can do this through environment variables or a TOML file specified by command-line flags. This is where you can also enable key features like topologies. Topologies let you indicate that you want your service to have a leader and many followers, or that one instance of your service should bootstrap fully before any other instances are brought up. The supervisors will handle all of the heavy lifting for you, letting you simply reconfigure service instances based on the results of leader election and failover.
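To show how the peer address and topology choices from the last two paragraphs combine, here is a sketch that assembles the start command for a first instance and for a later one. The flag names (--peer, --topology) are assumptions on my part, so verify them with hab start --help; the package identifier and IP address are hypothetical. The function only prints the command line rather than running it.

```bash
#!/bin/sh
# Compose a `hab start` command line from optional settings (sketch).
hab_start_command() {
  pkg="$1"            # package identifier, e.g. myorigin/neo4j
  peer="${2:-}"       # IP of a running peer; empty for the first instance
  topology="${3:-}"   # e.g. "leader"; empty for no topology
  cmd="hab start $pkg"
  if [ -n "$peer" ]; then cmd="$cmd --peer $peer"; fi
  if [ -n "$topology" ]; then cmd="$cmd --topology $topology"; fi
  echo "$cmd"
}

# First instance in the ring: no peer yet, leader/follower topology.
hab_start_command myorigin/neo4j "" leader
# Later instance: join the ring through an existing member's address.
hab_start_command myorigin/neo4j 10.0.0.5 leader
```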

Growing pains

Habitat is a promising, young open source product, and it's growing fast. As a result, a major version has not yet been released, and I wouldn’t recommend putting it into production yet, although I’ve heard companies have already done so with success. While the philosophy of Habitat is going to remain constant, many of the implementation details are still being negotiated, which means that breaking changes are on the horizon that would disrupt existing Habitat deployments.

Additionally, the documentation for Habitat is in rough shape for production users. Sections of the documentation lag behind by months, and some features aren’t documented at all. By adopting Habitat at this early stage, you might be one of the first users to encounter certain bugs, and you may find yourself poking around the source code for answers more often than not. If that’s your cup of tea, you’re encouraged to participate in the community by submitting issues and pull requests and by joining the discussion in the Habitat Slack.

Painful as Habitat might be to troubleshoot and despite the fact that it’s not production ready, I’ve already started writing Habitat builds for internal tools and I have been happy with the results. The ability to look at a small handful of files to see how an application is going to behave throughout its entire lifecycle is comforting and informative. Additionally, it’s the only approach I’ve seen that embraces both greenfield projects and legacy applications.

By bringing the benefits of cutting-edge deployment practices to legacy applications, Habitat promises to be as much a tap into the fountain of youth for the enterprise as it is a simplifier for the mayhem of modern microservice architectures.

Copyright © 2016 IDG Communications, Inc.