The secrets to LinkedIn's open source success

Highly useful LinkedIn projects like Kafka, Samza, Helix, and Voldemort have gained broad adoption -- and LinkedIn engineers have benefited from the experience

Open source is the gift that keeps on giving ... unless it destroys your business first. As many an open source vendor can tell you, it's a slog peddling free ones and zeroes, and it's only getting harder as the Web giants flood the world with high-quality, zero-cost software.

Web giants like LinkedIn, for example: Take a look at LinkedIn's GitHub page, and you'll discover the death of dozens of real or potential startups. Yes, LinkedIn, the company ostensibly set up to "connect the world's professionals to make them more productive and successful," is also a company that has released more than 75 open source projects, some of which have grown up to become huge successes with developers and the enterprises for which they work.

Though Facebook and Google get more press for their open source work, LinkedIn has quietly built a world-class developer program that depends upon and feeds open source communities. I caught up with Igor Perisic, LinkedIn's VP of Engineering, to better understand how LinkedIn makes open source work for the company.

Open code is only the start

Anyone can open-source their code. Indeed, for years code repositories like SourceForge were littered with open source "projects" that were lucky to get more than one contributor (80 percent attracted two or fewer contributors). Even if multiple contributors were listed, the majority of open source projects hadn't been updated in more than six months.

The fact that LinkedIn has open-sourced more than 75 projects to date means little in and of itself. An open source project is only as useful as the community it attracts -- and that's exactly why LinkedIn's open source story is so fascinating.

As Perisic says, "Numbers can often be vanity metrics. We consider community adoption to be our key indicator of success." For instance, both Pinot, a real-time distributed OLAP datastore LinkedIn uses to deliver scalable real-time analytics, and its REST.li REST framework have more than 1,000 GitHub stars and have been forked around 200 times.

Also, two of the best health metrics you can get from looking at a repository are the number of contributors and the time since the last update. Both metrics are indicators of engagement, either over time or across the broader open source community, Perisic notes.

But there's more to community. Other, better ways of assessing the larger value of LinkedIn's open source work have more to do with industry standards of approval, such as inclusion within the Apache Software Foundation:

"Several of our open source projects, like Kafka, Samza, and Helix, have gained broad adoption and are now part of the Apache Software Foundation. Voldemort is a distributed key-value storage system that is becoming increasingly popular. REST.li is a very popular REST framework. Generally speaking, we've made a concerted effort to make our open source projects attractive to other developers."

It's this last point that resonates. Too often companies release code that's useful for themselves and hope a massive community will arise around it ... to make it even more useful for that company. Open source foundations have largely followed this same self-centered logic, pretending at open governance even as a single vendor controls the outcomes.

Not that LinkedIn was always a paragon of this open source community virtue. As Perisic describes, "One lesson we learned early is that you can't just put software out into the community and not continue to innovate. We've also learned that many of the things that determine whether an open source project will be successful are related to how you engage with the community."

This means, he suggests, that the hardest work begins after code has been open-sourced. For example, LinkedIn has learned the importance of reaching out to the community for feedback and ensuring that the goals of your project are easily understood. Critically, though, the most important decision, Perisic stresses, is the first one: "deciding whether it makes sense to open-source a project in the first place." If you're not prepared for the ongoing work of open source, it's better to not open the code at all.

Why bother?

Given the difficulty inherent in opening code and growing a community around it, why bother? Though much of the value flows outside LinkedIn, a big reason for engaging in open source communities is the effect it has on engineers, according to Perisic.

"We've found that the first result of open-sourcing your projects is that your developers will write better software." Keeping code behind closed doors invites sloppiness, feeding a developer's "tendency to cut some corners. This especially happens when it comes to documentation, making code easily readable, and having all the right tests in order."

Opening up code, however, opens up a developer to criticism -- very public criticism. In Perisic's words, "When a developer open-sources a piece of code, their reputation is on the line. It's essentially a type of peer review. This gives developers a huge incentive to make sure that their code is well written, well documented, and reusable."

But it's not about good code hygiene alone. It's also about ensuring developers don't become parochial in their outlook. "Working on an open source project exposes [our developers] to the developer community outside of the company where they work. It will help them become more aware of new trends, and help them learn how to assess the value of other developers' input." In this sea of differing opinions, "developers learn how to lead their own teams more effectively."

Finally, Perisic notes, "from a company's perspective it also helps develop your engineering brand, which proves useful in attracting new talent and retaining existing employees."

Plant your code and let it grow

"I believe that you have a responsibility as the creator of a project to spend as much time on [an open source project] as if it was still part of your internal organization," Perisic tells me. It's also critical to release the right kind of code. "One big reason why some projects don't develop strong communities comes from standalone projects that exist in a silo." The code might make sense in isolation within your company, but if it isn't obviously useful outside your company, it's likely to fail.

Sometimes, however, it's better to leverage the energies and organization of an existing project, even if it doesn't come with the same level of public credit. "Standalone open source projects are great, but if it makes sense to put a project under the Apache umbrella, don't hesitate. If there's an existing community of developers eager to use it, don't hesitate to make it open source."

What about those times when code outlives its usefulness at LinkedIn? Given what Perisic says about the need to be a good open source citizen, how does he handle code that "the community" wants but LinkedIn no longer needs?

"You can't just abandon a project," he stresses, "but you should instead present users with alternatives and a migration path." For example, LinkedIn discontinued an open source project called Camus that was used as its pipeline to pull data from Kafka into HDFS. Rather than simply abandoning the project, Perisic and team took the extra step of providing a migration path so that Camus users could easily adopt Gobblin, LinkedIn's new data ingestion framework.

All of this requires "significant time to develop, monitor, and nurture," but Perisic doesn't mind. As he concludes, the open source community "often pays you back this investment many times over."

Copyright © 2016 IDG Communications, Inc.