6 machine learning projects to automate machine learning

Tweaking machine learning algorithms and models won't always be for experts only, thanks to these cutting-edge projects

By Serdar Yegulalp, Senior Writer, InfoWorld


The power of machine learning comes at a price. Once you have the skills, the toolkit, the hardware, and the data, there is still the complexity involved in creating and fine-tuning a machine learning model.

But if the whole point of machine learning is to automate tasks that previously required a human being at the helm, wouldn’t it be possible to use machine learning to take some of the drudgework out of machine learning itself?

Short answer: a qualified yes. A collection of techniques, under the general banner of “automated machine learning,” or AML, can reduce the work needed to prepare a model and refine it incrementally to improve its accuracy.

Automated machine learning is still in its early stages. Today it is implemented as a slew of disparate pieces and disconnected technologies, but it’s fast shaping up to be productized and made available for the average business user, rather than the machine learning expert.

Here are six automated machine learning tools leading the way.

Auto-sklearn and Auto-Weka

Two examples of automated machine learning already in the wild come in the form of enhancements to widely used machine learning libraries. The first builds on Scikit-learn, a Python package of common machine learning functions.

Scikit-learn comes with several different “estimators,” or objects that each implement a method for learning from provided data. Since choosing the right estimator can be a tedious exercise, the Auto-sklearn project aims to remove some of that tedium. It provides a generic estimator that conducts its own analysis to determine the best algorithm and set of hyperparameters for a given Scikit-learn job.

Auto-sklearn still requires some manual intervention. The end user needs to set limits on how much memory and time the tuning process can use. But it’s far easier to make those choices and let the machine decide the rest over time than it is to tinker with model selections and hyperparameters.
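To make the idea concrete, here is a minimal sketch of searching across model families and hyperparameters under user-set limits. It uses only the Python standard library and plain random search, which is swapped in for simplicity and is not Auto-sklearn's actual search strategy. The toy dataset, the two model families, and the names `auto_search`, `time_budget_s`, and `max_trials` are all assumptions made for illustration, and memory limits are not modeled.

```python
import random
import time

# Toy 1-D dataset (hypothetical): the label is 1 when x > 0.5.
rng = random.Random(0)
points = [rng.random() for _ in range(200)]
data = [(x, int(x > 0.5)) for x in points]
train, valid = data[:150], data[150:]


def fit_threshold(train_set, cutoff):
    """Model family 1: predict 1 when x exceeds a fixed cutoff.
    The training set is unused; the cutoff is the only hyperparameter."""
    return lambda x: int(x > cutoff)


def fit_knn(train_set, k):
    """Model family 2: majority vote among the k nearest training points."""
    def predict(x):
        nearest = sorted(train_set, key=lambda p: abs(p[0] - x))[:k]
        votes = sum(label for _, label in nearest)
        return int(votes * 2 > k)
    return predict


# Search space: each family is paired with a sampler for its hyperparameter.
SEARCH_SPACE = {
    "threshold": (fit_threshold, lambda r: r.uniform(0.0, 1.0)),
    "knn": (fit_knn, lambda r: r.choice([1, 3, 5, 7, 9])),
}


def accuracy(model, rows):
    """Fraction of rows the model labels correctly."""
    return sum(model(x) == y for x, y in rows) / len(rows)


def auto_search(time_budget_s, max_trials, seed=0):
    """Try random (family, hyperparameter) pairs until a limit is reached."""
    r = random.Random(seed)
    deadline = time.monotonic() + time_budget_s
    best = (None, None, -1.0)
    trials = 0
    while trials < max_trials and time.monotonic() < deadline:
        name = r.choice(sorted(SEARCH_SPACE))
        fit, sample = SEARCH_SPACE[name]
        param = sample(r)
        score = accuracy(fit(train, param), valid)
        if score > best[2]:
            best = (name, param, score)
        trials += 1
    return best


best_name, best_param, best_score = auto_search(time_budget_s=2.0, max_trials=50)
print(best_name, best_param, round(best_score, 3))
```

The user supplies only the limits; the loop decides which family and which setting to keep, based on validation accuracy.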

For machine learners using Java and the Weka machine learning package, there is a similar project called Auto-Weka. Auto-sklearn was in fact inspired by the work done for Auto-Weka.

Prodigy

One labor-intensive aspect of creating supervised machine learning models, such as for natural language processing, is the annotation phase. A human being has to create metadata by hand to describe, or annotate, the data used by the model.

It’s not possible to completely automate that process—at least, not yet. However, it is possible to use machine learning to speed up the process and make it less ornery.

That’s the premise behind an annotation tool named Prodigy. It uses a web interface to make the training process as speedy and intuitive as possible for models that need annotated datasets. Annotations already added to the dataset are used to guide future annotations, helping accelerate the annotation process over time.
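One common way to let existing annotations guide future ones is uncertainty sampling: score the unlabeled items with a model built from the labels so far, then ask the human about the item the model is least sure of. The sketch below illustrates that general technique with a deliberately crude word-count scorer. Whether Prodigy works this way internally is an assumption, and the example texts and function names are hypothetical.

```python
from collections import Counter

# Hypothetical seed annotations: (text, label) with 1 = positive, 0 = negative.
labeled = [
    ("great fast service", 1),
    ("terrible slow service", 0),
]
unlabeled = [
    "great product",
    "slow and terrible",
    "fast but terrible",
    "great fast delivery",
]


def word_counts(examples):
    """Count how often each word appears under each label."""
    pos, neg = Counter(), Counter()
    for text, label in examples:
        (pos if label == 1 else neg).update(text.split())
    return pos, neg


def positive_probability(text, pos, neg):
    """Crude score: smoothed share of positive evidence among the words."""
    p = sum(pos[w] for w in text.split()) + 1
    n = sum(neg[w] for w in text.split()) + 1
    return p / (p + n)


def most_uncertain(pool, examples):
    """Return the unlabeled text whose score is closest to 0.5."""
    pos, neg = word_counts(examples)
    return min(pool, key=lambda t: abs(positive_probability(t, pos, neg) - 0.5))


# Round 1: the model picks the item it is least sure about.
first_item = most_uncertain(unlabeled, labeled)

# Simulate the human annotating it as negative, then pick again.
labeled.append((first_item, 0))
unlabeled.remove(first_item)
second_item = most_uncertain(unlabeled, labeled)

print(first_item)
print(second_item)
```

Each new label changes the scores, so the next question put to the annotator depends on the answers already given.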

Prodigy makes strong use of Python as a machine learning environment. It provides Python modules for training models, testing them, exploring annotated datasets, and managing the results between projects. Finished models can be exported as Python packages and put directly into production by way of any other Python app.

H2O Driverless AI

Another offering that aims to make machine learning more approachable for non-experts is H2O.ai’s Driverless AI. Driverless AI is designed for business users familiar with products like Tableau, who want to gain insights from data without having to learn the ins and outs of machine learning algorithms.

Like Prodigy, Driverless AI uses a web-based UI. Here the user chooses one or more target variables in the dataset to solve for, and the system serves up the answer. The results are presented via interactive charts, and explained with annotations in plain English.

Unlike Prodigy, Driverless AI is a proprietary product. Much of H2O.ai’s stack is open source, but this particular component is not. It’s one sign that commercial products, rather than open source stacks, may be the primary method of bringing machine learning to non-technical users.

Google’s AutoML and Vizier

In recent months, Google has pointed to two projects of its own—albeit entirely internal projects—as examples of how the company is implementing automated machine learning.

The first project, “AutoML,” was created to automate the design of multi-layer deep learning models.

“The process of designing networks often takes a significant amount of time and experimentation by those with significant machine learning expertise,” says Google. Instead of having human beings toss-and-test one deep learning network design after another, AutoML uses a reinforcement learning algorithm to test thousands of possible networks. Feedback from each run of the algorithm can be used to create new candidate architectures for the next run. With enough runs, the training mechanism can figure out which model constructions yield better results.
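The feedback loop described above can be sketched in miniature. This toy version samples an architecture from a small policy, scores it, and nudges the policy toward choices that beat a running average (a basic REINFORCE-style update). It is not Google's AutoML implementation: the search space, the synthetic scoring function standing in for real training runs, and every name here are assumptions for illustration.

```python
import math
import random

# Hypothetical search space: one choice per design decision.
CHOICES = {
    "layers": [1, 2, 3, 4],
    "width": [16, 32, 64],
}


def fake_validation_score(arch):
    """Stand-in for training a network and measuring its accuracy.
    This synthetic reward peaks at 3 layers of width 32."""
    layer_penalty = 0.2 * abs(arch["layers"] - 3)
    width_penalty = 0.1 * abs(math.log2(arch["width"]) - 5)
    return 1.0 - layer_penalty - width_penalty


def softmax(logits):
    """Turn a list of logits into probabilities."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def search(rounds=1000, lr=0.3, seed=0):
    rng = random.Random(seed)
    # The "controller" is one vector of logits per decision.
    logits = {key: [0.0] * len(opts) for key, opts in CHOICES.items()}
    baseline = None
    for _ in range(rounds):
        # Sample a candidate architecture from the current policy.
        picks = {}
        for key, options in CHOICES.items():
            probs = softmax(logits[key])
            picks[key] = rng.choices(range(len(options)), weights=probs)[0]
        arch = {key: CHOICES[key][i] for key, i in picks.items()}
        reward = fake_validation_score(arch)
        if baseline is None:
            baseline = reward
        advantage = reward - baseline
        baseline += 0.1 * (reward - baseline)  # moving-average baseline
        # Raise the probability of choices that beat the baseline.
        for key, idx in picks.items():
            probs = softmax(logits[key])
            for j in range(len(probs)):
                grad = (1.0 if j == idx else 0.0) - probs[j]
                logits[key][j] += lr * advantage * grad
    # Report the most probable architecture under the learned policy.
    best = {}
    for key, options in CHOICES.items():
        top = max(range(len(options)), key=lambda j: logits[key][j])
        best[key] = options[top]
    return best


best_arch = search()
print(best_arch, fake_validation_score(best_arch))
```

In a real system each call to the scoring function would be a full training run, which is why this kind of search demands so much compute.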

Another Google project, christened Google Vizier and outlined in a paper published in August 2017, is a “service for black-box optimization.” In plainer English, it’s a way to find the best operating parameters for a system in cases where it is hard to correlate the parameters you feed in with the results you get out.

According to the paper, Google used Vizier to study how many of its own services could be improved by tweaking their behaviors. Examples included “tuning user–interface parameters such as font and thumbnail sizes, color schema, and spacing, or traffic-serving parameters such as the relative importance of various signals in determining which items to show to a user.”
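As a rough illustration of black-box optimization in general, the sketch below tunes two made-up interface parameters using only the scores returned by an opaque function. The search method is simple hill climbing with random perturbations, chosen for brevity; it is not a description of the algorithms Vizier uses. The objective, the parameter names `font_px` and `thumb_px`, and their bounds are all hypothetical.

```python
import random


def black_box(params):
    """Opaque stand-in for a live measurement such as user engagement.
    The optimizer sees only inputs and outputs, never this formula."""
    font, thumb = params["font_px"], params["thumb_px"]
    return -((font - 15) ** 2) / 50 - ((thumb - 120) ** 2) / 5000


# Hypothetical allowed range for each parameter.
BOUNDS = {"font_px": (10, 24), "thumb_px": (60, 200)}


def hill_climb(objective, bounds, steps=300, seed=0):
    """Keep a current setting; accept a random nearby setting if it scores higher."""
    rng = random.Random(seed)
    current = {name: rng.uniform(lo, hi) for name, (lo, hi) in bounds.items()}
    current_value = objective(current)
    for _ in range(steps):
        candidate = {}
        for name, (lo, hi) in bounds.items():
            step = rng.gauss(0, 0.1 * (hi - lo))
            # Clip the perturbed value so it stays inside the allowed range.
            candidate[name] = min(hi, max(lo, current[name] + step))
        value = objective(candidate)
        if value > current_value:
            current, current_value = candidate, value
    return current, current_value


best_params, best_value = hill_climb(black_box, BOUNDS)
print(best_params, best_value)
```

Nothing in `hill_climb` depends on how the score is produced, which is the defining property of a black-box method: the same loop could tune any system that returns a single number per trial.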

Right now Vizier is only for internal Google use. But it’s not unreasonable to expect Google to eventually offer a productized version of the service or even release it as an open source project, in the same way TensorFlow was developed internally and then released to the world at large.


Serdar Yegulalp is a senior writer at InfoWorld, focused on machine learning, containerization, devops, the Python ecosystem, and periodic reviews.