Even data scientists are facing AI takeover

As euphemisms fly for AI replacing human activity, data scientists are starting to experience the benefits and risks of AI-assisted automation


People are starting to get jumpy about the prospect of AI being used to automate anything and everything. Now that AI has proven its ability to squeeze out both blue-collar jobs (through robotics et al.) and white-collar occupations (through natural language generation et al.), cultural sensitivities surrounding this technology are on the upswing.

That may explain why we’re starting to see people use near-synonyms and quasi-euphemisms for “automation” when discussing AI’s impact. Some observers prefer terms such as “operationalize,” “productionize,” “augment,” and “accelerate” when discussing the encroachment of automation into the development of AI-driven applications. We also see a fair bit of discussion around “self-service” tools for building “repeatable workflows” and the like, which certainly sounds like the next logical step toward automating those workflows.

This aversion to the dreaded word “automation” may stem from the fact that even data scientists are starting to worry about its potential impact on their own job security. It’s with this cultural zeitgeist in mind that I read Andrew Brust’s recent article about Alteryx’s new tool for “operationalizing” machine learning models. He provides a very good discussion not only of the data-science productivity-boosting benefits of that offering, but also of solutions from other vendors that all, to varying degrees, push automation deeper into data-science development, deployment, and optimization workflows.

In my own research at Wikibon, I’ve seen a significant surge in what we call “devops for data science,” which, I suppose, is yet another euphemism for automation. Although Brust says there’s “nothing but upside” to the prospect of squeezing manual labor out of the data-science workflow, it’s clear that many low-level functions, which might otherwise be handled by less-skilled (but nonetheless employed) data scientists, might never be touched by human hands again.

Alteryx’s tools are squarely in the mainstream of what leading-edge data science tool vendors are offering now, so they nicely show what automation data scientists can expect to come their way:

  • The no-code Alteryx Designer tool automatically generates customized REST APIs and Docker images around machine learning models during the promotion and deployment stage (a generic sketch of this model-as-a-service pattern follows the list).
  • Alteryx’s new Promote tool, which uses the data-science model-management technology it recently acquired with Yhat, automatically deploys the models for execution in the Alteryx Server analytics platform.
  • Promote can automatically scale each model’s runtime resource consumption up or down based on changing application requirements.
  • Designer workflows can be set up to automatically retrain machine learning models, using fresh data, and then interface to Promote to automatically redeploy them.
  • Promote, in turn, automatically ensures model governance by keeping track of which model version is currently deployed and making sure that there is always one sufficiently predictive model in production.
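
To make that deployment pattern concrete, here is a minimal sketch of what “a machine learning model wrapped in a REST API” looks like in plain Python, using Flask and a scikit-learn model serialized with joblib. This is a generic illustration of the pattern, not Alteryx-generated code; the model.pkl file name and the /predict endpoint are assumptions for the example.

    # Minimal sketch of a model served behind a REST endpoint (generic pattern,
    # not Alteryx-generated code). Assumes a trained scikit-learn model was
    # previously serialized to model.pkl.
    import joblib
    from flask import Flask, jsonify, request

    app = Flask(__name__)
    model = joblib.load("model.pkl")  # hypothetical artifact from an earlier training run

    @app.route("/predict", methods=["POST"])
    def predict():
        payload = request.get_json()                      # e.g. {"features": [[5.1, 3.5, 1.4, 0.2]]}
        predictions = model.predict(payload["features"])  # scikit-learn accepts a list of rows
        return jsonify({"predictions": predictions.tolist()})

    if __name__ == "__main__":
        app.run(host="0.0.0.0", port=8080)

Packaging a script like this, plus its dependencies, into a container image is essentially what “Docker images around machine learning models” amounts to. The point of tools like Designer and Promote is that nobody has to write or maintain this plumbing by hand.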

Perhaps I shouldn’t overstate the potential of automation to put data scientists on the dole. If anything, data-science automation tools will help them do more with less. These capabilities may even, as they offload repetitive tasks, enable data scientists to grow their skills into more creative and challenging realms. Automation may also allow them to stave off the specter of a labor shortage in their own profession. As a recent MIT Technology Review article notes, a lack of skilled personnel may, without a healthy dose of automation, grind the AI/machine learning revolution to a halt.

Even skilled data scientists can’t master every last trick of the trade, which opens the door to automated tools that can assist them with such arcana as dynamically optimizing model hyperparameters.
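
To give a flavor of what that kind of assistance automates away, here is a minimal sketch of automated hyperparameter search using scikit-learn’s RandomizedSearchCV. The dataset, model, and search space are illustrative assumptions, not anything tied to a particular vendor’s tool.

    # Minimal sketch of automated hyperparameter tuning (illustrative only).
    from scipy.stats import randint
    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import RandomizedSearchCV

    X, y = load_iris(return_X_y=True)

    # The tool, not the data scientist, explores this space and picks the winner.
    search = RandomizedSearchCV(
        RandomForestClassifier(random_state=0),
        param_distributions={
            "n_estimators": randint(50, 500),
            "max_depth": randint(2, 20),
        },
        n_iter=20,
        cv=5,
        random_state=0,
    )
    search.fit(X, y)
    print(search.best_params_, search.best_score_)

The data scientist defines the search space once; the tool does the tedious exploration and reports the best-scoring configuration.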

Automation is coming to every segment of the data development, deployment, and management pipeline. More data professionals are adopting industrial-grade automation capabilities that accelerate the execution of repeatable processes such as data ingestion, preparation, cleansing, and delivery.
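
Stripped to its essentials, a repeatable ingest-prepare-cleanse-deliver step of the kind being automated might look like the following pandas sketch; the file paths, column names, and cleansing rules are hypothetical.

    # Minimal sketch of a repeatable ingest -> prepare -> cleanse -> deliver step.
    # File paths and column names are hypothetical.
    import pandas as pd

    def run_pipeline(source_csv: str, output_parquet: str) -> pd.DataFrame:
        df = pd.read_csv(source_csv)                       # ingestion
        df = df.rename(columns=str.lower)                  # preparation: normalize the schema
        df = df.dropna(subset=["customer_id"])             # cleansing: drop unusable rows
        df["signup_date"] = pd.to_datetime(df["signup_date"], errors="coerce")
        df.to_parquet(output_parquet, index=False)         # delivery to downstream consumers
        return df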

Copyright © 2017 IDG Communications, Inc.