AI DevOps: How to End High AI Attrition Rates

    Dr Matt Jones



    It’s a familiar story. The data science team lobs an AI ‘proof of concept’ over the wall to the IT team. The IT team wraps it into an app using software they’re familiar with – say Java – and deploys it into enterprise IT, making it accessible to employees or customers.

    After a while, they get reports it’s no longer working. Accuracy either fell on deployment or tailed off as the model learned to adapt to real-world data. Users report biased decisions against certain groups or that results feel wrong and aren’t understood.


    Is this due to model issues, differences between test and real-world data, or errors in implementation? It’s not clear. It gets lobbed back to the data science team. Perhaps it comes back with fixes, perhaps not. Either way, the IT team gets blamed for failing to deliver AI.

    Like many unhappy marriages, the problem is communication. Data science and IT teams speak different languages, each expecting the other to understand the issue and resolve it, each shouting louder every time they don’t. The result is a final product that’s a mess.

    Time for AI DevOps

    There is precedent for solving such problems.

    DevOps emerged in response to very similar communication issues between developers and IT operations. It involved establishing collaborative working structures and technologies. The result was that the development and deployment of software and updates into corporate IT became faster and more innovative.

    AI should take a similar route. But while lessons can be learned from traditional DevOps, AI is not the same. Unlike conventional software, AI does not follow explicitly programmed rules. Instead, it learns how to interpret information by finding patterns and connections in data. AI DevOps, therefore, needs its own set of structures that reflect the complexities and uncertainties of data.

    Delivering effective AI DevOps

    Delivering AI DevOps involves integrating the processes that take a model from design to production, including ongoing monitoring. Based on our experience of successful projects, here are three factors that consistently deliver AI projects that work.

    1.      Process & organization

    As with traditional DevOps, barriers need to be broken down and cross-functional teams set up, with a focus on delivering business value rather than just technical deliverables.

    These teams should be guided by stage-gated governance frameworks that progress the model from design to implementation with checks at each stage. These checks allow feedback from all parties to ensure everyone understands where they are, and that they’re delivering against business goals.
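    As a rough illustration of stage-gated checks, the sketch below runs a sequence of gates in order and reports the first one that fails. It is a minimal example under stated assumptions, not a description of any particular framework: the stage names, the evidence keys, and the thresholds are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Gate:
    """One governance checkpoint: a stage name plus a named pass/fail check."""
    stage: str
    check_name: str
    check: Callable[[Dict[str, float]], bool]


def run_gates(gates: List[Gate], evidence: Dict[str, float]) -> Optional[Gate]:
    """Evaluate gates in order; return the first gate that fails, or None."""
    for gate in gates:
        if not gate.check(evidence):
            return gate
    return None


# Hypothetical gates and thresholds, chosen only for illustration.
GATES = [
    Gate("design", "business owner signed off",
         lambda e: e.get("business_signoff", 0) == 1),
    Gate("data", "missing values below 5%",
         lambda e: e.get("missing_fraction", 1.0) < 0.05),
    Gate("model", "holdout accuracy at least 0.85",
         lambda e: e.get("holdout_accuracy", 0.0) >= 0.85),
]

# Hypothetical evidence gathered by the cross-functional team.
evidence = {
    "business_signoff": 1,
    "missing_fraction": 0.02,
    "holdout_accuracy": 0.81,
}

blocked_at = run_gates(GATES, evidence)
if blocked_at is None:
    print("All gates passed")
else:
    print(f"Stopped at '{blocked_at.stage}' stage: {blocked_at.check_name}")
```

    Reporting the failed stage by name gives both teams the same reference point for the conversation that follows, which is the purpose of the checks.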

    Our own RAPIDE framework provides a useful starting point. This six-step process for progressing models involves:

    • A business Readiness assessment
    • Advanced screening of data to confirm its validity for the application
    • Pinpointing causal links rather than correlations
    • Identifying multiple modeling options
    • Developing the most suitable of those options
    • Evolving models post-deployment

    At the deployment end, our Trusted AI framework ensures successful implementation of models, using checks to ensure they’re usable, explainable, and don’t create privacy or ethical concerns.

    2.      Automation & infrastructure

    Automated approaches to integration, testing, system release and updating, deployment, and infrastructure management should be established and used to control the production and release of trusted AI solutions into the enterprise.

    Continuous integration routinely merges code changes back into the system source code. Every time an AI engineer makes a change, it is automatically integrated, checked, and tested to ensure it doesn’t introduce issues elsewhere.

    You should employ continuous testing to automate the execution of system tests to identify potential issues, bugs, or other failures as early as possible.
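    For an AI system, automated tests can cover model behavior as well as code. The sketch below is one possible check that a pipeline could run on every change: it fails if overall accuracy drops below a threshold or if accuracy differs too much between groups. The function names, the record layout, and both thresholds are assumptions made for illustration, and a real project would choose metrics to suit its own application.

```python
from typing import Dict, List, Tuple

# Each record is (group label, true outcome, predicted outcome).
Record = Tuple[str, int, int]


def accuracy(records: List[Record]) -> float:
    """Fraction of records where the prediction matches the true outcome."""
    if not records:
        raise ValueError("no records to evaluate")
    correct = sum(1 for _, truth, pred in records if truth == pred)
    return correct / len(records)


def accuracy_by_group(records: List[Record]) -> Dict[str, float]:
    """Accuracy computed separately for each group label."""
    grouped: Dict[str, List[Record]] = {}
    for record in records:
        grouped.setdefault(record[0], []).append(record)
    return {group: accuracy(rows) for group, rows in grouped.items()}


def check_model(records: List[Record],
                min_accuracy: float = 0.8,
                max_group_gap: float = 0.1) -> List[str]:
    """Return a list of failure messages; an empty list means the check passed."""
    failures = []
    overall = accuracy(records)
    if overall < min_accuracy:
        failures.append(f"overall accuracy {overall:.2f} is below {min_accuracy:.2f}")
    per_group = accuracy_by_group(records)
    gap = max(per_group.values()) - min(per_group.values())
    if gap > max_group_gap:
        failures.append(f"accuracy gap between groups {gap:.2f} exceeds {max_group_gap:.2f}")
    return failures


# Hypothetical evaluation set: group "a" is 9/10 correct, group "b" is 7/10.
records = (
    [("a", 1, 1)] * 9 + [("a", 1, 0)] * 1
    + [("b", 1, 1)] * 7 + [("b", 1, 0)] * 3
)

failures = check_model(records)
for message in failures:
    print("FAILED:", message)
```

    Here the overall accuracy meets the threshold, but the check still fails because of the gap between groups, the kind of issue that users in the opening story only discovered after deployment.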

    Continuous delivery automates producing release code from the repository and preparing the final production environment for deployment. Continuous deployment can then make the final release autonomously. Every change to the system that’s passed the previous checks can then be deployed to end-users automatically.

    The goal is to streamline these processes, remove dependency upon human engineers’ time and expertise, and make them robust. This consolidation significantly reduces deployment times, system downtime, and defects, improving the end-user experience.

    3.      Skills & people

    Finally, the right skills are needed. Each team needs to be trained in the basics of the other’s discipline, so that both understand each other’s issues and can communicate. The organization also needs people who understand both worlds and can address challenging problems.

    Within these teams, companies also need to ensure the skills exist to deliver AI projects end to end, which includes training in all relevant tools and techniques.

    AI DevOps Done Right

    When AI DevOps is performed correctly, IT teams can confidently release well-understood AI models into production environments, knowing that they’re robust, trusted, and fit for real-world use. It will increase the success rate of AI models and cut the time taken to deliver AI into the enterprise from months to weeks.

    Converting AI innovations from the lab into highly performing AI solutions in production remains a big challenge. But there is no business value without it.

    Data scientist