
MLOps: Continuous Delivery for Machine Learning on AWS

Updated: Jan 20, 2022


In modern software development, continuous delivery (CD) principles and practices have significantly increased the throughput of delivering software to production safely, continuously, and reliably, and have helped teams avoid big, disruptive, and error-prone deployments.


After machine learning (ML) techniques showed that they can provide significant value, organizations started to get serious about using these new technologies and tried to get them deployed to production. However, people soon realized that training and running a machine learning model on a laptop is completely different from running it in a production IT environment. A common problem is having models that only work in a lab environment and never leave the proof-of-concept phase. Nucleus Research published a 2019 report in which it analyzed 316 AI projects in companies ranging from 20-person startups to Fortune 100 global enterprises. It found that only 38% of AI projects made it to production. Furthermore, the projects that did make it to production did so in a manual, ad hoc way, often becoming stale and hard to update.


Creating a process to operationalize machine learning systems enables organizations to leverage the many new opportunities machine learning offers to optimize processes and products. However, it also brings new challenges. Using ML models in software development makes it difficult to achieve versioning, quality control, reliability, reproducibility, explainability, and auditability in that process. This is because, in addition to the software code, there is a higher number of changing artifacts to be managed, such as the datasets, the machine learning models, and the parameters and hyperparameters used by those models. These artifacts can also be orders of magnitude larger than the software code, and correspondingly harder to move between environments.


There are also organizational challenges. Different teams might own different parts of the process and have their own ways of working. Data engineers might be building pipelines to make data accessible, while data scientists can be researching and exploring better models. Machine learning engineers or developers then have to worry about how to integrate that model and release it to production. When these groups work in separate silos, there is a high risk of creating friction in the process and delivering suboptimal results.


Figure 1: The different personas usually involved in Machine Learning projects.


MLOps extends DevOps into the machine learning space. It refers to the culture where people, regardless of their title or background, work together to imagine, develop, deploy, operate, and improve a machine learning system. To tackle the challenges described above in bringing ML to production, ThoughtWorks has developed continuous delivery for machine learning (CD4ML), an approach to realizing MLOps. In one of its first ML projects, ThoughtWorks built a price recommendation engine with CD4ML on AWS for AutoScout24, the largest online car marketplace in Europe. Today, CD4ML is standard at ThoughtWorks for ML projects.


Continuous delivery for machine learning


Continuous delivery has been the approach to bring automation, quality, and discipline to create a reliable and repeatable process of releasing software into production. Jez Humble, one of the authors of the seminal book Continuous Delivery, states that:

Continuous Delivery is the ability to get changes of all types—including new features, configuration changes, bug fixes, and experiments—into production, or into the hands of users, safely and quickly in a sustainable way.

Continuous delivery applies to changes of all types, not just software code. With that in mind, we can extend its definition to incorporate the new elements and challenges that exist in real-world machine learning systems, an approach we are calling Continuous Delivery for Machine Learning.

Continuous Delivery for Machine Learning (CD4ML) is a software engineering approach in which a cross-functional team produces machine learning applications based on code, data, and models in small and safe increments that can be reproduced and reliably released at any time, in short adaptation cycles.

This definition includes the core principles to strive for. It highlights the importance of cross-functional teams with skill sets across different areas of specialization, such as data engineering, data science, or operations. It incorporates other sources of change beyond code and configuration, such as datasets, models, and parameters. It calls for an incremental and reliable process that makes small changes frequently and safely, which reduces the risk of big releases. Finally, it requires a feedback loop: real-world data changes continuously and the models in production are continuously monitored, leading to adaptations and improvements through retraining of the models and repetition of the whole process.


The different process steps of CD4ML


Figure 2 shows the end-to-end process of CD4ML. Let's have a closer look at the different steps.


Figure 2: Continuous delivery for machine learning end-to-end process

Model building


Once the need for a machine learning system is identified, data scientists research and experiment to develop the best model by trying different combinations of algorithms and tuning their parameters and hyperparameters. This produces models that can be evaluated to assess the quality of their predictions. The formal implementation of this model training process becomes the machine learning pipeline.


Having an automated and reproducible machine learning pipeline not only allows other data scientists to collaborate on the same code base but also allows the pipeline to be executed in different environments and against different datasets. This provides great flexibility to scale out and track the execution of multiple experiments, and it ensures their reproducibility.
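As an illustration of what "automated and reproducible" can mean in practice, here is a minimal Python sketch. It is not the CD4ML implementation: the straight-line model, the `PipelineConfig` fields, and the function names are hypothetical. The idea it shows is that every source of randomness is driven by an explicit, versionable configuration, so the same code, data, and configuration always produce the same model and metrics.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineConfig:
    """Hypothetical pipeline parameters; version these alongside the code."""
    seed: int = 42
    test_fraction: float = 0.25


def split(rows, cfg):
    """Deterministically shuffle and split rows into train and test sets."""
    rng = random.Random(cfg.seed)  # local, seeded RNG: no hidden global state
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * cfg.test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_line(train):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def run_pipeline(rows, cfg):
    """Split, train, and evaluate; returns the model and its test metric."""
    train, test = split(rows, cfg)
    slope, intercept = fit_line(train)
    mae = sum(abs(y - (slope * x + intercept)) for x, y in test) / len(test)
    return {"slope": slope, "intercept": intercept, "mae": mae}
```

Because the random generator is local and seeded from the configuration, a second call to `run_pipeline` with the same rows and the same `PipelineConfig` returns identical results, which is what allows another data scientist, or a CI server, to reproduce a run.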


Model evaluation and experimentation


As the data science process is very research-oriented, it is common to have multiple experiments running in parallel, many of which will never make their way to production. This requires an approach that keeps track of all the different experiments, code versions, and potentially different datasets and parameters/hyperparameters attempted. Your CD4ML architecture needs to support tracking, visualizing, and comparing results from different runs, as well as the graduation and promotion of models that prove to be useful.
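The tracking requirement can be sketched with a small in-memory registry. This is a hypothetical stand-in for a dedicated experiment-tracking service, not any particular tool's API; the `Run` fields (code version, data version, parameters, metrics) simply mirror the items listed above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Run:
    run_id: str
    code_version: str  # e.g. a Git commit SHA
    data_version: str  # e.g. a content hash of the training data
    params: Dict[str, float]
    metrics: Dict[str, float]


@dataclass
class ExperimentTracker:
    """Hypothetical in-memory tracker; a real one would persist runs."""
    runs: List[Run] = field(default_factory=list)
    promoted_run_id: Optional[str] = None

    def log(self, run: Run) -> None:
        self.runs.append(run)

    def ranked(self, metric: str, lower_is_better: bool = True) -> List[Run]:
        """Compare all runs on one metric, best first."""
        return sorted(self.runs, key=lambda r: r.metrics[metric],
                      reverse=not lower_is_better)

    def promote_best(self, metric: str, lower_is_better: bool = True) -> Run:
        """Mark the best run as the candidate for production."""
        best = self.ranked(metric, lower_is_better)[0]
        self.promoted_run_id = best.run_id
        return best
```

Recording the code and data versions with every run is what makes a promoted model traceable back to exactly how it was produced.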


Productionizing the model


Once a suitable model is found, you need to decide how it will be served and used in production. It might be embedded and packaged within the application that consumes it, it might be deployed as a separate service, or it might be treated as data that is loaded at runtime.


Regardless of which pattern you choose, there will always be an implicit contract (API) between the model and how it is consumed. If that contract changes on one side without the other being updated, it will cause an integration bug.
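One way to make that implicit contract explicit is a contract test that runs in the pipeline before a model is released. The sketch below assumes a hypothetical price-prediction model that takes a dictionary of features and returns a dictionary; the field names and types in the two contracts are illustrative only.

```python
from typing import Callable, Dict, List

# Hypothetical contract for a price-prediction model; names and types are illustrative.
INPUT_CONTRACT: Dict[str, type] = {"make": str, "mileage_km": int, "year": int}
OUTPUT_CONTRACT: Dict[str, type] = {"price_eur": float}


def violations(payload: dict, contract: Dict[str, type]) -> List[str]:
    """List every way the payload deviates from the contract."""
    errors = []
    for name, expected in contract.items():
        if name not in payload:
            errors.append(f"missing field: {name}")
        elif not isinstance(payload[name], expected):
            actual = type(payload[name]).__name__
            errors.append(f"{name}: expected {expected.__name__}, got {actual}")
    for name in sorted(set(payload) - set(contract)):
        errors.append(f"unexpected field: {name}")
    return errors


def contract_test(predict: Callable[[dict], dict], sample: dict) -> List[str]:
    """Check a sample request and the model's response against both contracts."""
    errors = violations(sample, INPUT_CONTRACT)
    if not errors:
        errors = violations(predict(sample), OUTPUT_CONTRACT)
    return errors
```

Running a check like this against every new model version would catch, for example, a renamed output field before it reaches the consuming application.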


Testing and quality


Machine learning workflows require different types of testing. Some are inherently non-deterministic and hard to automate, while others can be automated to add value and improve the overall quality of the system. Examples include data validation and data quality, component integration and contract testing, evaluating model quality metrics, and validating the model against bias and fairness.
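Data validation is one of the checks that is straightforward to automate. A minimal sketch, assuming records arrive as dictionaries; the column names and value ranges in `RULES` are hypothetical examples, not recommendations.

```python
from typing import Any, Callable, Dict, List

# Hypothetical validation rules: column name -> predicate the value must satisfy.
RULES: Dict[str, Callable[[Any], bool]] = {
    "mileage_km": lambda v: 0 <= v < 2_000_000,
    "year": lambda v: 1900 <= v <= 2030,
}


def validate_rows(rows: List[dict], rules: Dict[str, Callable[[Any], bool]]) -> List[str]:
    """Return human-readable violations; an empty list means the batch passes."""
    problems = []
    for i, row in enumerate(rows):
        for column, check in rules.items():
            value = row.get(column)
            if value is None:
                problems.append(f"row {i}: {column} is missing")
            elif not check(value):
                problems.append(f"row {i}: {column}={value!r} failed check")
    return problems
```

A pipeline built this way would typically refuse to train on any batch that produces violations.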


While you cannot write a deterministic test to assert the model score from a given training run, the CD4ML process can automate the collection of such metrics and track their trend over time. This allows you to introduce quality gates that fail when they cross a configurable threshold and to ensure that models don’t degrade against known performance baselines.
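A quality gate of this kind can be as small as a comparison against a stored baseline. The sketch below assumes the pipeline already produces a single scalar metric and that a 5% relative degradation is the configured tolerance; both the threshold and the function name are hypothetical.

```python
def passes_quality_gate(candidate: float, baseline: float,
                        max_relative_degradation: float = 0.05,
                        lower_is_better: bool = True) -> bool:
    """Return True if the candidate metric is within tolerance of the baseline.

    For error metrics (lower is better) the candidate may be at most
    `max_relative_degradation` worse than the baseline; for scores such as
    accuracy (higher is better) the same tolerance applies downwards.
    """
    if lower_is_better:
        return candidate <= baseline * (1 + max_relative_degradation)
    return candidate >= baseline * (1 - max_relative_degradation)
```

In a CI/CD job, a `False` result would typically fail the stage, for example by exiting with a non-zero status, which stops the candidate model from being promoted.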


Deployment


Once a good candidate model is found, it must be deployed to production. There are different approaches to doing so with minimal disruption. You can have multiple models performing the same task for different partitions of the problem. You can deploy a shadow model side by side with the current one to monitor its performance before promoting it. You can have competing models actively used by different segments of the user base. Or you can have online learning models that continuously improve as new data arrives.
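The shadow pattern can be sketched in a few lines: every request is answered by the current model, while the candidate receives the same input and its output is only recorded for later comparison. The `ShadowRouter` class is hypothetical and assumes both models are plain Python callables.

```python
import logging
from typing import Any, Callable, List, Tuple

logger = logging.getLogger("shadow")


class ShadowRouter:
    """Serve the primary model; run the candidate in shadow for comparison only."""

    def __init__(self, primary: Callable[[Any], Any], shadow: Callable[[Any], Any]):
        self.primary = primary
        self.shadow = shadow
        self.comparisons: List[Tuple[Any, Any]] = []  # (primary, shadow) pairs

    def predict(self, features: Any) -> Any:
        result = self.primary(features)  # the only answer the user ever sees
        try:
            self.comparisons.append((result, self.shadow(features)))
        except Exception:
            # A broken candidate must not break production traffic.
            logger.exception("shadow model failed")
        return result
```

In a real service the shadow call would usually run asynchronously so that it adds no latency; the synchronous version keeps the sketch short.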


Elastic cloud infrastructure is a key enabler for implementing these deployment scenarios: it lets you scale the infrastructure up and down on demand as new models are rolled out, while minimizing any potential downtime.


Monitoring, observability, and closing the feedback loop


Once the model is live, you need monitoring and observability infrastructure to understand how it is performing in production against real data. By capturing this data, you can close the data feedback loop. A human in the loop can analyze the new data captured from production, curate it, and label it to create new training datasets for improving future models. This enables models to adapt and creates a process of continuous improvement.
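Closing the loop requires that production inputs and predictions are captured in a form that can later be labeled. A minimal sketch, assuming a hypothetical JSON-lines prediction log and a separate human labeling step that returns labels keyed by prediction id:

```python
import json
import time
import uuid
from typing import Callable, Dict, Iterable, List


def log_prediction(write: Callable[[str], object], features: dict, prediction: float) -> str:
    """Append one JSON line per prediction; returns the id used for later labeling."""
    record = {
        "id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "features": features,
        "prediction": prediction,
    }
    write(json.dumps(record) + "\n")
    return record["id"]


def build_training_set(log_lines: Iterable[str], labels: Dict[str, float]) -> List[dict]:
    """Join logged inputs with human-curated labels to form new training rows."""
    rows = []
    for line in log_lines:
        record = json.loads(line)
        if record["id"] in labels:  # unlabeled records are skipped for now
            rows.append({**record["features"], "label": labels[record["id"]]})
    return rows
```

Passing `write` as a function keeps the sketch storage-agnostic: an open file's `write` method or a list's `append` method both work.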

The technical components of CD4ML

To perform these process steps in an automated and continuous way, CD4ML requires the right infrastructure and tooling. Figure 3 shows the components of the CD4ML infrastructure.


Figure 3: Technical components of CD4ML

Discoverable and accessible data


Good machine learning models require good data, and that data must be easy to discover and access. Data can be generated and collected from inside or outside your organization. There are several approaches to building a modern data platform, but it must be architected to consider the needs of different stakeholder groups: from application teams building systems that generate or collect data, to teams responsible for data governance, to business stakeholders who use the data for decision-making, to data scientists who use it to automate decisions through an AI/ML system.


Version control and artifact repositories


As in modern software engineering projects, version control tools are mandatory for working efficiently as a team and for ensuring the reproducibility of results. Machine learning projects bring the additional complexity of data as a new artifact that also needs to be versioned.
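A common way to bring datasets under version control is to commit a content fingerprint of the data rather than the (possibly very large) data itself. The function below is a hypothetical sketch of that idea, hashing every file in a dataset directory with SHA-256; dedicated data-versioning tools implement more complete versions of it.

```python
import hashlib
from pathlib import Path


def dataset_version(root: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 fingerprint of every file (relative path and content) under `root`."""
    digest = hashlib.sha256()
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        # Include the relative path so that renames also change the version.
        digest.update(path.relative_to(root).as_posix().encode("utf-8"))
        digest.update(b"\0")
        with path.open("rb") as f:
            # Read in chunks so large files never have to fit in memory.
            for chunk in iter(lambda: f.read(chunk_size), b""):
                digest.update(chunk)
        digest.update(b"\0")
    return digest.hexdigest()
```

Storing this hash in version control, alongside the model metadata, ties each trained model to the exact data it was trained on.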


Continuous delivery orchestration


To tie everything together, a good continuous delivery orchestration tool allows you to model the intricate dependencies and workflows needed to implement the end-to-end CD4ML process. Such tools provide the means to automate the execution of deployment pipelines and to model the data governance process required to promote models from research to production.
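At its core, an orchestrator runs stages in an order that respects their dependencies. The sketch below uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+); the stage names in `CD4ML_DEPENDENCIES` are hypothetical, and real orchestration tools add scheduling, retries, and approvals on top of this core.

```python
from graphlib import TopologicalSorter  # standard library since Python 3.9
from typing import Callable, Dict, List, Set, Tuple

# A stage receives the results of all stages that have already run.
Stage = Callable[[Dict[str, object]], object]

# Hypothetical CD4ML stage graph: each stage lists the stages it depends on.
CD4ML_DEPENDENCIES: Dict[str, Set[str]] = {
    "ingest": set(),
    "validate": {"ingest"},
    "train": {"validate"},
    "evaluate": {"train"},
    "deploy": {"evaluate"},
}


def orchestrate(stages: Dict[str, Stage],
                dependencies: Dict[str, Set[str]]) -> Tuple[List[str], Dict[str, object]]:
    """Run every stage after its prerequisites and collect the results."""
    sorter = TopologicalSorter()
    for name in stages:
        sorter.add(name, *dependencies.get(name, set()))
    order = list(sorter.static_order())  # raises CycleError on circular dependencies
    results: Dict[str, object] = {}
    for name in order:
        results[name] = stages[name](results)
    return order, results
```

Because each stage sees the results of its predecessors, a `deploy` stage can, for instance, decide to skip deployment when the `evaluate` stage reports a failed quality gate.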


Infrastructure for multiple environments


While not a hard requirement, using cloud infrastructure brings many benefits to CD4ML:

  • Elasticity and scalability to use more or less resources as needed

  • The ability to use specialized hardware (such as GPUs) for training machine learning models more efficiently

  • Access to specialized services to manage your data and machine learning platforms

  • The ability to use a “pay-as-you-go” model based on actual usage of the underlying infrastructure


Model performance assessment tracking


Because data science and machine learning development can be highly experimental and iterative, it is important to be able to track the different models, their parameters, and their results in a way that is transparent to the whole development team.


Model monitoring and observability


Machine learning models in production have to be continuously monitored to answer questions such as: How does the model perform? What is the actual data fed into the model? Does the data have different characteristics from the training set? Is the performance deteriorating? Is there bias in the data or model performance?
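The question of whether production data differs from the training set can be partly automated with a drift statistic. The sketch below computes the Population Stability Index (PSI) for a single numeric feature, using equal-width bins derived from the training data; the bin count and the smoothing constant `eps` are assumptions rather than standard values.

```python
import math
from typing import List, Sequence


def population_stability_index(expected: Sequence[float], actual: Sequence[float],
                               bins: int = 10, eps: float = 1e-6) -> float:
    """PSI = sum((a_i - e_i) * ln(a_i / e_i)) over bin proportions e_i, a_i."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant feature

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1  # clamp outliers into edge bins
        # eps avoids log(0) and division by zero for empty bins
        return [max(c / len(values), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

A commonly quoted rule of thumb treats a PSI above roughly 0.25 as a significant shift worth investigating, though suitable thresholds depend on the feature and the use case.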
