[Paper Review] Machine Learning Operations: Overview, Definition, and Architecture

Mohit Mishra

5 min read · Aug 23, 2022

Abstract

The final goal of every industrial machine learning project is to develop ML products and bring them into production. However, automating and operationalizing ML products is highly challenging.
This paper provides guidance for ML researchers and practitioners who want to automate and operate ML products with a designated set of technologies.

Introduction

A large number of ML products fail simply because of the work needed to get them into production.
From a research point of view this is not a surprise: most ML work has gone into building good ML models rather than into (a) building production-ready ML products and (b) providing the necessary coordination of the resulting, often complex, ML system components and infrastructure, including the roles required to automate and operate an ML system in a real-world setting.
For instance, in most places data scientists still manage the ML workflows to a great extent, which causes many issues during the operation of the respective ML solutions.
While researchers have shed some light on specific aspects of MLOps, some things are still missing:
Holistic conceptualization
Generalization
Clarification of ML systems designs

Foundations of DevOps

DevOps is more than a pure methodology and rather represents a paradigm addressing social and technical issues in organizations engaged in software development.
It has a goal of eliminating the gap between development and operations and emphasizes collaboration, communication, and knowledge sharing.
It ensures automation through continuous integration and continuous delivery (CI/CD). It is also designed for continuous testing, quality assurance, continuous monitoring, logging, and feedback loops.
Cloud platforms are equipped with ready-to-use DevOps tooling that is designed for cloud use.
Empirical results show that DevOps ensures better software quality.

Methodology

To derive insights from the academic knowledge base while also drawing on the expertise of practitioners in the field, the authors apply a mixed-method approach. First, they conducted a literature review to obtain an overview of relevant research. Next, they reviewed tooling support to gain technical knowledge of the field. Finally, they carried out an interview study, interviewing eight experts from different domains.

Literature Review

For the literature review, the authors followed the methods of Webster and Watson and of Kitchenham.
After an initial search, they settled on the query (((“DevOps” OR “CICD” OR “Continuous Integration” OR “Continuous Delivery” OR “Continuous Deployment”) AND “Machine Learning”) OR “MLOps” OR “CD4ML”) and ran it against databases such as Google Scholar and Web of Science.
The search, run in May 2021, returned 1,864 articles. Of those, they screened 194 papers and ended up with 27 articles that fit what they were looking for.
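
To make the boolean structure of the search query concrete, here is a minimal Python sketch. The function name matches_query and the case-insensitive substring matching are assumptions made for illustration; the databases themselves evaluate the query in their own way.

```python
# Terms taken from the search query quoted above.
DEVOPS_TERMS = [
    "devops",
    "cicd",
    "continuous integration",
    "continuous delivery",
    "continuous deployment",
]
STANDALONE_TERMS = ["mlops", "cd4ml"]


def matches_query(text: str) -> bool:
    """Return True if the text satisfies the boolean search query (hypothetical helper)."""
    lowered = text.lower()
    # (any DevOps-style term) AND "Machine Learning"
    devops_and_ml = (
        any(term in lowered for term in DEVOPS_TERMS)
        and "machine learning" in lowered
    )
    # ... OR "MLOps" OR "CD4ML"
    standalone = any(term in lowered for term in STANDALONE_TERMS)
    return devops_and_ml or standalone


print(matches_query("Continuous Delivery for Machine Learning"))  # True
print(matches_query("DevOps in the enterprise"))                  # False
print(matches_query("An MLOps case study"))                       # True
```

A DevOps term alone is not enough; it has to appear together with "Machine Learning", while "MLOps" and "CD4ML" each match on their own.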

Tool Review

In this step, the authors reviewed various open-source tools, frameworks, and commercial cloud ML services to gain technical domain knowledge.

Interview Study

To gain insights from various perspectives, the authors chose interview partners from different organizations and industries, different countries and nationalities, as well as different genders.
In total, they conducted eight interviews.

Results

Applying this methodology yields four results: principles, components, roles, and an architecture. From these, the authors derive a conceptualization of the term and provide a definition of MLOps.

Now let’s briefly discuss each of these four results:

Principles

Principles are guides to how things should be realized in MLOps; in simple words, they are the best practices to follow.

The paper identifies nine principles:

P1: CI/CD automation
P2: Workflow orchestration
P3: Reproducibility
P4: Versioning
P5: Collaboration
P6: Continuous ML training and evaluation
P7: ML metadata tracking/logging
P8: Continuous monitoring
P9: Feedback loops

Components

After identifying the principles that need to be incorporated into MLOps, the authors elaborate on the precise components and how to implement them in the ML system design.
There are nine components. The principles each one supports are given in parentheses (P1 to P9, in the order the principles are listed above):
CI/CD Component (P1, P6, P9)
Source Code Repository (P4, P5)
Workflow Orchestration (P2, P3, P6)
Feature Store System (P3, P4)
Model Training Infrastructure (P6)
Model Registry (P3, P4)
ML Metadata Stores (P4, P7)
Model Serving Component (P1)
Monitoring Component (P8, P9)
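
The component list above is effectively a lookup table, and inverting it shows which components back each principle. The following is a small sketch in Python; the dictionary simply copies the list above, and the helper name components_by_principle is made up for this example.

```python
from collections import defaultdict

# Component -> principles it supports, copied from the list above.
COMPONENT_PRINCIPLES = {
    "CI/CD Component": ["P1", "P6", "P9"],
    "Source Code Repository": ["P4", "P5"],
    "Workflow Orchestration": ["P2", "P3", "P6"],
    "Feature Store System": ["P3", "P4"],
    "Model Training Infrastructure": ["P6"],
    "Model Registry": ["P3", "P4"],
    "ML Metadata Stores": ["P4", "P7"],
    "Model Serving Component": ["P1"],
    "Monitoring Component": ["P8", "P9"],
}


def components_by_principle(mapping):
    """Invert component -> principles into principle -> components."""
    inverted = defaultdict(list)
    for component, principles in mapping.items():
        for principle in principles:
            inverted[principle].append(component)
    return dict(inverted)


by_principle = components_by_principle(COMPONENT_PRINCIPLES)
for principle in sorted(by_principle):
    print(principle, "->", ", ".join(by_principle[principle]))
```

Read this way, versioning (P4) is the most widely shared principle: it touches the source code repository, the feature store, the model registry, and the metadata stores.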

Roles

Business Stakeholder
Solution Architect
Data Scientist
Data Engineer
Software Engineer
DevOps Engineer
ML Engineer/MLOps Engineer

Architecture and Workflow

On the basis of the principles, components, and roles, the authors designed an end-to-end architecture and workflow for ML researchers and practitioners.
The artifact was designed to be technology-agnostic, so ML researchers and practitioners can choose the technologies and frameworks that best fit their needs.

MLOps project initiation
Requirements for feature engineering pipeline
Feature engineering pipeline
Experimentation
Automated ML Workflow Pipeline

Summary

MLOps is like DevOps but for models. The task is too big for a single person to handle, so you will need different specialists as your models develop. Start with planning, understanding, and initial data analysis (basically everything normally expected of a data scientist).

Build a pipeline for feature engineering and start experimenting with models, storing metadata on the models as you go. When the meta-training loop is ready to go, push the model to an automated ML workflow pipeline and continuously monitor for concept drift as you serve predictions. The paper doesn’t make this clear, but you wouldn’t build all this infrastructure at the same time. You would likely start by automating things like model versioning etc. and get more sophisticated as your needs evolve.
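
As a rough illustration of the monitoring and feedback-loop idea, the sketch below tracks accuracy over a sliding window of labelled predictions and flags when retraining might be worth triggering. The paper does not prescribe a drift-detection method; windowed accuracy is used here only as a simple stand-in, and the class name, window size, and threshold are all hypothetical.

```python
from collections import deque


class AccuracyMonitor:
    """Track accuracy over the most recent labelled predictions (illustrative only)."""

    def __init__(self, window_size: int = 100, threshold: float = 0.8):
        # deque with maxlen drops the oldest outcome once the window is full.
        self.outcomes = deque(maxlen=window_size)
        self.threshold = threshold

    def record(self, predicted, actual) -> None:
        """Store whether a served prediction matched the label that arrived later."""
        self.outcomes.append(predicted == actual)

    def accuracy(self) -> float:
        if not self.outcomes:
            raise ValueError("no labelled predictions recorded yet")
        return sum(self.outcomes) / len(self.outcomes)

    def needs_retraining(self) -> bool:
        # Only decide once the window is full, to avoid reacting to a few samples.
        if len(self.outcomes) < self.outcomes.maxlen:
            return False
        return self.accuracy() < self.threshold


monitor = AccuracyMonitor(window_size=4, threshold=0.75)
for predicted, actual in [(1, 1), (0, 0), (1, 0), (0, 1)]:
    monitor.record(predicted, actual)

print(monitor.accuracy())          # 0.5
print(monitor.needs_retraining())  # True
```

In a real setup the retraining signal would feed back into the automated ML workflow pipeline. This check also depends on ground-truth labels arriving after predictions are served, which is not always the case.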