[Paper Review] Machine Learning Operations: Overview, Definition, and Architecture
- The ultimate goal of every industrial machine learning project is to develop ML products and bring them into production. However, automating and operationalizing ML products is highly challenging.
- This paper provides a guide for ML researchers and practitioners who want to automate and operate ML products with a designated set of technologies.
- A large number of ML products fail because of the difficulties of bringing them into production.
- From a research point of view, this is no surprise: most ML work has focused on building good ML models rather than on:
- Building production-ready ML products.
- Providing the necessary coordination of the resulting, often complex, ML system components and infrastructure, including the roles required to automate and operate an ML system in a real-world setting.
- For instance, in most places data scientists still manage ML workflows to a great extent, which causes many issues when the resulting ML solutions are operated.
- While researchers have shed some light on various specific aspects of MLOps, some things are still missing, such as:
- Holistic conceptualization
- Clarification of ML systems designs
Foundations of DevOps
- DevOps is more than a pure methodology and rather represents a paradigm addressing social and technical issues in organizations engaged in software development.
- It has a goal of eliminating the gap between development and operations and emphasizes collaboration, communication, and knowledge sharing.
- It ensures automation through continuous integration and continuous delivery (CI/CD). Moreover, it is designed for continuous testing, quality assurance, continuous monitoring, logging, and feedback loops.
- Cloud platforms are equipped with ready-to-use DevOps tooling that is designed for cloud use.
- Empirical results show that DevOps ensures better software quality.
To derive insights from the academic knowledge base while also drawing on the expertise of practitioners in the field, the authors apply a mixed-method approach. First, they conducted a literature review to obtain an overview of relevant research. Next, they reviewed tooling support to gain technical knowledge of the field. Finally, they carried out an interview study consisting of 8 interviews with experts from different domains.
- For the literature review, they followed the methods of Webster and Watson and of Kitchenham.
- After the initial search, they settled on the query (((“DevOps” OR “CICD” OR “Continuous Integration” OR “Continuous Delivery” OR “Continuous Deployment”) AND “Machine Learning”) OR “MLOps” OR “CD4ML”) and ran it against databases such as Google Scholar and Web of Science.
- The search, run in May 2021, returned 1,864 articles. Of those, they screened 194 papers and finally selected 27 articles that matched what they were looking for.
- In the tool review, they examined various open-source tools, frameworks, and commercial cloud ML services to gain technical domain knowledge.
- To gain insights from various perspectives, they chose interview partners from different organizations and industries, different countries and nationalities, and different genders.
- In total, they conducted 8 interviews.
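To make the structure of the literature-review search string concrete, here is a minimal Python sketch that applies the same boolean logic to a piece of text. The function name `matches_query` and the sample titles are hypothetical and only illustrate how the AND/OR groups combine; the actual databases apply their own query syntax and matching rules.

```python
# Hypothetical helper that mirrors the boolean structure of the search string:
# ((DevOps-style terms) AND "Machine Learning") OR "MLOps" OR "CD4ML".
DEVOPS_TERMS = [
    "devops",
    "cicd",
    "continuous integration",
    "continuous delivery",
    "continuous deployment",
]
STANDALONE_TERMS = ["mlops", "cd4ml"]


def matches_query(text: str) -> bool:
    """Return True if the text satisfies the search string (case-insensitive)."""
    lowered = text.lower()
    has_devops_term = any(term in lowered for term in DEVOPS_TERMS)
    has_machine_learning = "machine learning" in lowered
    has_standalone_term = any(term in lowered for term in STANDALONE_TERMS)
    return (has_devops_term and has_machine_learning) or has_standalone_term


# Made-up titles, used only to exercise the three branches of the query.
sample_titles = [
    "Continuous Delivery for Machine Learning systems",
    "MLOps in practice",
    "DevOps for web applications",
]
selected = [title for title in sample_titles if matches_query(title)]
```

Here the first title matches through the AND group, the second through the standalone term, and the third is rejected because it mentions DevOps without machine learning.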
With the help of this methodology, the authors derive principles, components, roles, and an architecture. Finally, they derive a conceptualization of the term and provide a definition of MLOps.
Now let’s discuss these four results briefly:
- A principle is simply a guide to how things should be realized in MLOps; in simple words, a best practice to follow.
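The paper identifies a total of 9 principles:
- CI/CD automation (P1)
- Workflow orchestration (P2)
- Reproducibility (P3)
- Versioning (P4)
- Collaboration (P5)
- Continuous ML training and evaluation (P6)
- ML metadata tracking/logging (P7)
- Continuous monitoring (P8)
- Feedback loops (P9)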
- After identifying the principles that need to be incorporated into MLOps, the paper elaborates on the precise components and how they are implemented in the ML system design.
- There are 9 components; the principles each one supports are given in parentheses:
- CI/CD Component (P1, P6, P9)
- Source Code Repository (P4, P5)
- Workflow Orchestration (P2, P3, P6)
- Feature Store System (P3, P4)
- Model Training Infrastructure (P6)
- Model Registry (P3, P4)
- ML Metadata Stores (P4, P7)
- Model Serving Component (P1)
- Monitoring Component (P8, P9)
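The component-to-principle mapping above can be written down as a small lookup table. The following Python sketch copies the mapping from the list and adds two hypothetical helpers (`components_for` and `uncovered_principles`, names chosen here for illustration) to answer questions such as "which components support P6?" or "which principles are still uncovered if I only build a subset?".

```python
# Mapping copied from the component list above; P1..P9 are the principle labels.
COMPONENT_PRINCIPLES = {
    "CI/CD Component": ["P1", "P6", "P9"],
    "Source Code Repository": ["P4", "P5"],
    "Workflow Orchestration": ["P2", "P3", "P6"],
    "Feature Store System": ["P3", "P4"],
    "Model Training Infrastructure": ["P6"],
    "Model Registry": ["P3", "P4"],
    "ML Metadata Stores": ["P4", "P7"],
    "Model Serving Component": ["P1"],
    "Monitoring Component": ["P8", "P9"],
}


def components_for(principle: str) -> list[str]:
    """List the components that support the given principle."""
    return [
        name
        for name, principles in COMPONENT_PRINCIPLES.items()
        if principle in principles
    ]


def uncovered_principles(selected_components: list[str]) -> list[str]:
    """List the principles (P1..P9) not supported by any selected component."""
    covered = {
        principle
        for component in selected_components
        for principle in COMPONENT_PRINCIPLES[component]
    }
    all_principles = {f"P{i}" for i in range(1, 10)}
    return sorted(all_principles - covered)


# Example: start with versioning-related components only.
remaining = uncovered_principles(["Source Code Repository", "Model Registry"])
```

A check like `uncovered_principles` is one way to reason about an incremental rollout: it shows which principles a partial setup does not address yet.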
The paper also describes the roles needed to realize MLOps:
- Business Stakeholder
- Solution Architect
- Data Scientist
- Data Engineer
- Software Engineer
- DevOps Engineer
- ML Engineer/MLOps Engineer
Architecture and Workflow
- Based on the principles, components, and roles, the authors designed an end-to-end architecture and workflow for ML researchers and practitioners.
- The artifact was designed to be technology-agnostic. Therefore, ML researchers and practitioners can choose the best-fitting technologies and frameworks for their needs.
- MLOps project initiation
- Requirements for feature engineering pipeline
- Feature engineering pipeline
- Automated ML Workflow Pipeline
MLOps is like DevOps but for models. The task is too big for a single person to handle, so you will need different specialists as your models develop. Start with planning, understanding, and initial data analysis (basically everything normally expected of a data scientist).
Build a pipeline for feature engineering and start experimenting with models, storing metadata on the models as you go. When the meta-training loop is ready to go, push the model to an automated ML workflow pipeline and continuously monitor for concept drift as you serve predictions. The paper doesn’t make this clear, but you wouldn’t build all this infrastructure at the same time. You would likely start by automating things like model versioning etc. and get more sophisticated as your needs evolve.
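As an illustration of the "continuously monitor for concept drift" step, here is a minimal Python sketch. It is not the paper's method: it assumes ground-truth labels eventually arrive for served predictions and uses a simple proxy, flagging drift when accuracy over a sliding window falls more than a tolerance below the accuracy measured at training time. The class name `DriftMonitor` and its parameters are hypothetical.

```python
from collections import deque


class DriftMonitor:
    """Flags possible concept drift from a drop in sliding-window accuracy."""

    def __init__(self, baseline_accuracy: float, window_size: int = 100,
                 max_drop: float = 0.05) -> None:
        self.baseline_accuracy = baseline_accuracy  # accuracy at training time
        self.max_drop = max_drop                    # tolerated accuracy drop
        self.outcomes = deque(maxlen=window_size)   # True where prediction was right

    def record(self, prediction, actual) -> None:
        """Store whether a served prediction matched the later-observed label."""
        self.outcomes.append(prediction == actual)

    def window_accuracy(self):
        """Accuracy over the current window, or None if nothing is recorded."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def drift_detected(self) -> bool:
        """Report drift only once the window is full, to avoid noisy early alarms."""
        if len(self.outcomes) < self.outcomes.maxlen:
            return False
        return self.baseline_accuracy - self.window_accuracy() > self.max_drop


# Usage: ten correct predictions, then five wrong ones push the window to 50%.
monitor = DriftMonitor(baseline_accuracy=0.9, window_size=10, max_drop=0.05)
for _ in range(10):
    monitor.record(prediction=1, actual=1)
drift_before = monitor.drift_detected()
for _ in range(5):
    monitor.record(prediction=1, actual=0)
drift_after = monitor.drift_detected()
```

In a real setup, a positive result from a check like this would typically feed back into the automated workflow, for example by triggering retraining, which is where the feedback-loop principle comes in.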