Deploying Machine Learning Models


Deploying machine learning models into production is a separate step of the Machine Learning pipeline. It comes after the research phase, which covers data analysis, feature engineering, and feature selection.

A Machine Learning model shows its full value only when the company that built it generates predictions through a fully integrated API fed with real, live input data.

Machine Learning model deployment is the process of integrating the model into a real production environment so that it can generate predictions.

Taking an ML Model From Desktop POC to Running in Production Implies a Massive, Continuous Effort

Failing to put the model into production is one of the most frequent reasons why machine learning projects fail.

It is essential to know that when we deploy a Machine Learning model, we don’t deploy only the trained model but the whole Machine Learning pipeline, starting with feature engineering.


Architecture Goals

Training machine learning models and putting them into production is relatively easy and cheap. Maintaining them properly in production is much harder and more expensive. All traditional software maintenance issues remain present in Machine Learning systems, but there is an additional set of machine-learning-specific problems that we should solve with a proper architecture:

  1. The accuracy of the machine learning models needs to be reproducible
  2. The characteristics of the input data can vary over time and differ from those observed during the initial data analysis phase
  3. Errors in the models are hard to detect with traditional software testing methods

Architecture Types

There are several types of architectures for serving predictions from a trained model. Serving through a REST API and online learning are two of them. The choice of architecture depends on the business needs and the technical specifications of the platform where we train the model and generate predictions.

Building Reproducible Machine Learning Pipeline

Building Machine Learning pipeline

We can implement the machine learning pipeline in several ways: procedural programming with a function for each step in the pipeline, writing our own pipeline code using OOP, or using a third-party pipeline implementation.

The best choice is usually a third-party pipeline implementation, such as the Pipeline class in the scikit-learn Python library.
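As a minimal sketch of what such a pipeline can look like, the example below chains a scaler and a classifier with scikit-learn's Pipeline class. The toy data, the step names, and the choice of StandardScaler and LogisticRegression are assumptions made for illustration, not taken from a real project.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy data (illustrative only): two features, binary target.
X = np.array([[1.0, 200.0], [2.0, 180.0], [3.0, 40.0], [4.0, 20.0]])
y = np.array([0, 0, 1, 1])

# Feature engineering and the model live in one object, so the same
# transformations run at training time and at prediction time.
pipeline = Pipeline(
    steps=[
        ("scaler", StandardScaler()),
        ("model", LogisticRegression(random_state=0)),
    ]
)

pipeline.fit(X, y)
predictions = pipeline.predict(X)
```

Because the fitted pipeline is a single object, it can be saved and deployed as a whole, which matches the idea of deploying the full pipeline rather than only the trained model.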

Building a reproducible Machine Learning pipeline means that we can reproduce the results of each step in the pipeline when we execute it again.

Let’s see the list of the steps executed to build a machine learning project:

  1. Collecting data
  2. Data analysis (which features are categorical and which are continuous)
  3. Feature engineering
  4. Feature selection
  5. Training the machine learning model, and
  6. Model deployment

Ensuring Reproducibility

To ensure the reproducibility of the whole pipeline, we need to ensure it in all of the steps. We can exclude step 2, which is done once in the experimental phase.

One of the biggest challenges in building models with reproducible results is reproducing the data collection phase, when we extract the training data. When we gather data from a DB, the next time we select the same data we can obtain different results, because records may have been added or modified in the meantime.

One solution is to save snapshots of the training data and reuse them in repeated model deployments. This solution doesn't require changes to the DB schema, but it can be hard to implement because of storage limitations.

Another solution is to add a timestamp column to the tables from which we extract training data. This solution is better from the storage capacity point of view. On the other hand, adding a new timestamp column to an already existing DB can be impossible, and it also requires modifications to the software that saves the records into the DB.
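The timestamp approach can be sketched with Python's built-in sqlite3 module. The table name, column names, dates, and the extract_training_data helper are all hypothetical; the point is only that filtering on a fixed cutoff returns the same rows even after new records arrive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, income REAL, created_at TEXT)"
)
conn.executemany(
    "INSERT INTO customers (income, created_at) VALUES (?, ?)",
    [(1000.0, "2020-01-10"), (2500.0, "2020-02-15"), (1800.0, "2020-03-20")],
)


def extract_training_data(connection, cutoff):
    """Return only the rows that already existed at the cutoff date."""
    cursor = connection.execute(
        "SELECT id, income FROM customers WHERE created_at <= ? ORDER BY id",
        (cutoff,),
    )
    return cursor.fetchall()


CUTOFF = "2020-02-29"
first_extract = extract_training_data(conn, CUTOFF)

# New records arrive after the first training run.
conn.execute(
    "INSERT INTO customers (income, created_at) VALUES (?, ?)",
    (3200.0, "2020-04-01"),
)

# The same cutoff still yields the same training set.
second_extract = extract_training_data(conn, CUTOFF)
```

Note that this sketch only protects against newly inserted rows. If existing rows can be updated in place, a creation timestamp alone is not enough and a snapshot or a versioned table would be needed.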

Testing the Machine Learning Pipeline

An essential part of implementing a machine learning pipeline is testing it. The de facto standard for testing Python code is the pytest library.
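A minimal sketch of pytest-style tests for a single pipeline step is shown below. The impute_missing function is a hypothetical feature engineering step invented for this example. pytest collects functions whose names start with test_ and relies on plain assert statements.

```python
def impute_missing(values, fill_value):
    """Hypothetical feature engineering step: replace None with fill_value."""
    return [fill_value if v is None else v for v in values]


def test_impute_missing_replaces_none():
    assert impute_missing([1.0, None, 3.0], fill_value=0.0) == [1.0, 0.0, 3.0]


def test_impute_missing_keeps_length():
    values = [None, None, 2.5]
    assert len(impute_missing(values, fill_value=-1.0)) == len(values)
```

Saved in a file such as test_preprocessing.py (an assumed name), these tests run with the pytest command from the project directory.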

A useful companion to pytest is the tox tool. It is a Python virtual environment manager that allows us to execute tests in different Python setups. In the words of its documentation:

"tox aims to automate and standardize testing in Python. It is part of a larger vision of easing the packaging, testing, and release process of Python software."

We are not limited to running tests in different environments with it, because it can run any other command as well.

Implementing Differential Tests

In typical software development scenarios, developers often neglect differential tests, but in Machine Learning they play a crucial role.

They compare the performance of the previous model version with that of the new one on the same input, starting from the training data and including predictions for the same feature values. They can prevent painful mistakes that would otherwise go unnoticed through long periods of testing and production. We can implement them as ordinary test cases using the pytest library.
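As a rough sketch, a differential test can feed the same fixed inputs to both model versions and fail when the predictions drift apart by more than a chosen tolerance. The two model functions, the inputs, and the tolerance values below are hypothetical stand-ins; in a real project they would be the previously released and the newly trained pipeline.

```python
import math


def previous_model(features):
    """Stand-in for the currently released model (hypothetical)."""
    return 0.5 * features[0] + 0.25 * features[1]


def new_model(features):
    """Stand-in for the retrained candidate model (hypothetical)."""
    return 0.5 * features[0] + 0.25 * features[1] + 0.01


def test_predictions_match_previous_version():
    # The same feature values are sent to both model versions.
    fixed_inputs = [(1.0, 2.0), (3.0, 4.0), (10.0, 0.0)]
    for features in fixed_inputs:
        old = previous_model(features)
        new = new_model(features)
        # Tolerances are assumptions; choose them per business requirements.
        assert math.isclose(old, new, rel_tol=0.05, abs_tol=0.05), (
            f"Prediction drifted for {features}: {old} vs {new}"
        )
```

The tolerance is a design choice: too tight and every retraining fails the test, too loose and real regressions slip through.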

Testing New Models Before Releasing Into Production

Before we officially release a newly trained Machine Learning model into the production environment, we need to test it for an extended period (possibly a few months). This is also known as Shadow Deployment or Dark Launch (the name Google uses).

There are two possibilities to implement Shadow Deployments.

On application level

The first one is at the application level. The software that serves predictions from the currently released production version also generates predictions from the shadow-deployed model version and stores them in the logs or DB. Then we compare both sets of predictions to track possible anomalies in the new version of the model.
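A minimal sketch of the application-level approach follows. Both model functions and the logger name are hypothetical; the important property is that the caller only ever receives the production prediction, while the shadow prediction is just recorded for later comparison.

```python
import logging

logger = logging.getLogger("shadow_deployment")


def production_model(features):
    """Stand-in for the released model (hypothetical)."""
    return sum(features) * 0.10


def shadow_model(features):
    """Stand-in for the candidate model under observation (hypothetical)."""
    return sum(features) * 0.11


def predict(features):
    """Serve the production prediction; only log the shadow one."""
    live = production_model(features)
    try:
        shadow = shadow_model(features)
        logger.info("features=%s live=%s shadow=%s", features, live, shadow)
    except Exception:
        # A failing shadow model must never affect the customer response.
        logger.exception("Shadow model failed for features=%s", features)
    return live


result = predict([1.0, 2.0, 3.0])
```

In practice the shadow call would often run asynchronously so that it adds no latency to the customer-facing response; that is left out here to keep the sketch short.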

On infrastructure level

The second approach is at the infrastructure level. In this approach, we configure a load balancer to receive client requests and send them to two different environments: one that is currently in production and serves customers with predictions, and one with the new model that only stores its predictions in the logs or DB. This approach requires more effort from DevOps and doubles the load on the system.

How to Deal With Randomness in Machine Learning Pipeline

Traditional machine learning models

Training Machine Learning models involves inherent randomness. In most training algorithms, the initial state is chosen randomly, and then the model converges to a local optimum. Since it is only a local and not a global optimum (finding the global optimum is in general still an unsolved problem), different initial values can lead to different local optima. In other words, even repeated training with the same training data and model hyperparameters can end with models that make slightly different predictions.

We can avoid these situations by setting the seed before each place where random numbers are generated.

from numpy.random import seed
seed(0)  # fix NumPy's global random state before any random numbers are drawn
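Python's built-in random module keeps its own state, separate from NumPy's, so code that relies on it needs its own seed. The sketch below uses a hypothetical init_weights function to show that the same seed reproduces the same "random" initial values, while a different seed does not.

```python
import random


def init_weights(n_weights, seed):
    """Hypothetical random initialization of model weights."""
    rng = random.Random(seed)  # local generator, independent of global state
    return [rng.uniform(-1.0, 1.0) for _ in range(n_weights)]


run_a = init_weights(3, seed=0)
run_b = init_weights(3, seed=0)
run_c = init_weights(3, seed=1)

same_seed_matches = run_a == run_b
different_seed_matches = run_a == run_c
```

Using a local generator object instead of the module-level functions is one way to keep a library call elsewhere in the program from silently changing the sequence.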

Neural Networks and Deep Learning

The random number generation problem is on a much larger scale when we work with Neural Networks and Deep Learning. There we have many more parameters to train, and their initial values are chosen randomly.

# TensorFlow 2.x (in TensorFlow 1.x the call was tf.set_random_seed):
import tensorflow as tf
tf.random.set_seed(0)

# PyTorch:
import torch
torch.manual_seed(0)

Deep Learning libraries use the CUDA and cuDNN libraries to maximize their performance on GPUs. The GPU random number generators have their own state, and cuDNN may select nondeterministic algorithms, so we must seed the GPU generators and request deterministic behavior as well.

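# PyTorch:
SEED = 0
torch.cuda.manual_seed_all(SEED)  # seed the generators on all GPUs
torch.backends.cudnn.deterministic = True  # use deterministic cuDNN algorithms
torch.backends.cudnn.benchmark = False  # disable nondeterministic auto-tuning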

Deployment Methods

Generally speaking, the preferred method for serving Machine Learning models is via Docker images. They are easy to version, deploy, and orchestrate in various ways (Swarm, Kubernetes). Building such Docker images can be part of the Machine Learning pipeline.

Serving the Model

Serving the model with REST API

A REST API is the most frequently used way to expose a trained model that generates predictions. Many Python libraries help with exposing prediction endpoints; the most widely used one is the Flask library.
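A minimal sketch of such an endpoint with Flask is shown below. It assumes Flask is installed; the /predict route, the JSON payload shape, and the model_predict stand-in are assumptions for illustration, and a real service would load a trained pipeline instead.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def model_predict(features):
    """Stand-in for a trained pipeline's predict method (hypothetical)."""
    return sum(features) / len(features)


@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json(silent=True) or {}
    features = payload.get("features")
    # Reject requests without a usable feature list.
    if not isinstance(features, list) or not features:
        return jsonify({"error": "'features' must be a non-empty list"}), 400
    return jsonify({"prediction": model_predict(features)})


# Exercise the endpoint in-process with Flask's built-in test client.
client = app.test_client()
response = client.post("/predict", json={"features": [1.0, 2.0, 3.0]})
bad_response = client.post("/predict", json={})
```

The test client makes it possible to check the endpoint without starting a server, which fits well with the pytest-based testing described earlier.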

Serving Deep Learning Models with TensorFlow Serving and TorchServe

The most popular Deep Learning libraries these days are TensorFlow and PyTorch. Both offer high-performance model serving engines:

  • TensorFlow Serving. It has existed since TensorFlow version 1 and is a mature product.
  • TorchServe. It arrives with PyTorch 1.5 and, at the time of writing, is still in the experimental phase.

Again, the preferred way to deploy such models is via Docker images. There are ready-made images that we can use to package models and put them into production.


Speaking of the Machine Learning pipeline, a new kid on the block is Kubeflow:

The Kubeflow project is dedicated to making deployments of machine learning (ML) workflows on Kubernetes simple, portable, and scalable. Our goal is not to recreate other services, but to provide a straightforward way to deploy best-of-breed open-source systems for ML to diverse infrastructures. Anywhere you are running Kubernetes, you should be able to run Kubeflow.

It is a new tool that hasn't gathered much public interest yet but is worth considering when planning the Machine Learning pipeline and deployments.


That should be all for model deployment. What comes next in the Machine Learning pipeline is testing and monitoring Machine Learning model deployments, and this topic will be covered in a follow-up post. [...] extensively covers most of the mentioned topics, and we highly recommend it to everyone interested in implementing the full Machine Learning pipeline.

