
Model monitoring: maximize your model performance

Written by DSL
Published on October 28, 2024

Machine learning (ML) doesn’t stop at developing a model; that’s just the beginning. Many organizations focus primarily on building a model but forget that the real work only begins once the model is in production. A model is only valuable if it continues to perform well, which requires ongoing maintenance and model monitoring.

MLOps, a combination of machine learning and operations, manages the entire life cycle of a model, with monitoring playing a crucial role. How do you ensure that a model remains reliable over the long term? How do you prevent it from becoming outdated or making inaccurate predictions? This blog explores the importance of monitoring, its benefits, common pitfalls, and provides practical tips for implementing this in Python, so your models continue to run smoothly, even as conditions change.

What is Monitoring?

Monitoring is essential to ensure that machine learning models in production continue to perform reliably. Through various techniques and tools, you can continuously track your model’s performance, detect changes in data, and maintain the accuracy of predictions. But what does this entail exactly?


The four main pillars of ML monitoring are:

  • Model Performance Monitoring
    Monitoring your model’s performance involves measuring crucial metrics such as accuracy, precision, recall, F1-score, and R-squared. By tracking these statistics, you can act quickly if the model’s performance declines (see the sketch after this list).
  • Data Drift Detection
    Models can become outdated when input data changes over time. Data drift detection helps you identify deviations in the data distribution, allowing you to intervene before model performance truly deteriorates.
  • Operational Monitoring
    In addition to model performance, it’s important to monitor operational functionality. This includes operational metrics such as latency, availability, and error rates. These metrics provide insights into the robustness of the software surrounding your model.
  • Detecting Adversarial Attacks
    Attackers sometimes deliberately try to mislead models with manipulated data. Monitoring lets you detect early on when unusual inputs are attempting to manipulate your model, such as images that remain recognizable to humans but are crafted to fool the model.
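
As a minimal illustration of the first pillar, the sketch below computes core classification metrics with scikit-learn and flags a drop below an agreed threshold. The threshold value and function names are illustrative assumptions, not part of any specific platform.

```python
# Minimal sketch of performance monitoring, assuming scikit-learn and a
# binary classifier; ALERT_THRESHOLD is an illustrative assumption.
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

ALERT_THRESHOLD = 0.80  # hypothetical minimum acceptable F1-score

def compute_performance(y_true, y_pred):
    """Compute the core classification metrics on a batch of predictions."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
    }

def check_performance(y_true, y_pred):
    """Log an alert if the model drops below the agreed threshold."""
    metrics = compute_performance(y_true, y_pred)
    if metrics["f1"] < ALERT_THRESHOLD:
        print(f"ALERT: F1-score {metrics['f1']:.3f} below {ALERT_THRESHOLD}")
    return metrics
```

In practice you would ship these numbers to a dashboard or alerting system rather than print them, but the principle of comparing each batch against a threshold stays the same.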

Monitoring not only helps maintain reliable models but also ensures that you can make immediate adjustments if deviations or issues arise.


The importance of proper metrics

A good monitoring solution does not automatically solve all your problems. As with any step in the MLOps cycle, it is important to make the right tradeoffs. Just as in the development phase of your model, choosing the right metrics is crucial when monitoring. These metrics should fit your specific use case, so you can intervene in a targeted way and ensure that the model continues to contribute optimally to its objectives.

This applies not only to your model’s performance but also to monitoring data drift. Before diving deeper into drift detection methods, a brief clarification: data drift refers to changes in the distribution of input data, whereas concept drift pertains to changes in the relationship between input and output variables. Both types of drift pose risks, but effective monitoring can help detect and address them. However, it is crucial to identify the root cause of the drift accurately so that you can implement a targeted solution and keep your model robust.

How do you choose the right metrics for drift detection?

Drift detection is all about choosing the right statistical test that fits your data set and problem definition. This requires a thorough knowledge of your data and an informed choice. Here are some important questions to consider:

  • What type of data are you using?
    Is the data categorical or numerical? Numerical data is often better tested with the Kolmogorov–Smirnov (KS) test or Maximum Mean Discrepancy (MMD), for example, while the Chi-Square and Cramér’s V tests are more suitable for categorical data (see the sketch after this list).
  • How big is your dataset?
    The sensitivity of some tests may vary depending on the size of the dataset.
  • Is your data set normally distributed?
    This may influence the choice of particular tests.
  • Do you want to focus on data drift or concept drift?
    Do you want to monitor changes in input data or the relationship between input and output?
  • Do you want online or offline drift detection?
    Are you working with batch data or streaming data? Do you need to respond to changes in real time?
  • Do you want univariate or multivariate drift detection?
    Do you focus on each feature individually (univariate) or do you want to compare multiple features at once (multivariate)?
  • Which test do you choose for univariate drift?
    Do you use a separate test for each feature or do you choose one test for all features?
  • What dataset are you using for drift detection?
    Do you use the training dataset as a reference, or do you choose another, unprocessed dataset from the same period?
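
To make this concrete, here is a hedged sketch of univariate drift tests per data type using SciPy; the significance level and the synthetic sample data are illustrative assumptions.

```python
# Hedged sketch of univariate drift tests per data type, using SciPy;
# the significance level is an illustrative assumption.
import numpy as np
from scipy.stats import ks_2samp, chi2_contingency

P_VALUE_THRESHOLD = 0.05  # illustrative significance level

def numerical_drift(reference, current):
    """Two-sample Kolmogorov-Smirnov test for a numerical feature."""
    statistic, p_value = ks_2samp(reference, current)
    return p_value < P_VALUE_THRESHOLD, p_value

def categorical_drift(reference, current):
    """Chi-square test on the category frequencies of two samples."""
    categories = sorted(set(reference) | set(current))
    table = np.array([
        [list(reference).count(c) for c in categories],
        [list(current).count(c) for c in categories],
    ])
    _, p_value, _, _ = chi2_contingency(table)
    return p_value < P_VALUE_THRESHOLD, p_value

# Example: compare a drifted production sample against reference data
rng = np.random.default_rng(42)
drifted, p = numerical_drift(rng.normal(0, 1, 1000), rng.normal(0.5, 1, 1000))
print(f"numerical drift: {drifted} (p={p:.4f})")
```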

The choice of the right drift detection method is therefore highly dependent on the objectives and characteristics of your dataset. Monitoring is not a ‘plug-and-play’ solution; it requires careful consideration to arrive at the most suitable approach.

What do you do after drift detection?

Once drift is detected, the next step depends on the type of drift and its cause. It is always wise to conduct regular model evaluations and retrain the model. With a good Experiment Tracking platform, you can make informed decisions about whether to deploy the new model or not.
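
As a rough sketch of what such a decision step could look like, assuming MLflow as the experiment-tracking platform: the experiment name, the synthetic dataset, and the production baseline score below are all hypothetical.

```python
# Hedged sketch of a retrain-and-compare step with MLflow experiment
# tracking; experiment name, data, and baseline are illustrative.
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

production_f1 = 0.85  # hypothetical score of the model currently deployed

mlflow.set_experiment("drift-triggered-retraining")  # hypothetical name
with mlflow.start_run():
    candidate = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    candidate_f1 = f1_score(y_test, candidate.predict(X_test))
    mlflow.log_metric("f1", candidate_f1)

    # Only log the candidate as a deployable model if it beats production;
    # in practice, a human review of the run can precede actual deployment.
    if candidate_f1 > production_f1:
        mlflow.sklearn.log_model(candidate, "model")
```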

If retraining does not yield the desired results, a more in-depth analysis may be necessary. This could mean revisiting your feature engineering or the choice of your algorithm. This manual work can be time-consuming, but it is sometimes unavoidable to improve your model.

Automating monitoring and drift detection

Although manual steps are sometimes necessary, much of the monitoring of data and models can be automated. You can schedule drift detections, automatically log results, and set triggers that, for example, start a retrain job as soon as a drift is detected. However, be cautious with too much automation: what happens if a new model is automatically deployed without sufficient oversight? Manual approval may be necessary to avoid risks.
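
A possible pattern is sketched below: a periodically scheduled drift check that starts a retrain job but deliberately keeps deployment behind human approval. The retrain and notification helpers are hypothetical stubs for your own pipeline; drift detection reuses the SciPy KS test shown earlier.

```python
# Minimal sketch of a scheduled drift check with a manual approval gate.
# retrain_model and notify_team are hypothetical stubs for your own logic.
import numpy as np
from scipy.stats import ks_2samp

def retrain_model():
    print("Starting retrain job...")     # stub: launch your training pipeline

def notify_team(message):
    print(f"NOTIFY: {message}")          # stub: e.g. Slack or e-mail alert

def scheduled_drift_check(reference, current, alpha=0.05):
    """Run periodically, e.g. via cron or a workflow scheduler."""
    _, p_value = ks_2samp(reference, current)
    if p_value < alpha:
        retrain_model()
        # Deliberately no automatic deployment: a human approves the
        # new model before it replaces the one in production.
        notify_team(f"Drift detected (p={p_value:.4f}); candidate ready for review")

rng = np.random.default_rng(0)
scheduled_drift_check(rng.normal(0, 1, 500), rng.normal(0.4, 1, 500))
```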

Operational logging

In addition to drift monitoring, it is important to log the operational aspects as well. This means keeping track of your model’s usage, including error messages, latency data, and user data. Operational logging helps identify bugs in the production environment and provides valuable insights for efficient maintenance.
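
A minimal sketch using only Python’s standard library, wrapping a prediction call with latency and error logging; the logger name and log format are illustrative choices.

```python
# Hedged sketch of operational logging around a prediction call,
# using only the standard library; handler and format are illustrative.
import logging
import time

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
)
logger = logging.getLogger("model-service")

def predict_with_logging(model, features):
    """Run a prediction and log its latency, or the full error on failure."""
    start = time.perf_counter()
    try:
        prediction = model.predict([features])[0]
        latency_ms = (time.perf_counter() - start) * 1000
        logger.info("prediction=%s latency_ms=%.1f", prediction, latency_ms)
        return prediction
    except Exception:
        logger.exception("prediction failed for input %s", features)
        raise
```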

Getting started with monitoring

Now that you know what drift detection entails, you may be wondering: how do I apply it in practice? And rightly so! Because while drift detection sounds logical in theory, practice can be a bit more challenging. Especially when there is a need for real-time (online) drift detection. Here, data are processed immediately as they come in, whereas with offline drift detection, data are analyzed in batches.

Many use cases require online drift detection because batch processing is not always fast enough or feasible. This not only calls for the right tests but also for infrastructure that is robust enough for real-time data processing. This is often why drift detection is only implemented when a project or team has achieved greater maturity in MLOps.
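
One simple way to approximate online drift detection is a sliding window over the incoming stream with a repeated statistical test, sketched below; the window size and significance level are illustrative assumptions.

```python
# Hedged sketch of online (streaming) drift detection with a sliding
# window and repeated KS tests; window size and alpha are assumptions.
from collections import deque
import numpy as np
from scipy.stats import ks_2samp

class OnlineDriftDetector:
    def __init__(self, reference, window_size=200, alpha=0.05):
        self.reference = np.asarray(reference)
        self.window = deque(maxlen=window_size)
        self.alpha = alpha

    def update(self, value):
        """Feed one incoming observation; returns True once drift is flagged."""
        self.window.append(value)
        if len(self.window) < self.window.maxlen:
            return False  # wait until the window is full
        _, p_value = ks_2samp(self.reference, np.array(self.window))
        return p_value < self.alpha

# Simulate a stream whose distribution shifts halfway through
rng = np.random.default_rng(1)
detector = OnlineDriftDetector(rng.normal(0, 1, 1000))
stream = np.concatenate([rng.normal(0, 1, 300), rng.normal(1.0, 1, 300)])
for i, x in enumerate(stream):
    if detector.update(x):
        print(f"Drift flagged at observation {i}")
        break
```

Dedicated libraries handle windowing and the correction for repeated testing more carefully, but the underlying pattern is the same.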

Cloud solutions for drift detection

Cloud providers are responding to the need for drift detection and operational logging. For example, Azure offers an integrated Data Drift solution on its Azure Machine Learning platform, making it significantly easier to monitor data drift. However, cloud solutions are particularly useful in the area of operational logging.

Amazon CloudWatch (AWS) provides detailed insights into the operational performance of your ML models, such as error reports and latency data.

Azure Monitor offers similar functionality, including Log Analytics and Application Insights, which allow you to maintain centralized logs and set notifications for important events in your production code.
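
As an illustration of operational logging in the cloud, the sketch below pushes a latency metric to Amazon CloudWatch with boto3; the namespace and dimension names are hypothetical, and AWS credentials are assumed to be configured in the environment.

```python
# Hedged sketch of publishing an operational metric to Amazon CloudWatch
# with boto3; namespace and dimensions are illustrative assumptions.
import boto3

cloudwatch = boto3.client("cloudwatch")

def report_latency(latency_ms, model_name="my-model"):
    """Push a single latency measurement as a custom CloudWatch metric."""
    cloudwatch.put_metric_data(
        Namespace="MLMonitoring",  # hypothetical namespace
        MetricData=[{
            "MetricName": "PredictionLatency",
            "Dimensions": [{"Name": "Model", "Value": model_name}],
            "Value": latency_ms,
            "Unit": "Milliseconds",
        }],
    )
```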

Open-source tools for drift detection

In addition to these cloud solutions, there are also powerful open-source tools available. For drift detection in Python, there are three popular pip packages you can use:

  • Evidently AI
    This user-friendly tool focuses specifically on tabular data and supports various drift detection methods.
  • Alibi-detect
    A flexible and well-documented option that supports tabular data, images and text. Moreover, Alibi-detect provides an additional component for adversarial attack detection, increasing the security of your models.
  • NannyML
    Like Evidently, this tool is primarily suitable for tabular data and offers easy integration. It is slightly more user-friendly than Alibi-detect, but more limited in its support for other data types.

While all three tools are useful, each has its own advantages and disadvantages. Evidently and NannyML are simpler to use, while Alibi-detect is more comprehensive but also more complex. Combining multiple tools can often provide the best solution for a robust drift monitoring system.
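
As a small example, the sketch below runs Alibi-detect’s KSDrift detector on a reference set and a production batch, assuming the package’s documented interface (pip install alibi-detect); the data is synthetic.

```python
# Hedged sketch using Alibi-detect's KSDrift detector, assuming the
# interface as documented at the time of writing; data is synthetic.
import numpy as np
from alibi_detect.cd import KSDrift

rng = np.random.default_rng(2)
x_ref = rng.normal(0, 1, (1000, 5)).astype("float32")   # reference data
x_new = rng.normal(0.3, 1, (500, 5)).astype("float32")  # production batch

# KS test per feature, with multiple-testing correction applied by default
detector = KSDrift(x_ref, p_val=0.05)
preds = detector.predict(x_new)
print("drift detected:", bool(preds["data"]["is_drift"]))
print("p-values per feature:", preds["data"]["p_val"])
```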

Our solution at Data Science Lab

At DSL, we chose to combine several of these open-source tools to provide a generic and easy-to-use drift monitoring solution. By using the best of each package, we can both easily monitor tabular data and detect more complex drift in images or text. This approach provides flexibility and scalability for a variety of use cases.

What are the pros and cons of monitoring?

As previously mentioned, an ML model, just like the software it runs on, must be maintained as if it were a “living” system. Models need to regularly adapt to their environment to continue performing optimally. However, this maintenance comes with both benefits and challenges.


Benefits of monitoring

  1. Improved model performance
    Continuous model monitoring allows for early detection of performance deviations. This enables quick adjustments, ensuring that the model can always perform at its best.
  2. Increased reliability
    Monitoring ensures that the integrity of the model is maintained. This increases confidence in the decisions made by the model, essential for organizations that rely on AI solutions.
  3. Faster response to changes
    Thanks to monitoring, organizations can proactively respond to changes in data or the behavior of the model. Instead of solving problems reactively, you can take steps in advance, leading to more efficient operations.

Challenges and pitfalls

  1. Applicability in practice
    Although monitoring sounds simple in theory, it can be challenging in practice. It requires a robust infrastructure for real-time data processing and careful selection of monitoring tests. Setting up an effective monitoring solution can be particularly challenging for organizations that lack the right expertise or resources.
  2. Amount of data
    Keeping track of too many metrics can be overwhelming. Without a clear strategy, it can become difficult to gain meaningful insights from the abundance of data, making the monitoring process unnecessarily complex.
  3. Hypersensitivity to small changes
    When a monitoring solution is too sensitive, it can lead to overreactions to minor changes in data. This wastes resources and time, as retraining, investigations, or development work is frequently initiated without being truly necessary.

Conclusion

Monitoring is an essential part of successfully implementing and maintaining a mature MLOps cycle. By paying attention to the right metrics, drift detection, and automation, organizations can optimize the performance of their models and ensure that they remain reliable, even in a rapidly changing environment. Tools such as Evidently AI, Alibi Detect, NannyML, and Azure and AWS monitoring options can be of great value here.

However, there are also real-world challenges. Organizations must reach a certain maturity level and ensure a robust infrastructure to deploy monitoring effectively. But the benefits, such as improved performance, increased reliability and rapid response, are clear and compelling. Being aware of the potential pitfalls is crucial to making implementation successful.

Ready for the next step?

Want to learn more about how your organization can benefit from effective model monitoring and drift detection? Contact us. We’d love to help you take your MLOps infrastructure to the next level and get the most out of your AI solutions.
