Few of us will go, “Houston, we have a problem,” when we see a model drift occur. We got so accustomed to model drifts that we see it as another day, another model drift.
In this blog, we’ll cover some of the aspects of model drift, like, what machine learning model drift is?, the types of ML model drift, model drift examples, the importance of model drift detection, and the best practices in dealing with model drift.
We live in a world where nothing is constant, especially data. Yet, everyday data scientists deal with tons of data that lead to model drift, model performance degradation, and eventually failures in accurate prediction.
Model monitoring enables us to detect model drifts as and when they occur to take the necessary steps to retrain our model based on the drifts. This process of detection and retraining is iterative; whole MLOps is an iterative process, making it unique compared to DevOps.
Model Monitoring: It is an operational step in the pipeline, where post-deployment, we track the performance of our model in action to pinpoint bugs and unwanted behaviors.
By definition, model drift refers to the performance degradation of a model in production due to the changes in the data and the relationship between the variables. Therefore, it is a given that model drifts affect the organization negatively over time or immediately.
For example, an e-commerce company conducting experiments to detect fraudulent products and stores could lead to false predictions as the number of stores, users, and products keep increasing. With an increasing number of new users comes new data and data drift, otherwise known as model drift.
It essentially means that the connection between the target variable and the independent variables changes with time. Thanks to this drift, the model keeps becoming unstable; therefore, the predictions keep evolving inaccurately with time.
It is often due to errors in data collection, people’s behavior changes, or time gaps that alter what is considered a good prediction and what is not.
As more machine learning models are deployed in live production for real-world applications, model drift has become a severe issue. In the significant data era, it’s not fair to expect data distributions to stay stable over a long period. It suggests that data drift is a significant problem for data science teams trying to scale ML efforts.
Drift in machine learning models occurs for several reasons, but there are generally two main categories: insufficient training data and changing environments. Lousy training data is data that does not accurately represent real-world situations and is also known as flawed training data.
A dynamic environment is often the case for instability, where the change in the data is never within our control. Some examples are:
In any model deployment, the model inevitably changes the environment in which it is deployed. For example, predicting the churn rate of a customer based on the historical database results in predicting when a customer is going to churn. However, if the customer success team retains the customer, then the dataset will now contain a concept drift because the said user predicted to churn as now continued to stay on.
Model drift is classified into two broad categories. The essential type is called ‘concept drift.’ It happens when the properties of the target variable change. So, if the foundation of the variable we try to predict changes, the model will not work well.
The second, most common type is ‘data drift.’ It happens when the properties of the predictors change. Again, if the underlying variables change, the model will fail.
A classic example of when this might happen is when the patterns within the data change due to seasonality, like in our e-commerce store. For example, the model works during the spring sale but will not function during the Christmas sale. Similarly, with flight demand surges during the vacation seasons, airlines struggle to take care of occupancy in the off-season.
Any changes in the data’s statistical properties can be called concept drift or model drift. Mathematically it indicates the shift in the relationship between the predictors and the target variable. By definition, concept drift is “a change in the joint probability distribution, i.e., Pt(X,y) ≠ Pt+(X,y).”
Covariate shift occurs when a change occurs in the distribution of the input variable. For instance, people’s shopping patterns evolve, leading to data source changes or sensors becoming inaccurate over time. It also can be seen on a univariate level, also known as feature drift, or on a multivariate across the distribution.
Label drift, also referred to as prior probability shift or unconditional class shift, occurs when change occurs in the distribution of the class variable. For instance, fraud detection in our e-commerce store example where the size of the spam products or accounts changes drastically over time, such as selling Jordans for $50,000.
Real concept drift, also known as posterior class shift, appears when the relationship between the input variable and the target variable is changed. For instance, search results for Corona returned with information related to the virus, not Mexican beer. Perhaps the hardest to detect is as much of this drift occurs due to natural processes.
The model detection framework above shows us the framework we can use to detect model drift. In the Data Retrieval stage, we can retrieve large amounts of data in bulk since a single data point cannot infer the overall distribution.
In the data modeling stage, we can extract key features that can impact the model in case they drift. The statistics calculation stage is where we measure the drift and test results for the hypothesis test.
The hypothesis test stage is where we evaluate the importance of the change in the statistics calculations stage.
Some methods to detect model drift:
The Kolmogorov-Smirnov test, or the K-S test, is a nonparametric test that compares the training and post-training data. The null hypothesis for this test states that if the data distributions from both datasets are the same, and if the null is rejected, we can confirm the presence of model drift.
The population stability Index (PSI) compares the distribution of the target variable in the test with the training dataset used to develop the model.
A model-based approach can detect model drift in machine learning between two populations. For example, if we label our data in the current model to 0, then the real-time data can be labeled as 1. Now we can build a model to evaluate this, such as if the model produces high accuracy, we can confirm a drift has occurred.
The adaptive Windowing (ADWIN) algorithm adopts a sliding window approach to detect model drift. ADWIN slides a fixed-size window to detect any change in the new data, and a threshold is set to trigger a warning if a model drift is detected.
There are several best practices when dealing with model drift; let us review a few.
The best possible action you can take regarding a static model is to do nothing. It allows you to develop one model to rule all the future data. Assuming this as a starting point, we can baseline the model and use it to compare future models.
Apart from doing nothing, the next level of intervention is to update your model with more recent data periodically. But, again, it involves much back-testing to decide on the amount of data that might be appropriate for the model.
You can also use weights to detect model drifts; for example, we can use weighting inversely proportional to the age of the data, and based on that, we can pay more attention to more recent data.
We can train a model to learn the correct predictions from the current model based on the relationships between variables in the recent data. The key is to fit the subsequent models on different and more recent data instead of the weighted form of the same dataset.
Use model performance monitoring tools such as NimbleBox.ai to monitor your models 24/7 and set triggers in case of model drifts. It allows you to act immediately without wasting time and energy on unwanted model drifts.
Model drifts are not meant to be avoided; they allow us to better what we created in the first place. However, in a field like machine learning, where every day new advancement is made, the deploy and forget policy will get us nowhere. There are many ways to monitor, detect drifts, and update models. You need to find what works best for you.