Train Test Prediction Drift#
This notebook provides an overview for using and understanding the tabular prediction drift check.
What Is Prediction Drift?#
Drift is a change in the distribution of data over time, and it is one of the main reasons a machine learning model’s performance degrades.
Prediction drift is drift in the model’s predictions themselves. Calculating prediction drift is especially useful when labels are not available for the test dataset, so that drift in the predictions is the only indication that a change in the data has actually affected the model’s output. If labels are available, it’s also recommended to run the Label Drift check.
For more information on drift, please visit our drift guide.
How Deepchecks Detects Prediction Drift#
This check detects prediction drift by using univariate measures on the prediction output.
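To make "univariate measure on the prediction output" concrete, here is a minimal pure-Python sketch that compares two sets of predicted probabilities with the two-sample Kolmogorov-Smirnov statistic. The choice of measure is an assumption made for illustration; this page does not state which measure the check applies. The function name `ks_statistic` and the sample values are hypothetical.

```python
from bisect import bisect_right


def ks_statistic(train_preds, test_preds):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between the two empirical CDFs."""
    train_sorted = sorted(train_preds)
    test_sorted = sorted(test_preds)
    n_train, n_test = len(train_sorted), len(test_sorted)
    max_gap = 0.0
    # The empirical CDFs only change at observed values, so checking those is enough.
    for value in train_sorted + test_sorted:
        cdf_train = bisect_right(train_sorted, value) / n_train
        cdf_test = bisect_right(test_sorted, value) / n_test
        max_gap = max(max_gap, abs(cdf_train - cdf_test))
    return max_gap


# Hypothetical predicted probabilities for the positive class (illustrative values only).
train_probs = [0.10, 0.20, 0.30, 0.40, 0.50]
shifted_probs = [0.60, 0.70, 0.80, 0.90, 0.95]

same_score = ks_statistic(train_probs, train_probs)     # identical distributions
drift_score = ks_statistic(train_probs, shifted_probs)  # fully separated distributions
print(same_score, drift_score)
```

A score near 0 means the two prediction distributions look alike, while a score near 1 means they barely overlap. Because the measure looks only at the model output, it needs no labels.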
from sklearn.preprocessing import LabelEncoder

from deepchecks.tabular.checks import TrainTestPredictionDrift
from deepchecks.tabular.datasets.classification import adult

# Load the Adult dataset as train and test Datasets; train_ds is used below.
train_ds, test_ds = adult.load_data()

from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OrdinalEncoder

# Impute numeric features; impute and ordinal-encode categorical features.
numeric_transformer = SimpleImputer()
categorical_transformer = Pipeline(
    steps=[("imputer", SimpleImputer(strategy="most_frequent")), ("encoder", OrdinalEncoder())]
)

preprocessor = ColumnTransformer(
    transformers=[
        ("num", numeric_transformer, train_ds.numerical_features),
        ("cat", categorical_transformer, train_ds.cat_features),
    ]
)

# Fit a random forest on the preprocessed training data.
model = Pipeline(steps=[("preprocessing", preprocessor), ("model", RandomForestClassifier(max_depth=5, n_jobs=-1))])
model = model.fit(train_ds.data[train_ds.features], train_ds.data[train_ds.label_name])
The prediction drift check can also calculate drift on the predicted classes rather than the probabilities. This is the default behavior for multiclass tasks. To force this behavior for binary tasks, set the
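As a rough illustration of what measuring drift on predicted classes (rather than probabilities) can look like, the sketch below compares the class frequencies of two prediction sets using total variation distance. This measure is swapped in for simplicity and is not claimed to be the one the check uses. The helper names and sample predictions are hypothetical.

```python
from collections import Counter


def class_frequencies(predicted_classes):
    """Map each predicted class to its share of the predictions."""
    counts = Counter(predicted_classes)
    total = len(predicted_classes)
    return {label: count / total for label, count in counts.items()}


def total_variation_distance(train_classes, test_classes):
    """Half the summed absolute difference in class shares (0 = identical, 1 = disjoint)."""
    train_freq = class_frequencies(train_classes)
    test_freq = class_frequencies(test_classes)
    labels = set(train_freq) | set(test_freq)
    return 0.5 * sum(
        abs(train_freq.get(label, 0.0) - test_freq.get(label, 0.0)) for label in labels
    )


# Hypothetical predicted classes from a binary classifier (illustrative values only).
train_classes = [0, 0, 0, 1]  # 75% class 0, 25% class 1
test_classes = [0, 1, 1, 1]   # 25% class 0, 75% class 1

class_drift = total_variation_distance(train_classes, test_classes)
print(class_drift)
```

Working on classes discards how confident the model was, so two prediction sets with different probability distributions can still show no class-level drift if they fall on the same side of the decision threshold.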