.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "checks_gallery/tabular/model_evaluation/plot_train_test_prediction_drift.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here `
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_checks_gallery_tabular_model_evaluation_plot_train_test_prediction_drift.py:


.. _plot_tabular_train_test_prediction_drift:

Train Test Prediction Drift
***************************

This notebook provides an overview of how to use and understand the tabular prediction drift check.

**Structure:**

* `What Is Prediction Drift? <#what-is-prediction-drift>`__
* `Generate Data <#generate-data>`__
* `Build Model <#build-model>`__
* `Run check <#run-check>`__

What Is Prediction Drift?
=========================

Drift is simply a change in the distribution of data over time, and it is
also one of the top reasons why a machine learning model's performance
degrades over time.

Prediction drift is drift that occurs in the model's predictions themselves.
Calculating prediction drift is especially useful when labels are not
available for the test dataset. In that case, a drift in the predictions is
our only indication that a change has happened in the data that actually
affects the model's predictions. If labels are available, it's also
recommended to run the :doc:`Label Drift check `.

For more information on drift, please visit our :doc:`drift guide `.

How Deepchecks Detects Prediction Drift
---------------------------------------

This check detects prediction drift by using :ref:`univariate measures `
on the prediction output.

.. GENERATED FROM PYTHON SOURCE LINES 41-48

.. code-block:: default


    from deepchecks.tabular.checks import TrainTestPredictionDrift
    from deepchecks.tabular.datasets.classification import adult

.. GENERATED FROM PYTHON SOURCE LINES 49-51

Generate Data
=============

.. GENERATED FROM PYTHON SOURCE LINES 51-55

.. code-block:: default


    label_name = 'income'
    train_ds, test_ds = adult.load_data()

.. GENERATED FROM PYTHON SOURCE LINES 56-57

Introducing drift: every row of the test set gets the same education level,
so the distributions of these two features in the test set no longer match
the training set. The two columns stay consistent with each other, since 13 is
the ``education-num`` code for ``Bachelors``.

.. GENERATED FROM PYTHON SOURCE LINES 57-62

.. code-block:: default


    # Overwrite both education columns of the test set with a single constant value
    test_ds.data['education-num'] = 13
    # The leading space matches how categories are stored in the adult dataset
    test_ds.data['education'] = ' Bachelors'

.. GENERATED FROM PYTHON SOURCE LINES 63-65

Build Model
===========

.. GENERATED FROM PYTHON SOURCE LINES 65-73

.. code-block:: default


    from sklearn.compose import ColumnTransformer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.impute import SimpleImputer
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import OrdinalEncoder

.. GENERATED FROM PYTHON SOURCE LINES 74-92

.. code-block:: default


    # Numerical features: fill missing values with the column mean
    numeric_transformer = SimpleImputer()
    # Categorical features: fill missing values with the most frequent value, then encode as integers
    categorical_transformer = Pipeline(
        steps=[("imputer", SimpleImputer(strategy="most_frequent")), ("encoder", OrdinalEncoder())]
    )

    preprocessor = ColumnTransformer(
        transformers=[
            ("num", numeric_transformer, train_ds.numerical_features),
            ("cat", categorical_transformer, train_ds.cat_features),
        ]
    )
    model = Pipeline(steps=[("preprocessing", preprocessor), ("model", RandomForestClassifier(max_depth=5, n_jobs=-1))])
    model = model.fit(train_ds.data[train_ds.features], train_ds.data[train_ds.label_name])

.. GENERATED FROM PYTHON SOURCE LINES 93-95

Run check
=========

.. GENERATED FROM PYTHON SOURCE LINES 95-100

.. code-block:: default


    check = TrainTestPredictionDrift()
    result = check.run(train_dataset=train_ds, test_dataset=test_ds, model=model)
    result
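Because the adult income task is binary, this first run works on the predicted probabilities, which are continuous. As a rough illustration of what a univariate measure computes on such scores, here is a minimal stdlib sketch of one common choice for continuous values, the one-dimensional Earth Mover's Distance (Wasserstein-1). The function name ``wasserstein_1d`` and the toy scores are hypothetical. This is not deepchecks' implementation, and it does not show which measure the check applies by default or any scaling it may use.

```python
from bisect import bisect_right


def wasserstein_1d(a, b):
    """Earth Mover's Distance (Wasserstein-1) between two 1-D samples.

    Each point carries equal weight. In one dimension, the distance equals
    the area between the two empirical CDFs.
    """
    a, b = sorted(a), sorted(b)
    # Every point where either empirical CDF can jump
    grid = sorted(set(a) | set(b))
    total = 0.0
    for left, right in zip(grid, grid[1:]):
        # Both CDFs are constant on [left, right), so evaluate them at `left`
        cdf_a = bisect_right(a, left) / len(a)
        cdf_b = bisect_right(b, left) / len(b)
        total += abs(cdf_a - cdf_b) * (right - left)
    return total


# Hypothetical positive-class probabilities, NOT taken from the adult dataset
train_scores = [0.1, 0.2, 0.2, 0.3, 0.7]
test_scores = [0.6, 0.7, 0.7, 0.8, 0.9]

print(wasserstein_1d(train_scores, test_scores))
```

Because both toy samples have the same size, the result equals the mean absolute difference between the sorted samples: (0.5 + 0.5 + 0.5 + 0.5 + 0.2) / 5 = 0.44. Identical samples give a distance of 0.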
.. GENERATED FROM PYTHON SOURCE LINES 101-104

The prediction drift check can also calculate drift on the predicted classes
rather than on the predicted probabilities. This is the default behavior for
multiclass tasks. To force this behavior for binary tasks, such as the adult
income task used here, set the ``drift_mode`` parameter to ``'prediction'``.

.. GENERATED FROM PYTHON SOURCE LINES 104-108

.. code-block:: default


    check = TrainTestPredictionDrift(drift_mode='prediction')
    result = check.run(train_dataset=train_ds, test_dataset=test_ds, model=model)
    result
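With ``drift_mode='prediction'`` the check compares predicted classes, which are categorical, so a categorical drift measure applies. The following is a minimal stdlib sketch under stated assumptions: the predicted class is taken as the argmax of each probability row, and the Population Stability Index (PSI) is used as one common categorical measure. The helper names, the ``eps`` floor and the toy three-class probabilities are hypothetical (the same idea applies to the two classes of the adult task), and this is not claimed to be deepchecks' internal implementation or its default measure.

```python
import math
from collections import Counter


def predicted_classes(proba_rows):
    """Turn rows of class probabilities into predicted class indices (argmax)."""
    return [max(range(len(row)), key=row.__getitem__) for row in proba_rows]


def psi(expected, actual, eps=1e-4):
    """Population Stability Index between two categorical samples.

    `eps` floors each share so the log stays finite when a class is
    missing from one of the samples.
    """
    categories = set(expected) | set(actual)
    exp_counts, act_counts = Counter(expected), Counter(actual)
    score = 0.0
    for c in categories:
        e = max(exp_counts[c] / len(expected), eps)
        a = max(act_counts[c] / len(actual), eps)
        score += (a - e) * math.log(a / e)
    return score


# Hypothetical probability rows from a 3-class model
train_proba = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.6, 0.3, 0.1], [0.2, 0.2, 0.6]]
test_proba = [[0.6, 0.3, 0.1], [0.2, 0.7, 0.1], [0.3, 0.6, 0.1], [0.2, 0.2, 0.6]]

train_pred = predicted_classes(train_proba)  # [0, 1, 0, 2]
test_pred = predicted_classes(test_proba)    # [0, 1, 1, 2]
print(psi(train_pred, test_pred))
```

In this toy case class 0 drops from half of the predictions to a quarter while class 1 rises from a quarter to half, so the score is 0.5 * ln(2), about 0.347. Note that once probabilities are reduced to classes, a shift in confidence that does not change the argmax is no longer visible to the measure.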
.. rst-class:: sphx-glr-timing

    **Total running time of the script:** ( 0 minutes 5.518 seconds)


.. _sphx_glr_download_checks_gallery_tabular_model_evaluation_plot_train_test_prediction_drift.py:


.. only:: html

    .. container:: sphx-glr-footer sphx-glr-footer-example

        .. container:: sphx-glr-download sphx-glr-download-python

            :download:`Download Python source code: plot_train_test_prediction_drift.py `

        .. container:: sphx-glr-download sphx-glr-download-jupyter

            :download:`Download Jupyter notebook: plot_train_test_prediction_drift.ipynb `


.. only:: html

    .. rst-class:: sphx-glr-signature

        `Gallery generated by Sphinx-Gallery `_