.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "tabular/auto_checks/train_test_validation/plot_label_drift.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end ` to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_tabular_auto_checks_train_test_validation_plot_label_drift.py:


.. _tabular__label_drift:

Label Drift
**********************

This notebook provides an overview of using and understanding the label drift check.

**Structure:**

* `What Is Label Drift? <#what-is-label-drift>`__
* `Run Check on a Classification Label <#run-check-on-a-classification-label>`__
* `Run Check on a Regression Label <#run-check-on-a-regression-label>`__
* `Add a Condition <#add-a-condition>`__

What Is Label Drift?
========================

Drift is simply a change in the distribution of data over time, and it is
also one of the top reasons why a machine learning model's performance
degrades over time.

Label drift is drift in the distribution of the label itself, for example a
change in the share of each class or a shift in the values of a regression
target.

For more information on drift, please visit our :ref:`Drift Guide `.

How Deepchecks Detects Label Drift
------------------------------------

This check detects label drift by applying :ref:`univariate measures ` to the label column.

.. GENERATED FROM PYTHON SOURCE LINES 36-45

.. code-block:: default


    import pprint

    import numpy as np
    import pandas as pd

    from deepchecks.tabular import Dataset
    from deepchecks.tabular.checks import LabelDrift

.. GENERATED FROM PYTHON SOURCE LINES 46-48

Run Check on a Classification Label
====================================

.. GENERATED FROM PYTHON SOURCE LINES 48-64

.. code-block:: default


    # Generate data:
    # --------------

    np.random.seed(42)

    # Train label: class 1 and class 0 are drawn with equal probability (0.5 / 0.5).
    train_data = np.concatenate([np.random.randn(1000, 2),
                                 np.random.choice(a=[1, 0], p=[0.5, 0.5], size=(1000, 1))], axis=1)
    # Test label with drift: class 1 is drawn with probability 0.35 instead of 0.5.
    test_data = np.concatenate([np.random.randn(1000, 2),
                                np.random.choice(a=[1, 0], p=[0.35, 0.65], size=(1000, 1))], axis=1)

    df_train = pd.DataFrame(train_data, columns=['col1', 'col2', 'target'])
    df_test = pd.DataFrame(test_data, columns=['col1', 'col2', 'target'])

    train_dataset = Dataset(df_train, label='target')
    test_dataset = Dataset(df_test, label='target')

.. GENERATED FROM PYTHON SOURCE LINES 65-68

.. code-block:: default


    df_train.head()

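
+---+-----------+-----------+--------+
|   | col1      | col2      | target |
+===+===========+===========+========+
| 0 | 0.496714  | -0.138264 | 1.0    |
+---+-----------+-----------+--------+
| 1 | 0.647689  | 1.523030  | 1.0    |
+---+-----------+-----------+--------+
| 2 | -0.234153 | -0.234137 | 1.0    |
+---+-----------+-----------+--------+
| 3 | 1.579213  | 0.767435  | 1.0    |
+---+-----------+-----------+--------+
| 4 | -0.469474 | 0.542560  | 0.0    |
+---+-----------+-----------+--------+
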
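Before running the check, it can help to confirm the label shift that was injected. The sketch below uses only NumPy and pandas and is not part of the deepchecks API: it replays the same seed and the same sequence of ``np.random`` calls as the data-generation code above to rebuild just the label columns, then compares the share of each class in train and test.

.. code-block:: python

    import numpy as np
    import pandas as pd

    # Replay the example's RNG calls so these labels match the ones generated above.
    np.random.seed(42)
    np.random.randn(1000, 2)  # train features (discarded; keeps the random stream aligned)
    train_labels = np.random.choice(a=[1, 0], p=[0.5, 0.5], size=(1000, 1)).ravel()
    np.random.randn(1000, 2)  # test features (discarded)
    test_labels = np.random.choice(a=[1, 0], p=[0.35, 0.65], size=(1000, 1)).ravel()

    # Share of each class (0 and 1) per split; each column sums to 1.
    shares = pd.DataFrame({
        "train": pd.Series(train_labels).value_counts(normalize=True),
        "test": pd.Series(test_labels).value_counts(normalize=True),
    }).sort_index()
    print(shares)
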
.. GENERATED FROM PYTHON SOURCE LINES 69-71

Run Check
---------

.. GENERATED FROM PYTHON SOURCE LINES 71-76

.. code-block:: default


    check = LabelDrift()
    result = check.run(train_dataset=train_dataset, test_dataset=test_dataset)
    result

*[Output: "Label Drift" check result display]*

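The check reduces this change in class shares to a single drift score computed with a univariate measure. As an illustration of what such a measure computes for a categorical label, here is a minimal NumPy sketch of the Population Stability Index (PSI), one widely used choice. It is not deepchecks' implementation: the check's default measure for categorical labels may differ, and the ``psi`` helper and its ``eps`` clipping are hypothetical choices made for this sketch.

.. code-block:: python

    import numpy as np


    def psi(train_labels, test_labels, eps=1e-6):
        """Population Stability Index between two categorical samples.

        Hypothetical helper for illustration; not a deepchecks function.
        PSI = sum over categories k of (q_k - p_k) * ln(q_k / p_k),
        where p_k and q_k are the shares of category k in train and test.
        """
        train_labels = np.asarray(train_labels).ravel()
        test_labels = np.asarray(test_labels).ravel()
        categories = np.union1d(train_labels, test_labels)
        p = np.array([np.mean(train_labels == c) for c in categories])
        q = np.array([np.mean(test_labels == c) for c in categories])
        # Clip shares away from zero so a category missing on one side does not give log(0).
        p = np.clip(p, eps, None)
        q = np.clip(q, eps, None)
        return float(np.sum((q - p) * np.log(q / p)))


    # Labels with the same class shares as the example: 50/50 in train, 35/65 in test.
    train_y = np.array([1] * 500 + [0] * 500)
    test_y = np.array([1] * 350 + [0] * 650)
    print(f"PSI: {psi(train_y, test_y):.4f}")  # prints "PSI: 0.0929"

In each term the difference ``q_k - p_k`` and the log-ratio ``ln(q_k / p_k)`` share a sign, so every category contributes a non-negative amount, and PSI is zero when the class shares match exactly.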
.. GENERATED FROM PYTHON SOURCE LINES 77-79

Run Check on a Regression Label
================================

.. GENERATED FROM PYTHON SOURCE LINES 79-94

.. code-block:: default


    # Generate data:
    # --------------

    train_data = np.concatenate([np.random.randn(1000, 2), np.random.randn(1000, 1)], axis=1)
    test_data = np.concatenate([np.random.randn(1000, 2), np.random.randn(1000, 1)], axis=1)

    df_train = pd.DataFrame(train_data, columns=['col1', 'col2', 'target'])
    df_test = pd.DataFrame(test_data, columns=['col1', 'col2', 'target'])

    # Create drift in the test label: add positive noise |N(0, 1)| and a linear
    # trend that rises from 0 to just under 4 across the rows.
    df_test['target'] = (df_test['target'].astype('float')
                         + abs(np.random.randn(1000))
                         + np.arange(0, 1, 0.001) * 4)

    train_dataset = Dataset(df_train, label='target')
    test_dataset = Dataset(df_test, label='target')

.. GENERATED FROM PYTHON SOURCE LINES 95-97

Run Check
---------

.. GENERATED FROM PYTHON SOURCE LINES 97-102

.. code-block:: default


    check = LabelDrift()
    result = check.run(train_dataset=train_dataset, test_dataset=test_dataset)
    result

*[Output: "Label Drift" check result display]*

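For a numerical label, a univariate measure compares the two empirical distributions of the label directly. One common choice is the Earth Mover's (Wasserstein-1) distance, roughly the average distance probability mass has to move to turn one distribution into the other. The NumPy sketch below relies on the fact that, for two 1-D samples of equal size, this distance equals the mean absolute difference between the sorted values. The ``wasserstein_1d`` helper is hypothetical, and deepchecks may scale or compute its numerical drift score differently, so the raw number here is not directly comparable with the check's score.

.. code-block:: python

    import numpy as np


    def wasserstein_1d(a, b):
        """Wasserstein-1 (Earth Mover's) distance between two equal-size 1-D samples.

        Hypothetical helper for illustration; not a deepchecks function.
        For equal sample sizes, the optimal transport plan matches the i-th
        smallest value of `a` with the i-th smallest value of `b`.
        """
        a = np.sort(np.asarray(a, dtype=float).ravel())
        b = np.sort(np.asarray(b, dtype=float).ravel())
        if a.shape != b.shape:
            raise ValueError("this sketch only handles samples of equal size")
        return float(np.mean(np.abs(a - b)))


    # A drift similar to the regression example: positive noise |N(0, 1)| plus a
    # linear trend from 0 to just under 4 (same values as np.arange(0, 1, 0.001) * 4).
    rng = np.random.default_rng(0)
    train_target = rng.standard_normal(1000)
    test_target = (
        rng.standard_normal(1000)
        + np.abs(rng.standard_normal(1000))
        + np.linspace(0, 4, 1000, endpoint=False)
    )

    print(f"EMD, train vs. train: {wasserstein_1d(train_target, train_target):.3f}")
    print(f"EMD, train vs. test:  {wasserstein_1d(train_target, test_target):.3f}")

For samples of different sizes, ``scipy.stats.wasserstein_distance`` computes the same distance directly from the two samples.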
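.. GENERATED FROM PYTHON SOURCE LINES 103-105

Add a Condition
===============

A condition turns the check's result into a pass/fail outcome. With
``add_condition_drift_score_less_than()``, the condition passes only if the
label's drift score is below the allowed maximum (here, the method's default).
This example reuses the regression datasets from the previous section.

.. GENERATED FROM PYTHON SOURCE LINES 105-108

.. code-block:: default


    check_cond = LabelDrift().add_condition_drift_score_less_than()
    check_cond.run(train_dataset=train_dataset, test_dataset=test_dataset)

*[Output: "Label Drift" check result display]*
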
.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 0.288 seconds)


.. _sphx_glr_download_tabular_auto_checks_train_test_validation_plot_label_drift.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_label_drift.py `

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_label_drift.ipynb `


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery `_