
.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "checks_gallery/tabular/distribution/plot_train_test_label_drift.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_checks_gallery_tabular_distribution_plot_train_test_label_drift.py:

Train Test Label Drift
**********************

.. GENERATED FROM PYTHON SOURCE LINES 8-17

.. code-block:: default

    import pprint

    import numpy as np
    import pandas as pd

    from deepchecks.tabular import Dataset
    from deepchecks.tabular.checks import TrainTestLabelDrift

.. GENERATED FROM PYTHON SOURCE LINES 18-20

Generate data - Classification label
====================================

.. GENERATED FROM PYTHON SOURCE LINES 20-33

.. code-block:: default

    np.random.seed(42)

    # Two random feature columns plus a balanced binary label.
    train_data = np.concatenate([np.random.randn(1000, 2), np.random.choice(a=[1, 0], p=[0.5, 0.5], size=(1000, 1))], axis=1)
    # Create test_data with drift in label:
    test_data = np.concatenate([np.random.randn(1000, 2), np.random.choice(a=[1, 0], p=[0.35, 0.65], size=(1000, 1))], axis=1)

    df_train = pd.DataFrame(train_data, columns=['col1', 'col2', 'target'])
    df_test = pd.DataFrame(test_data, columns=['col1', 'col2', 'target'])

    train_dataset = Dataset(df_train, label='target')
    test_dataset = Dataset(df_test, label='target')

.. GENERATED FROM PYTHON SOURCE LINES 34-37

.. code-block:: default

    df_train.head()

.. code-block:: none

           col1      col2  target
    0  0.496714 -0.138264     1.0
    1  0.647689  1.523030     1.0
    2 -0.234153 -0.234137     1.0
    3  1.579213  0.767435     1.0
    4 -0.469474  0.542560     0.0

.. GENERATED FROM PYTHON SOURCE LINES 38-40

Run check
=========

.. GENERATED FROM PYTHON SOURCE LINES 40-45

.. code-block:: default

    check = TrainTestLabelDrift()
    result = check.run(train_dataset=train_dataset, test_dataset=test_dataset)
    result

.. code-block:: none

    Train Test Label Drift

    Calculate label drift between train dataset and test dataset, using statistical measures.

    Additional Outputs

    The Drift score is a measure for the difference between two distributions, in this check - the test and train distributions.
    The check shows the drift score and distributions for the label.

.. GENERATED FROM PYTHON SOURCE LINES 46-48

Generate data - Regression label
================================

.. GENERATED FROM PYTHON SOURCE LINES 48-60

.. code-block:: default

    # Two random feature columns plus a continuous, normally distributed label.
    train_data = np.concatenate([np.random.randn(1000, 2), np.random.randn(1000, 1)], axis=1)
    test_data = np.concatenate([np.random.randn(1000, 2), np.random.randn(1000, 1)], axis=1)

    df_train = pd.DataFrame(train_data, columns=['col1', 'col2', 'target'])
    df_test = pd.DataFrame(test_data, columns=['col1', 'col2', 'target'])
    # Create drift in test:
    df_test['target'] = df_test['target'].astype('float') + abs(np.random.randn(1000)) + np.arange(0, 1, 0.001) * 4

    train_dataset = Dataset(df_train, label='target')
    test_dataset = Dataset(df_test, label='target')

.. GENERATED FROM PYTHON SOURCE LINES 61-63

Run check
=========

.. GENERATED FROM PYTHON SOURCE LINES 63-68

.. code-block:: default

    check = TrainTestLabelDrift()
    result = check.run(train_dataset=train_dataset, test_dataset=test_dataset)
    result

.. GENERATED FROM PYTHON SOURCE LINES 69-70

Add condition
=============

Called without arguments, ``add_condition_drift_score_not_greater_than`` applies the thresholds reported in the summary: PSI <= 0.2 and Earth Mover's Distance <= 0.1. On the regression data the condition fails, because the label's Earth Mover's Distance is 0.34.

.. GENERATED FROM PYTHON SOURCE LINES 70-73

.. code-block:: default

    check_cond = TrainTestLabelDrift().add_condition_drift_score_not_greater_than()
    check_cond.run(train_dataset=train_dataset, test_dataset=test_dataset)

.. code-block:: none

    Train Test Label Drift

    Conditions Summary

    Status  Condition                                                     More Info
    ✖       PSI <= 0.2 and Earth Mover's Distance <= 0.1 for label drift  Label's Earth Mover's Distance above threshold: 0.34
