.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "checks_gallery/tabular/methodology/plot_identifier_leakage.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here <sphx_glr_download_checks_gallery_tabular_methodology_plot_identifier_leakage.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_checks_gallery_tabular_methodology_plot_identifier_leakage.py:


Identifier Leakage
******************

.. GENERATED FROM PYTHON SOURCE LINES 8-10

Imports
=======

.. GENERATED FROM PYTHON SOURCE LINES 10-18

.. code-block:: default


    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd

    from deepchecks.tabular import Dataset
    from deepchecks.tabular.checks.methodology import *


.. GENERATED FROM PYTHON SOURCE LINES 19-20

Generating Data

.. GENERATED FROM PYTHON SOURCE LINES 20-27

.. code-block:: default


    np.random.seed(42)
    df = pd.DataFrame(np.random.randn(100, 3), columns=['x1', 'x2', 'x3'])
    df['x4'] = df['x1'] * 0.05 + df['x2']
    df['x5'] = df['x2'] * 121 + 0.01 * df['x1']
    df['label'] = df['x5'].apply(lambda x: 0 if x < 0 else 1)


.. GENERATED FROM PYTHON SOURCE LINES 28-31

.. code-block:: default


    dataset = Dataset(df, label='label', index_name='x1', datetime_name='x2')


.. GENERATED FROM PYTHON SOURCE LINES 32-34

Running ``identifier_leakage`` check
====================================

.. GENERATED FROM PYTHON SOURCE LINES 34-37

.. code-block:: default


    IdentifierLeakage().run(dataset)


.. raw:: html

    <h4>Identifier Leakage</h4>
    <p>Check if identifiers (Index/Date) can be used to predict the label.</p>
    <h5>Additional Outputs</h5>
    <p>The PPS represents the ability of a feature to single-handedly predict another feature or label.</p>
    <p>For Identifier columns (Index/Date) PPS should be nearly 0, otherwise date and index have some predictive effect on the label.</p>

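To make the PPS note above concrete, the following is a minimal, dependency-free sketch of a PPS-like score. It is not the ``ppscore`` algorithm that the check relies on: ``stump_predictive_score`` is a hypothetical helper that fits a single-threshold rule on half of the rows, measures its accuracy on the other half, and rescales that accuracy against a majority-class baseline. The synthetic columns mirror the data generated earlier (``x2`` drives the label, ``noise`` is unrelated), but they are redrawn with the standard library, so the values differ from the example's DataFrame.

```python
import random


def stump_predictive_score(xs, ys):
    """Hypothetical, simplified stand-in for a predictive power score.

    Fits a one-threshold rule on the first half of the rows, scores it on
    the second half, and rescales accuracy so that 0 means "no better than
    always predicting the majority class" and 1 means "perfect".
    """
    half = len(xs) // 2
    train = list(zip(xs[:half], ys[:half]))
    test = list(zip(xs[half:], ys[half:]))

    # Baseline: always predict the most common training label.
    train_labels = [y for _, y in train]
    majority = max(set(train_labels), key=train_labels.count)
    baseline = sum(y == majority for _, y in test) / len(test)

    def accuracy(rows, threshold, above):
        # Predict `above` when x >= threshold, the other class otherwise.
        hits = sum((above if x >= threshold else 1 - above) == y for x, y in rows)
        return hits / len(rows)

    # Try every training value as a threshold, in both directions.
    best_rule, best_train_acc = None, -1.0
    for threshold, _ in train:
        for above in (0, 1):
            acc = accuracy(train, threshold, above)
            if acc > best_train_acc:
                best_rule, best_train_acc = (threshold, above), acc

    test_acc = accuracy(test, *best_rule)
    if baseline >= 1.0:
        return 0.0
    return max(0.0, (test_acc - baseline) / (1.0 - baseline))


# Synthetic data shaped like the example: the label is the sign of x5,
# and x5 is dominated by x2. `noise` is unrelated to the label.
rng = random.Random(42)
n = 400
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [rng.gauss(0, 1) for _ in range(n)]
noise = [rng.gauss(0, 1) for _ in range(n)]
x5 = [121 * b + 0.01 * a for a, b in zip(x1, x2)]
label = [0 if v < 0 else 1 for v in x5]

leaky = stump_predictive_score(x2, label)
clean = stump_predictive_score(noise, label)
print(f"x2 (leaks the label): {leaky:.2f}")
print(f"noise (unrelated):    {clean:.2f}")
```

A column that determines the label, as ``x2`` does here, should score close to 1, while an unrelated column should score close to 0. The latter is what the check expects from index and date columns.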

.. GENERATED FROM PYTHON SOURCE LINES 38-40

Using the ``IdentifierLeakage`` check class
===========================================

.. GENERATED FROM PYTHON SOURCE LINES 40-43

.. code-block:: default


    my_check = IdentifierLeakage(ppscore_params={'sample': 10})
    my_check.run(dataset=dataset)


.. raw:: html

    <h4>Identifier Leakage</h4>
    <p>Check if identifiers (Index/Date) can be used to predict the label.</p>
    <h5>Additional Outputs</h5>
    <p>The PPS represents the ability of a feature to single-handedly predict another feature or label.</p>
    <p>For Identifier columns (Index/Date) PPS should be nearly 0, otherwise date and index have some predictive effect on the label.</p>

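The call with ``ppscore_params={'sample': 10}`` asks for the score to be computed on very few rows. Assuming ``sample`` caps the number of randomly drawn rows (an assumption about ``ppscore``; this page does not say so), the sketch below illustrates why such a small cap can make results unstable. It uses only the standard library, and every name in it (``capped_sample``, ``chance_agreement``, ``spread``) is hypothetical. The measured quantity is how often the sign of a pure-noise column happens to match the label, which should hover around 0.5.

```python
import random


def capped_sample(rows, sample, seed):
    """Return at most `sample` rows, drawn at random without replacement."""
    if len(rows) <= sample:
        return list(rows)
    return random.Random(seed).sample(rows, sample)


def chance_agreement(rows):
    """Share of rows where the sign of an unrelated column matches the label."""
    return sum((value >= 0) == bool(label) for value, label in rows) / len(rows)


rng = random.Random(42)
# Hypothetical data: `value` is noise and carries no information about `label`.
rows = [(rng.gauss(0, 1), rng.randint(0, 1)) for _ in range(400)]


def spread(sample, trials=200):
    """Range of the agreement estimate across `trials` different random draws."""
    scores = [chance_agreement(capped_sample(rows, sample, seed))
              for seed in range(trials)]
    return max(scores) - min(scores)


spread_small = spread(sample=10)
spread_large = spread(sample=200)
print(f"spread with 10 rows:  {spread_small:.2f}")
print(f"spread with 200 rows: {spread_large:.2f}")
```

With only 10 rows, an uninformative column can appear to agree with the label far more (or less) often than chance, so a tiny ``sample`` is best kept for quick demonstrations rather than for judging whether an identifier really leaks.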

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes 0.126 seconds)


.. _sphx_glr_download_checks_gallery_tabular_methodology_plot_identifier_leakage.py:


.. only :: html

 .. container:: sphx-glr-footer
    :class: sphx-glr-footer-example


  .. container:: sphx-glr-download sphx-glr-download-python

     :download:`Download Python source code: plot_identifier_leakage.py <plot_identifier_leakage.py>`


  .. container:: sphx-glr-download sphx-glr-download-jupyter

     :download:`Download Jupyter notebook: plot_identifier_leakage.ipynb <plot_identifier_leakage.ipynb>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_