.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "tabular/auto_checks/data_integrity/plot_feature_label_correlation.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_tabular_auto_checks_data_integrity_plot_feature_label_correlation.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_tabular_auto_checks_data_integrity_plot_feature_label_correlation.py:

.. _tabular__feature_label_correlation:

Feature Label Correlation
***************************

This notebook provides an overview of how to use and interpret the Feature Label Correlation check.

**Structure:**

* `What is Feature Label Correlation <#what-is-feature-label-correlation>`__
* `Generate data <#generate-data>`__
* `Run the check <#run-the-check>`__

What is Feature Label Correlation
==================================

The ``FeatureLabelCorrelation`` check computes the correlation between each feature and the label,
which helps spot features that are highly correlated with the label.

The check handles two column types, categorical and numerical, and uses a different correlation
measure for each kind of feature-label pair:

1. numerical-numerical: Pearson's correlation coefficient
2. numerical-categorical: Correlation ratio
3. categorical-categorical: Symmetric Theil's U

.. GENERATED FROM PYTHON SOURCE LINES 32-34

Imports
=======

.. GENERATED FROM PYTHON SOURCE LINES 34-41

.. code-block:: default

    import numpy as np
    import pandas as pd

    from deepchecks.tabular import Dataset
    from deepchecks.tabular.checks import FeatureLabelCorrelation

.. GENERATED FROM PYTHON SOURCE LINES 42-44

Generate Data
===============

.. GENERATED FROM PYTHON SOURCE LINES 44-50

.. code-block:: default

    # Three independent standard-normal features
    df = pd.DataFrame(np.random.randn(100, 3), columns=['x1', 'x2', 'x3'])
    # x4 and the label are both linear combinations of x1 and x2
    df['x4'] = df['x1'] * 0.5 + df['x2']
    df['label'] = df['x2'] + 0.1 * df['x1']
    # x5 is a string-valued feature derived directly from the sign of the label
    df['x5'] = df['label'].apply(lambda x: 'v1' if x < 0 else 'v2')

.. GENERATED FROM PYTHON SOURCE LINES 51-54

.. code-block:: default

    ds = Dataset(df, label='label', cat_features=[])

.. GENERATED FROM PYTHON SOURCE LINES 55-57

Run the check
=================

.. GENERATED FROM PYTHON SOURCE LINES 57-60

.. code-block:: default

    my_check = FeatureLabelCorrelation(ppscore_params={'sample': 10})
    my_check.run(dataset=ds)

[Check output: Feature Label Correlation]


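To make the three measures named in "What is Feature Label Correlation" more concrete, the
following is a minimal sketch of how each one can be computed with NumPy and pandas. The helper
names ``pearson``, ``correlation_ratio``, ``symmetric_theils_u`` and ``entropy`` are hypothetical
and are not part of the deepchecks API; the check's internal implementation may differ, for
example in normalization or in how missing values are handled.

.. code-block:: python

    import numpy as np
    import pandas as pd


    def pearson(x, y):
        """Pearson's r between two numerical arrays."""
        x = np.asarray(x, dtype=float)
        y = np.asarray(y, dtype=float)
        return float(np.corrcoef(x, y)[0, 1])


    def correlation_ratio(categories, values):
        """Correlation ratio (eta): sqrt(between-group SS / total SS)."""
        categories = np.asarray(categories)
        values = np.asarray(values, dtype=float)
        grand_mean = values.mean()
        total_ss = ((values - grand_mean) ** 2).sum()
        if total_ss == 0:
            return 0.0
        between_ss = 0.0
        for c in np.unique(categories):
            group = values[categories == c]
            between_ss += len(group) * (group.mean() - grand_mean) ** 2
        return float(np.sqrt(between_ss / total_ss))


    def entropy(counts):
        """Shannon entropy (natural log) of a vector of positive counts."""
        p = np.asarray(counts, dtype=float)
        p = p / p.sum()
        return float(-(p * np.log(p)).sum())


    def symmetric_theils_u(a, b):
        """Symmetric uncertainty coefficient: 2 * I(A; B) / (H(A) + H(B))."""
        a = pd.Series(np.asarray(a))
        b = pd.Series(np.asarray(b))
        h_a = entropy(a.value_counts())
        h_b = entropy(b.value_counts())
        if h_a + h_b == 0:
            return 0.0
        # Joint entropy H(A, B) from the counts of observed (a, b) pairs
        h_ab = entropy(pd.DataFrame({"a": a, "b": b}).value_counts())
        mutual_info = h_a + h_b - h_ab
        return float(2.0 * mutual_info / (h_a + h_b))


    # Illustrative data shaped like the example above (the seed is an assumption)
    rng = np.random.default_rng(0)
    x1, x2 = rng.normal(size=200), rng.normal(size=200)
    label = x2 + 0.1 * x1
    x5 = np.where(label < 0, "v1", "v2")

    print("x2 vs label (Pearson):", pearson(x2, label))
    print("x5 vs label (correlation ratio):", correlation_ratio(x5, label))
    print("x5 vs x5 (symmetric Theil's U):", symmetric_theils_u(x5, x5))

All three measures are close to 0 for unrelated columns and approach 1 when one column nearly
determines the other (Pearson's coefficient can also be negative), which is what makes them usable
side by side when ranking features against the label.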
.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 0.073 seconds)

.. _sphx_glr_download_tabular_auto_checks_data_integrity_plot_feature_label_correlation.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_feature_label_correlation.py <plot_feature_label_correlation.py>`

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_feature_label_correlation.ipynb <plot_feature_label_correlation.ipynb>`

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_