.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "checks_gallery/tabular/data_integrity/plot_feature_feature_correlation.py"

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_checks_gallery_tabular_data_integrity_plot_feature_feature_correlation.py:

.. _plot_tabular_feature_feature_correlation:

Feature Feature Correlation
***************************

This notebook provides an overview of how to use and interpret the feature-feature
correlation check.

This check computes the pairwise correlations between the features, which helps spot
pairs of features that are highly correlated.

**Structure:**

* `How Are the Correlations Calculated? <#how-are-the-correlations-calculated>`__
* `Load Data <#load-data>`__
* `Run the Check <#run-the-check>`__
* `Define a Condition <#define-a-condition>`__

How Are the Correlations Calculated?
====================================

This check works with two types of features, categorical and numerical, and uses a
different method to calculate the correlation for each combination of feature types:

1. numerical-numerical: `Pearson's correlation coefficient `__
2. numerical-categorical: `Correlation ratio `__
3. categorical-categorical: `Symmetric Theil's U `__

Imports
=======

.. code-block:: default

    import pandas as pd

    from deepchecks.tabular.datasets.classification import adult
    from deepchecks.tabular.checks.data_integrity import FeatureFeatureCorrelation

Load Data
=========

We load the Adult dataset, a dataset based on the 1994 US Census that contains both
numerical and categorical features.

.. code-block:: default

    ds = adult.load_data(as_train_test=False)

Run the Check
=============

.. code-block:: default

    check = FeatureFeatureCorrelation()
    check.run(ds)

    # To display the results in an IDE like PyCharm, you can use the following code:
    # check.run(ds).show()
    # The result will be displayed in a new window.
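
The correlations in the check output come from the three measures listed in the section
on how the correlations are calculated. To show what each measure computes, here is a
minimal, dependency-free Python sketch. It is an illustration only, *not* the deepchecks
implementation: the function names ``pearson``, ``correlation_ratio``,
``symmetric_theils_u`` and ``feature_correlation`` are hypothetical, features are assumed
to be plain lists without missing values, and returning ``0.0`` for a constant feature is
a convention chosen for this sketch. Symmetric Theil's U is written in its common form,
``2 * I(X;Y) / (H(X) + H(Y))``.

.. code-block:: python

    import math
    from collections import Counter, defaultdict


    def pearson(x, y):
        """Pearson's correlation coefficient between two numerical sequences."""
        n = len(x)
        mean_x = sum(x) / n
        mean_y = sum(y) / n
        cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
        std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
        std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
        if std_x == 0 or std_y == 0:
            # Undefined for a constant feature; this sketch returns 0.0 by convention.
            return 0.0
        return cov / (std_x * std_y)


    def correlation_ratio(categories, values):
        """Correlation ratio (eta) of a numerical feature given a categorical one."""
        n = len(values)
        overall_mean = sum(values) / n
        groups = defaultdict(list)
        for category, value in zip(categories, values):
            groups[category].append(value)
        # Share of the total variance explained by the per-category means.
        ss_between = sum(len(g) * (sum(g) / len(g) - overall_mean) ** 2 for g in groups.values())
        ss_total = sum((v - overall_mean) ** 2 for v in values)
        if ss_total == 0:
            return 0.0  # constant numerical feature: convention of this sketch
        return math.sqrt(ss_between / ss_total)


    def _entropy(counts, n):
        """Shannon entropy (natural log) of a frequency table with n observations."""
        return -sum((c / n) * math.log(c / n) for c in counts.values())


    def symmetric_theils_u(x, y):
        """Symmetric uncertainty coefficient: 2 * I(X;Y) / (H(X) + H(Y))."""
        n = len(x)
        h_x = _entropy(Counter(x), n)
        h_y = _entropy(Counter(y), n)
        h_xy = _entropy(Counter(zip(x, y)), n)
        if h_x + h_y == 0:
            return 0.0  # both features constant: convention of this sketch
        mutual_information = h_x + h_y - h_xy
        return 2 * mutual_information / (h_x + h_y)


    def feature_correlation(x, y, x_is_categorical, y_is_categorical):
        """Pick the measure according to the types of the two features."""
        if x_is_categorical and y_is_categorical:
            return symmetric_theils_u(x, y)
        if x_is_categorical:
            return correlation_ratio(x, y)
        if y_is_categorical:
            return correlation_ratio(y, x)
        return pearson(x, y)

For example, ``feature_correlation(["a", "a", "b", "b"], [1, 1, 5, 5], True, False)``
returns ``1.0``, because the category fully determines the numerical value. Note that
Pearson's coefficient is signed (between -1 and 1), while the correlation ratio and
symmetric Theil's U lie between 0 and 1.
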

Define a Condition
==================

Now we will define a condition on the maximum number of feature pairs whose
correlation is above a certain threshold. In this example, the condition requires
that no more than 3 pairs are correlated above 0.8.

.. code-block:: default

    check = FeatureFeatureCorrelation()
    check.add_condition_max_number_of_pairs_above_threshold(0.8, 3)
    result = check.run(ds)
    result.show(show_additional_outputs=False)
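
The pass/fail rule behind this condition can be sketched in plain Python. This is a
hypothetical illustration, not the deepchecks source: it assumes a symmetric correlation
matrix stored as a dict of dicts, counts each unordered pair of distinct features once
when its correlation is strictly greater than the threshold, and passes when that count
does not exceed the allowed maximum. The names ``pairs_above_threshold``,
``max_pairs_condition`` and ``toy_corr`` are made up, and the matrix values are invented
for the example rather than taken from the Adult dataset.

.. code-block:: python

    from itertools import combinations


    def pairs_above_threshold(corr, threshold):
        """Unordered pairs of distinct features whose correlation exceeds threshold.

        corr is a hypothetical symmetric matrix stored as {feature: {feature: value}};
        the diagonal (a feature with itself) is never inspected.
        """
        features = sorted(corr)
        return [(a, b) for a, b in combinations(features, 2) if corr[a][b] > threshold]


    def max_pairs_condition(corr, threshold=0.8, n_pairs=3):
        """Return (passed, offending_pairs); passes if at most n_pairs pairs exceed threshold."""
        high_pairs = pairs_above_threshold(corr, threshold)
        return len(high_pairs) <= n_pairs, high_pairs


    # Invented correlation values for three hypothetical features.
    toy_corr = {
        "f1": {"f1": 1.0, "f2": 0.9, "f3": 0.1},
        "f2": {"f1": 0.9, "f2": 1.0, "f3": 0.85},
        "f3": {"f1": 0.1, "f2": 0.85, "f3": 1.0},
    }

    print(max_pairs_condition(toy_corr, threshold=0.8, n_pairs=3))
    # (True, [('f1', 'f2'), ('f2', 'f3')])

Because the diagonal is skipped and each pair is counted once, a feature's perfect
correlation with itself never counts toward the limit.
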

.. rst-class:: sphx-glr-timing

**Total running time of the script:** (0 minutes 3.047 seconds)