Feature Feature Correlation#
This notebook gives an overview of how to use and interpret the feature-feature correlation check.
The check computes the pairwise correlations between features, which can reveal pairs of features that are highly correlated.
How Are the Correlations Calculated?#
The check handles two types of features, categorical and numerical, and uses a different measure for each combination of feature types:
numerical-numerical: Pearson’s correlation coefficient
numerical-categorical: Correlation ratio
categorical-categorical: Symmetric Theil’s U
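To make the three measures above concrete, here is a minimal pure-Python sketch of their textbook definitions. It is not the deepchecks implementation, which may differ in details such as missing-value handling or edge cases. The function names and the choice to return 0.0 for constant inputs are assumptions made for this illustration.

```python
import math
from collections import Counter, defaultdict


def pearson(x, y):
    """Pearson's correlation coefficient between two numerical sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        return 0.0  # assumption: treat a constant feature as uncorrelated
    return cov / math.sqrt(ss_x * ss_y)


def correlation_ratio(categories, values):
    """Correlation ratio (eta) of a numerical feature given a categorical one."""
    groups = defaultdict(list)
    for cat, val in zip(categories, values):
        groups[cat].append(val)
    overall_mean = sum(values) / len(values)
    # Variation explained by the category means vs. total variation
    between = sum(len(g) * (sum(g) / len(g) - overall_mean) ** 2
                  for g in groups.values())
    total = sum((v - overall_mean) ** 2 for v in values)
    if total == 0:
        return 0.0  # assumption: constant numerical feature
    return math.sqrt(between / total)


def entropy(labels):
    """Shannon entropy (natural log) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def symmetric_theils_u(x, y):
    """Symmetric uncertainty coefficient: 2 * I(X;Y) / (H(X) + H(Y))."""
    h_x, h_y = entropy(x), entropy(y)
    if h_x + h_y == 0:
        return 0.0  # assumption: both features are constant
    h_xy = entropy(list(zip(x, y)))      # joint entropy H(X, Y)
    mutual_information = h_x + h_y - h_xy
    return 2 * mutual_information / (h_x + h_y)
```

Pearson's coefficient ranges from -1 to 1, while the correlation ratio and the symmetric Theil's U in this sketch range from 0 to 1, so only the numerical-numerical case carries a sign.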
Imports#
import pandas as pd
from deepchecks.tabular.datasets.classification import adult
from deepchecks.tabular.checks.data_integrity import FeatureFeatureCorrelation
Load Data#
We load the Adult dataset, a dataset based on the 1994 US Census containing both numerical and categorical features.
ds = adult.load_data(as_train_test=False)
Run the Check#
check = FeatureFeatureCorrelation()
check.run(ds)
# To display the results in an IDE like PyCharm, you can use the following code:
# check.run(ds).show()
# The result will be displayed in a new window.
Define a Condition#
Next, we add a condition that limits how many feature pairs may be correlated above a given threshold. In this example, the condition passes only if at most 3 pairs have a correlation above 0.8.
check = FeatureFeatureCorrelation()
check.add_condition_max_number_of_pairs_above_threshold(0.8, 3)
result = check.run(ds)
result.show(show_additional_outputs=False)
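The logic behind such a condition can be sketched in plain Python. This is an illustration only, not the deepchecks source: it assumes the condition counts each unordered pair once, ignores the diagonal, uses a strict "greater than" comparison, and passes when the count does not exceed the limit. The feature names and correlation values are hypothetical.

```python
def count_pairs_above(corr, threshold):
    """Count unordered feature pairs whose correlation exceeds the threshold."""
    names = sorted(corr)
    return sum(
        1
        for i, first in enumerate(names)
        for second in names[i + 1:]        # skip the diagonal and mirrored pairs
        if corr[first][second] > threshold
    )


def condition_passes(corr, threshold, n_pairs):
    """Assumed rule: pass when at most n_pairs pairs exceed the threshold."""
    return count_pairs_above(corr, threshold) <= n_pairs


# Hypothetical symmetric correlation matrix, for illustration only
corr = {
    "age":    {"age": 1.0,  "hours": 0.85, "income": 0.3},
    "hours":  {"age": 0.85, "hours": 1.0,  "income": 0.9},
    "income": {"age": 0.3,  "hours": 0.9,  "income": 1.0},
}

print(count_pairs_above(corr, 0.8))
print(condition_passes(corr, 0.8, 3))
```

Counting each pair once matters because a correlation matrix is symmetric: without skipping mirrored entries, every highly correlated pair would be counted twice.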