FeatureFeatureCorrelation
- class FeatureFeatureCorrelation
Checks for pairwise correlation between the features.
Highly correlated pairs of features can indicate redundancy or even duplication. Removing such features reduces the dimensionality of the data, which can speed up the model, mitigate the curse of dimensionality, and decrease harmful bias.
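To make the idea of pairwise feature correlation concrete, here is a minimal standard-library sketch. It is an illustration only and not the check's implementation: it assumes purely numeric features and uses the absolute Pearson coefficient, and the helper names (`pearson`, `pairwise_correlations`) and the toy data are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # Assumption of this sketch: a constant column counts as uncorrelated.
        return 0.0
    return cov / sqrt(var_x * var_y)


def pairwise_correlations(features):
    """Map each unordered pair of feature names to its absolute correlation."""
    return {
        (a, b): abs(pearson(features[a], features[b]))
        for a, b in combinations(features, 2)
    }


# Hypothetical toy data: "height_in" is "height_cm" converted to inches,
# so the two columns carry the same information.
heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
features = {
    "height_cm": heights_cm,
    "height_in": [h / 2.54 for h in heights_cm],
    "shoe_size": [40.0, 38.0, 44.0, 41.0, 43.0],
}
correlations = pairwise_correlations(features)

# List pairs from most to least correlated.
for pair, value in sorted(correlations.items(), key=lambda kv: -kv[1]):
    print(pair, round(value, 3))
```

In this toy data the two height columns come out as almost perfectly correlated, which is the kind of redundancy the check is meant to surface.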
- Parameters
- columns : Union[Hashable, List[Hashable]], default: None
Columns to check. If none are given, all columns except the ignored ones are checked.
- ignore_columns : Union[Hashable, List[Hashable]], default: None
Columns to ignore. If none are given, the columns to check are determined by the columns parameter.
- show_n_top_columns : int, default: 10
Number of columns to show, ordered by highest correlation.
- n_samples : int, default: 10_000
Number of samples to use for this check.
- random_state : int, default: 42
Random seed for all check internals.
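The sketch below illustrates how column selection and seeded sampling of this kind might behave. It is a hedged, standard-library illustration rather than the library's code: `select_columns` and `sample_rows` are hypothetical helpers, and rejecting both `columns` and `ignore_columns` at once is an assumption of the sketch.

```python
import random


def select_columns(all_columns, columns=None, ignore_columns=None):
    """Pick the columns to check; a single name or a list of names is accepted."""
    def as_list(value):
        if value is None:
            return None
        return value if isinstance(value, list) else [value]

    columns = as_list(columns)
    ignore_columns = as_list(ignore_columns)
    if columns is not None and ignore_columns is not None:
        # Assumption of this sketch: the two options are mutually exclusive.
        raise ValueError("pass either columns or ignore_columns, not both")
    if columns is not None:
        return [c for c in all_columns if c in columns]
    if ignore_columns is not None:
        return [c for c in all_columns if c not in ignore_columns]
    return list(all_columns)


def sample_rows(rows, n_samples=10_000, random_state=42):
    """Return at most n_samples rows, chosen reproducibly from a list."""
    if len(rows) <= n_samples:
        return list(rows)
    # A dedicated Random instance keeps the global random state untouched.
    return random.Random(random_state).sample(rows, n_samples)


all_columns = ["age", "income", "zip_code", "id"]
checked = select_columns(all_columns, ignore_columns="id")

rows = list(range(100))
first = sample_rows(rows, n_samples=10, random_state=42)
second = sample_rows(rows, n_samples=10, random_state=42)
print(checked, first == second)
```

Using the same `random_state` yields the same sample on every run, which makes results of a sampled check repeatable.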
- __init__(columns: Optional[Union[Hashable, List[Hashable]]] = None, ignore_columns: Optional[Union[Hashable, List[Hashable]]] = None, show_n_top_columns: int = 10, n_samples: int = 10000, random_state: int = 42, **kwargs)
- __new__(*args, **kwargs)
Methods
- Add new condition function to the check.
- Add condition that all pairwise correlations are less than threshold, except for the diagonal.
- Remove all conditions from this check instance.
- Run conditions on given result.
- Return check configuration (conditions' configuration not yet supported).
- Return check object from a CheckConfig object.
- Deserialize check instance from JSON string.
- Return check metadata.
- Name of class in split camel case.
- Return parameters to show when printing the check.
- Remove given condition by index.
- Run check.
- Run check.
- Serialize check instance to JSON string.
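The condition described above (all pairwise correlations below a threshold, ignoring the diagonal) can be sketched as follows. This is a plain-Python illustration, not the library's implementation: the function name `pairs_at_or_above`, the nested-dict matrix layout, and the 0.9 threshold are assumptions made for the example.

```python
def pairs_at_or_above(corr, threshold=0.9):
    """Return unordered off-diagonal pairs whose correlation is not below threshold."""
    names = list(corr)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]  # skips the diagonal and mirrored duplicates
        if corr[a][b] >= threshold
    ]


# Hypothetical symmetric correlation matrix; the diagonal is always 1.0
# and is therefore excluded from the condition.
corr = {
    "a": {"a": 1.0, "b": 0.95, "c": 0.10},
    "b": {"a": 0.95, "b": 1.0, "c": 0.30},
    "c": {"a": 0.10, "b": 0.30, "c": 1.0},
}

offenders = pairs_at_or_above(corr, threshold=0.9)
passed = not offenders
print("passed" if passed else f"failed: {offenders}")
```

Iterating only over the upper triangle means each pair is reported once, and a feature's trivial correlation with itself never fails the condition.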