FeatureFeatureCorrelation

class FeatureFeatureCorrelation

Checks for pairwise correlation between the features.

Extremely correlated pairs of features can indicate redundancy or even duplication. Removing highly correlated features from the data reduces its dimensionality, which can significantly speed up the model (mitigating the curse of dimensionality) and can decrease harmful bias.
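As a rough illustration of what "pairwise correlation between the features" means, the sketch below computes the Pearson correlation for every pair of numeric columns and ranks the pairs by absolute value. It is a plain-Python sketch, not this check's implementation: the check's actual correlation measures (and its handling of categorical features) may differ, and the helpers `pearson` and `ranked_feature_pairs` are hypothetical.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # A constant column has no defined correlation; this sketch treats it as 0.
        return 0.0
    return cov / sqrt(var_x * var_y)


def ranked_feature_pairs(
    data: Dict[str, List[float]]
) -> List[Tuple[str, str, float]]:
    """Hypothetical helper: every unordered feature pair, sorted by |r| descending.

    Self-pairs (the diagonal of a correlation matrix) are never generated.
    """
    pairs = [(a, b, pearson(data[a], data[b])) for a, b in combinations(data, 2)]
    return sorted(pairs, key=lambda p: abs(p[2]), reverse=True)


if __name__ == "__main__":
    features = {
        "height_cm": [150.0, 160.0, 170.0, 180.0],
        "height_in": [59.1, 63.0, 66.9, 70.9],  # near-duplicate of height_cm
        "shoe_color": [1.0, 3.0, 2.0, 1.0],
    }
    for a, b, r in ranked_feature_pairs(features):
        print(f"{a} ~ {b}: {r:+.3f}")
```

The near-duplicate `height_cm` / `height_in` pair ends up first, which is the kind of redundancy the check is meant to surface.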

Parameters
columns : Union[Hashable, List[Hashable]], default: None

Columns to check. If none are given, all columns except the ignored ones are checked.

ignore_columns : Union[Hashable, List[Hashable]], default: None

Columns to ignore. If none are given, the columns to check are determined by the columns parameter.

show_n_top_columns : int, default: 10

Number of columns to show, ordered by highest correlation.

n_samples : int, default: 10_000

Number of samples to use for this check.

random_state : int, default: 42

Random seed for all check internals.
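The parameters above control which columns are compared, how many rows are used, and how many columns are shown. The sketch below is one plausible plain-Python reading of those descriptions, not the check's source. The helpers `resolve_columns`, `sample_rows` and `top_n_columns` are hypothetical, and ranking a column by its strongest correlation with any other column is an assumption about what "ordered by highest correlation" means.

```python
import random
from typing import Dict, Hashable, List, Optional, Sequence, Tuple, Union

ColumnSpec = Optional[Union[Hashable, List[Hashable]]]


def _as_list(spec: ColumnSpec) -> Optional[List[Hashable]]:
    """Normalize a single column name or a list of names to a list."""
    if spec is None:
        return None
    return list(spec) if isinstance(spec, list) else [spec]


def resolve_columns(
    all_columns: Sequence[Hashable],
    columns: ColumnSpec = None,
    ignore_columns: ColumnSpec = None,
) -> List[Hashable]:
    """Hypothetical: explicit `columns` win; otherwise all columns minus ignored ones."""
    selected = _as_list(columns)
    if selected is not None:
        return selected
    ignored = set(_as_list(ignore_columns) or [])
    return [c for c in all_columns if c not in ignored]


def sample_rows(rows: List[dict], n_samples: int = 10_000,
                random_state: int = 42) -> List[dict]:
    """Hypothetical: draw up to n_samples rows, reproducibly for a given seed."""
    if len(rows) <= n_samples:
        return list(rows)
    return random.Random(random_state).sample(rows, n_samples)


def top_n_columns(
    pair_corr: Dict[Tuple[Hashable, Hashable], float],
    show_n_top_columns: int = 10,
) -> List[Hashable]:
    """Hypothetical: rank columns by their strongest |correlation| to any other column."""
    strongest: Dict[Hashable, float] = {}
    for (a, b), r in pair_corr.items():
        for col in (a, b):
            strongest[col] = max(strongest.get(col, 0.0), abs(r))
    # sorted() is stable, so ties keep their first-seen order.
    ranked = sorted(strongest, key=lambda c: strongest[c], reverse=True)
    return ranked[:show_n_top_columns]
```

Seeding a dedicated `random.Random` instance keeps the sample reproducible without touching global random state, which matches the intent of a single `random_state` for all check internals.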

__init__(columns: Optional[Union[Hashable, List[Hashable]]] = None, ignore_columns: Optional[Union[Hashable, List[Hashable]]] = None, show_n_top_columns: int = 10, n_samples: int = 10000, random_state: int = 42, **kwargs)
__new__(*args, **kwargs)

Methods

FeatureFeatureCorrelation.add_condition(...)

Add new condition function to the check.

FeatureFeatureCorrelation.add_condition_max_number_of_pairs_above_threshold([...])

Add condition that the number of feature pairs with correlation above the threshold does not exceed a given maximum, excluding the diagonal (each feature's correlation with itself).

FeatureFeatureCorrelation.clean_conditions()

Remove all conditions from this check instance.

FeatureFeatureCorrelation.conditions_decision(result)

Run conditions on given result.

FeatureFeatureCorrelation.config([...])

Return check configuration (conditions' configuration not yet supported).

FeatureFeatureCorrelation.from_config(conf)

Return check object from a CheckConfig object.

FeatureFeatureCorrelation.from_json(conf[, ...])

Deserialize check instance from JSON string.

FeatureFeatureCorrelation.metadata([...])

Return check metadata.

FeatureFeatureCorrelation.name()

Name of the class in split camel case.

FeatureFeatureCorrelation.params([show_defaults])

Return parameters to show when printing the check.

FeatureFeatureCorrelation.remove_condition(index)

Remove given condition by index.

FeatureFeatureCorrelation.run(dataset[, ...])

Run check.

FeatureFeatureCorrelation.run_logic(context, ...)

Run check.

FeatureFeatureCorrelation.to_json([indent])

Serialize check instance to JSON string.
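The condition added by `add_condition_max_number_of_pairs_above_threshold` can be pictured as counting the off-diagonal feature pairs whose correlation exceeds a threshold and comparing that count with an allowed maximum. The sketch below is a hypothetical plain-Python version of that logic; the parameter names `threshold` and `n_pairs`, their defaults, and the use of absolute correlation are assumptions here, not the library's documented signature.

```python
from typing import Dict, Hashable, List, Tuple

Pair = Tuple[Hashable, Hashable]


def pairs_above_threshold(pair_corr: Dict[Pair, float],
                          threshold: float = 0.9) -> List[Pair]:
    """Return off-diagonal pairs whose |correlation| exceeds the threshold."""
    return [
        (a, b) for (a, b), r in pair_corr.items()
        if a != b and abs(r) > threshold  # skip self-correlation (the diagonal)
    ]


def max_pairs_condition(pair_corr: Dict[Pair, float],
                        threshold: float = 0.9,
                        n_pairs: int = 0) -> Tuple[bool, List[Pair]]:
    """Hypothetical condition: pass if at most n_pairs pairs exceed the threshold."""
    offending = pairs_above_threshold(pair_corr, threshold)
    return len(offending) <= n_pairs, offending


if __name__ == "__main__":
    corr = {("a", "a"): 1.0, ("a", "b"): 0.97, ("a", "c"): 0.3, ("b", "c"): -0.92}
    passed, offending = max_pairs_condition(corr, threshold=0.9, n_pairs=1)
    print(passed, offending)  # prints: False [('a', 'b'), ('b', 'c')]
```

Returning the offending pairs alongside the pass/fail flag makes it easy to report which features triggered the failure.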

Examples