FeatureLabelCorrelationChange.add_condition_feature_pps_difference_less_than(threshold: float = 0.2, include_negative_diff: bool = True) → FLC

Add condition - the difference between a feature's Predictive Power Score (PPS) on the train dataset and its PPS on the test dataset is less than the threshold.

threshold: float, default: 0.2

Upper bound on the difference between the train PPS and the test PPS.

include_negative_diff: bool, default: True

Whether the condition checks the absolute value of the difference or only its positive value. If True, negative differences are included as well, so the absolute value is checked; if False, only positive differences are checked. The difference is calculated as train PPS minus test PPS, so a positive difference means the feature is less predictive of the label in the test dataset than in the train dataset. This is the case of main interest, as it could indicate leakage of labels into the train dataset.
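The effect of the two parameters can be illustrated with a minimal pure-Python sketch. This is not the library's implementation: the helper `failing_features` and the PPS values are hypothetical. It assumes that a feature fails when the checked difference is greater than or equal to the threshold, and that `include_negative_diff=True` corresponds to checking the absolute value.

```python
from typing import Dict


def failing_features(
    train_pps: Dict[str, float],
    test_pps: Dict[str, float],
    threshold: float = 0.2,
    include_negative_diff: bool = True,
) -> Dict[str, float]:
    """Hypothetical helper: return the features that would violate the condition.

    The difference is train PPS minus test PPS. A feature is reported when
    the checked difference is not less than the threshold (an assumption
    about the boundary case, not taken from the library source).
    """
    failed = {}
    for name, train_score in train_pps.items():
        diff = train_score - test_pps[name]
        # Assumed semantics: True -> compare |diff|, False -> compare diff only.
        checked = abs(diff) if include_negative_diff else diff
        if checked >= threshold:
            failed[name] = diff
    return failed


# Made-up per-feature PPS values, for illustration only.
train = {"age": 0.60, "income": 0.10, "city": 0.30}
test = {"age": 0.30, "income": 0.45, "city": 0.25}

both_directions = failing_features(train, test, include_negative_diff=True)
positive_only = failing_features(train, test, include_negative_diff=False)

print(sorted(both_directions))  # features failing when |diff| is checked
print(sorted(positive_only))    # features failing when only positive diffs count
```

In this sketch, `income` is more predictive in the test data than in the train data, so it is flagged only when negative differences are included. `age` drops in predictive power from train to test and is flagged in both modes.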