- PropertyLabelCorrelationChange.add_condition_property_pps_difference_less_than(threshold: float = 0.2, include_negative_diff: bool = False)
Add new condition.
Add a condition that checks that the difference between the train dataset property PPS and the test dataset property PPS is less than the threshold. If per_class is True, the condition is applied per class, and a single class whose PPS difference is not less than the threshold is enough to fail the condition.
- threshold: float, default: 0.2
Upper bound on the train-test PPS difference.
- include_negative_diff: bool, default: False
Whether the condition checks the absolute value of the difference or only positive differences. The difference is calculated as train PPS minus test PPS. If True, the absolute value is compared against the threshold; if False, only positive differences (train PPS higher than test PPS) can fail the condition. Positive differences are the main concern because they mean the test dataset is less predictive of the label than the train dataset, which could indicate leakage of labels into the train dataset.
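The effect of the two parameters can be illustrated with a minimal, library-independent sketch. The function name `pps_difference_failures` and the property names are hypothetical, and this is not the library's implementation; it only mirrors the rule described above (difference = train PPS minus test PPS, compared against `threshold`). Treating a difference exactly equal to the threshold as a failure is an assumption based on the "less than" wording.

```python
def pps_difference_failures(train_pps, test_pps, threshold=0.2,
                            include_negative_diff=False):
    """Return the properties whose PPS difference is not below the threshold.

    Hypothetical helper for illustration only; not part of any library.
    """
    failures = {}
    for prop, train_score in train_pps.items():
        # Difference is defined as train PPS minus test PPS.
        diff = train_score - test_pps[prop]
        # Either compare the absolute difference or only the signed one.
        checked = abs(diff) if include_negative_diff else diff
        if checked >= threshold:  # assumption: equality counts as a failure
            failures[prop] = diff
    return failures


# Made-up PPS scores for two image properties.
train = {"brightness": 0.50, "area": 0.10}
test = {"brightness": 0.20, "area": 0.45}

# Default: only train > test differences can fail.
positive_only = pps_difference_failures(train, test)
# With include_negative_diff=True, test > train differences fail as well.
both_directions = pps_difference_failures(train, test,
                                          include_negative_diff=True)

print(sorted(positive_only))    # ['brightness']
print(sorted(both_directions))  # ['area', 'brightness']
```

With the default setting only `brightness` fails, because `area` has a negative difference (test PPS higher than train PPS); enabling `include_negative_diff` flags both.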