MixedDataTypes.add_condition_rare_type_ratio_not_in_range
- MixedDataTypes.add_condition_rare_type_ratio_not_in_range(ratio_range: Tuple[float, float] = (0.01, 0.1))
Add condition - Whether the ratio of the rarer data type (strings or numbers) is not in the “danger zone”.
The “danger zone” reflects the following logic: if the rarer data type makes up, for example, 30% of the data, then the column is presumably meant to contain both numbers and strings. If the rarer data type makes up, for example, less than 1% of the data, then it is presumably contamination, but a negligible one. In the range between, there is a real chance that the rarer data type poses a problem for model training and inference.
- Parameters
- ratio_range : Tuple[float, float], default: (0.01, 0.1)
The range within which the ratio of the rarer data type in the column is considered a problem. With the default, this is a ratio between 1% and 10%.
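
The “danger zone” logic described above can be illustrated with a minimal, standalone sketch. It does not use the library and is not its implementation: the helper names `rarer_type_ratio` and `in_danger_zone` are hypothetical, type detection is simplified to Python `isinstance` checks, and treating both ends of `ratio_range` as exclusive is an assumption.

```python
from typing import Iterable, Tuple


def rarer_type_ratio(values: Iterable[object]) -> float:
    """Return the share of the rarer type (strings vs. numbers) among the values.

    Hypothetical helper: values that are neither str nor int/float are ignored.
    """
    n_strings = 0
    n_numbers = 0
    for value in values:
        if isinstance(value, str):
            n_strings += 1
        elif isinstance(value, (int, float)) and not isinstance(value, bool):
            n_numbers += 1
    total = n_strings + n_numbers
    if total == 0:
        return 0.0
    return min(n_strings, n_numbers) / total


def in_danger_zone(
    ratio: float, ratio_range: Tuple[float, float] = (0.01, 0.1)
) -> bool:
    """Return True if the ratio lies strictly inside ratio_range.

    Assumption: both boundaries are treated as exclusive.
    """
    low, high = ratio_range
    return low < ratio < high


# 5 strings among 100 values: the rarer type is 5% of the data.
suspicious_column = [1.0] * 95 + ["n/a"] * 5
# 30 strings among 100 values: the column is presumably mixed on purpose.
mixed_column = [1.0] * 70 + ["n/a"] * 30
# 1 string among 200 values: negligible contamination (0.5%).
nearly_clean_column = [1.0] * 199 + ["n/a"]

for column in (suspicious_column, mixed_column, nearly_clean_column):
    ratio = rarer_type_ratio(column)
    print(ratio, in_danger_zone(ratio))
```

Under these assumptions, only the first column falls inside the default range, so it is the only one the condition would be expected to flag.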