ImagePropertyDrift
- class ImagePropertyDrift
Calculate drift between train dataset and test dataset per image property, using statistical measures.
The check calculates a drift score for each image property in the test dataset by comparing its distribution to that of the train dataset. For this, we use the Earth Mover's Distance.
See https://en.wikipedia.org/wiki/Wasserstein_metric
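To make the idea of comparing two distributions concrete, here is a minimal pure-Python sketch of the one-dimensional Earth Mover's Distance, computed as the area between the two empirical CDFs. The function name `emd_1d` is hypothetical and is not part of deepchecks; the library's actual computation may differ, for example in how the score is scaled.

```python
from bisect import bisect_right


def emd_1d(train_values, test_values):
    """Area between the empirical CDFs of two 1D samples (illustrative only)."""
    a = sorted(train_values)
    b = sorted(test_values)
    # Both CDFs are step functions that only change at observed values.
    points = sorted(set(a) | set(b))
    distance = 0.0
    for left, right in zip(points, points[1:]):
        cdf_a = bisect_right(a, left) / len(a)  # share of train values <= left
        cdf_b = bisect_right(b, left) / len(b)  # share of test values <= left
        distance += abs(cdf_a - cdf_b) * (right - left)
    return distance
```

Identical samples give a distance of zero, and shifting every value of one sample by a constant gives a distance equal to that constant.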
- Parameters
- image_properties: List[Dict[str, Any]], default: None
List of properties. Replaces the default deepchecks properties. Each property is a dictionary with keys ‘name’ (str), ‘method’ (Callable) and ‘output_type’ (str), representing attributes of said method. ‘output_type’ must be one of:
  - ‘numeric’ - for continuous ordinal outputs.
  - ‘categorical’ - for discrete, non-ordinal outputs. These can still be numbers, but these numbers do not have inherent value.
For more on image / label properties, see the property guide.
- margin_quantile_filter: float, default: 0.025
float in range [0,0.5), representing which margins (high and low quantiles) of the distribution will be filtered out of the EMD calculation. This is done in order for extreme values not to affect the calculation disproportionally. This filter is applied to both distributions, in both margins.
- max_num_categories_for_drift: int, default: 10
Only for discrete properties. Max number of allowed categories. If there are more, they are binned into an “Other” category. If None, there is no limit.
- max_num_categories_for_display: int, default: 10
Max number of categories to show in plot.
- show_categories_by: str, default: ‘largest_difference’
Specify which categories to show for categorical features’ graphs, as the number of shown categories is limited by max_num_categories_for_display. Possible values:
  - ‘train_largest’: Show the largest train categories.
  - ‘test_largest’: Show the largest test categories.
  - ‘largest_difference’: Show the largest difference between categories.
- classes_to_display: Optional[List[str]], default: None
List of classes to display. The distribution of the properties would include only samples belonging (or containing an annotation belonging) to one of these classes. If None, samples from all classes are displayed.
- min_samples: int, default: 30
Minimum number of samples needed in each dataset to calculate the drift.
- max_num_categories: int, default: None
Deprecated. Please use max_num_categories_for_drift and max_num_categories_for_display instead.
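The behaviour described for margin_quantile_filter, max_num_categories_for_drift and show_categories_by can be sketched in plain Python. The helper names below are hypothetical, and the logic is a simplified reading of the parameter descriptions rather than the library's implementation: margins are cut by position in the sorted sample, and category sizes are compared as raw counts.

```python
from collections import Counter


def trim_margins(values, margin_quantile_filter=0.025):
    """Drop the lowest and highest fraction of values before measuring drift."""
    if not 0 <= margin_quantile_filter < 0.5:
        raise ValueError("margin_quantile_filter must be in [0, 0.5)")
    ordered = sorted(values)
    k = int(len(ordered) * margin_quantile_filter)  # values cut from each margin
    return ordered[k:len(ordered) - k]


def bin_rare_categories(values, max_num_categories=10):
    """Keep the most frequent categories and relabel the rest as "Other"."""
    if max_num_categories is None:  # None means no limit
        return list(values)
    kept = {cat for cat, _ in Counter(values).most_common(max_num_categories)}
    return [v if v in kept else "Other" for v in values]


def categories_to_show(train_counts, test_counts, show_categories_by, limit=10):
    """Pick which categories to plot under the three documented modes."""
    if show_categories_by == "train_largest":
        key = lambda c: train_counts.get(c, 0)
    elif show_categories_by == "test_largest":
        key = lambda c: test_counts.get(c, 0)
    elif show_categories_by == "largest_difference":
        key = lambda c: abs(train_counts.get(c, 0) - test_counts.get(c, 0))
    else:
        raise ValueError(f"unknown show_categories_by: {show_categories_by!r}")
    # Sort names first so that ties are broken deterministically.
    names = sorted(set(train_counts) | set(test_counts))
    return sorted(names, key=key, reverse=True)[:limit]
```

The same trimming would be applied to both the train and the test sample, as the parameter description states, so that neither distribution keeps its extreme values.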
- __init__(image_properties: Optional[List[Dict[str, Any]]] = None, margin_quantile_filter: float = 0.025, max_num_categories_for_drift: int = 10, max_num_categories_for_display: int = 10, show_categories_by: str = 'largest_difference', classes_to_display: Optional[List[str]] = None, min_samples: int = 30, max_num_categories: Optional[int] = None, **kwargs)
- __new__(*args, **kwargs)
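As an illustration of the image_properties format, the sketch below builds a one-item property list with the three documented keys. The property function `mean_brightness` is hypothetical. It assumes the method receives a batch of images and returns one value per image, and it represents each image as a nested list of grayscale rows to stay dependency-free; real images would normally be arrays. The commented lines assume the class is importable from `deepchecks.vision.checks`.

```python
def mean_brightness(batch):
    """Return one mean pixel value per image in the batch (illustrative only)."""
    results = []
    for image in batch:
        # Flatten the rows of a single grayscale image into one pixel list.
        pixels = [pixel for row in image for pixel in row]
        results.append(sum(pixels) / len(pixels))
    return results


# Each property is a dict with the keys 'name', 'method' and 'output_type'.
custom_properties = [
    {"name": "Mean Brightness", "method": mean_brightness, "output_type": "numeric"},
]

# With deepchecks installed, the list would be passed to the constructor:
# from deepchecks.vision.checks import ImagePropertyDrift
# check = ImagePropertyDrift(image_properties=custom_properties, min_samples=30)
```

Because a custom list replaces the default deepchecks properties, any default property that should still be checked has to be included in the list explicitly.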
Methods
- Add new condition function to the check.
- Add condition - require drift score to be less than a certain threshold.
- Remove all conditions from this check instance.
- Calculate drift score between train and test datasets for the collected image properties.
- Run conditions on given result.
- Return check configuration (conditions' configuration not yet supported).
- Return check object from a CheckConfig object.
- Initialize self state, and validate the run context.
- Return check metadata.
- Name of class in split camel case.
- Return parameters to show when printing the check.
- Remove given condition by index.
- Run check.
- Calculate image properties for train or test batch.
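One of the methods listed above adds a condition requiring the drift score to be less than a threshold. The logic of such a condition can be sketched as follows. The function name, its arguments and the shape of the scores mapping are all hypothetical and do not reflect the library's signature or result format.

```python
def drift_scores_below(scores, max_allowed_drift_score):
    """Check per-property drift scores against a threshold (illustrative only).

    Returns a (passed, failing) pair, where failing maps each property name
    to a score that is not strictly less than the threshold.
    """
    failing = {
        name: score
        for name, score in scores.items()
        if score >= max_allowed_drift_score
    }
    return (not failing, failing)
```

For example, with scores of 0.05 and 0.2 and a threshold of 0.1, the condition fails and reports only the property that scored 0.2.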