PropertyLabelCorrelationChange
- class PropertyLabelCorrelationChange
Return the Predictive Power Score (PPS) of image properties, in order to estimate their ability to predict the label.
The PPS represents the ability of a feature to single-handedly predict another feature or label. In this check, we use it specifically to assess how well an image property (e.g. brightness, contrast) predicts the label. A high PPS (close to 1) can mean that there is a bias in the dataset: a single property can predict the label successfully using simple classic ML algorithms, so a deep learning model may accidentally learn that property instead of more accurate, complex abstractions. For example, in a classification dataset of wolf and dog photographs, if only wolves are photographed in the snow, the brightness of the image may be enough to predict the label “wolf”. In this case, a model might learn to tell wolf from dog by the background color rather than by the animal’s characteristics.
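To make the normalization idea concrete, here is a minimal sketch of a PPS-like score. It is not the ppscore implementation: the one-threshold rule, the use of plain accuracy, and every function name below are assumptions chosen for illustration, and the brightness values are invented. The sketch only shows how a score near 1 arises when a single property separates the classes.

```python
from collections import Counter


def majority_accuracy(labels):
    """Accuracy of always predicting the most common label (naive baseline)."""
    return Counter(labels).most_common(1)[0][1] / len(labels)


def best_stump_accuracy(values, labels):
    """Best accuracy of a one-threshold rule on a single numeric property.

    The rule predicts one class below the cut and one class at or above it.
    """
    classes = sorted(set(labels))
    best = 0.0
    for cut in sorted(set(values)):
        for low in classes:
            for high in classes:
                hits = sum(
                    (low if v < cut else high) == y
                    for v, y in zip(values, labels)
                )
                best = max(best, hits / len(labels))
    return best


def simple_predictive_score(values, labels):
    """Rescale accuracy so 0 means 'no better than naive' and 1 means 'perfect'."""
    baseline = majority_accuracy(labels)
    if baseline == 1.0:
        return 0.0  # a single class leaves nothing to predict
    model = best_stump_accuracy(values, labels)
    return max(0.0, (model - baseline) / (1.0 - baseline))


# Invented toy data: every wolf photo is bright (snow), every dog photo is dark.
labels = ["wolf", "wolf", "wolf", "wolf", "dog", "dog", "dog", "dog"]
biased_brightness = [0.91, 0.88, 0.95, 0.90, 0.35, 0.42, 0.30, 0.38]
biased_score = simple_predictive_score(biased_brightness, labels)

# Same labels, but brightness no longer lines up with the class.
mixed_brightness = [0.91, 0.35, 0.88, 0.42, 0.90, 0.38, 0.30, 0.95]
mixed_score = simple_predictive_score(mixed_brightness, labels)
```

In the biased toy data a single brightness threshold classifies every sample, so the score reaches its maximum; in the mixed data the same rule barely beats the baseline.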
When we compare train PPS to test PPS, a large difference can strongly indicate bias in the datasets: a property that was “powerful” in train but not in test can be explained by a bias in train that does not carry over to a new dataset.
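The comparison can be sketched as a ranking of properties by how much their score changes between the two datasets. The helper name, the sign convention (train minus test), and the property names are assumptions, and the scores are illustrative numbers, not measurements.

```python
def rank_by_pps_difference(train_pps, test_pps, n_top=3):
    """Return (property, train - test) pairs, largest absolute difference first."""
    diffs = {
        name: train_pps[name] - test_pps[name]
        for name in train_pps
        if name in test_pps  # compare only properties scored on both datasets
    }
    ranked = sorted(diffs.items(), key=lambda item: abs(item[1]), reverse=True)
    return ranked[:n_top]


# Illustrative scores only (hypothetical property names and values).
train_pps = {"brightness": 0.80, "aspect_ratio": 0.10, "area": 0.05}
test_pps = {"brightness": 0.20, "aspect_ratio": 0.12, "area": 0.05}

top = rank_by_pps_difference(train_pps, test_pps, n_top=2)
```

Here "brightness" ranks first because its score drops sharply from train to test, which is the pattern the check is meant to surface.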
For classification tasks, this check uses PPS to predict the class by image properties. For object detection tasks, this check uses PPS to predict the class of each bounding box, by the image properties of that specific bounding box.
Uses the ppscore package - for more info, see https://github.com/8080labs/ppscore
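For the object detection case described above, each bounding box contributes its own (property value, class) sample. The sketch below illustrates that pairing on a tiny grayscale image stored as nested lists. The `(x, y, width, height)` box layout, the annotation format, and all function names are assumptions made for this example and are not taken from the check's implementation.

```python
def crop(image, bbox):
    """Crop a grayscale image (list of pixel rows) to bbox = (x, y, width, height)."""
    x, y, w, h = bbox
    return [row[x:x + w] for row in image[y:y + h]]


def mean_brightness(image):
    """Mean pixel value of a grayscale image given as a list of rows."""
    pixels = [p for row in image for p in row]
    return sum(pixels) / len(pixels)


def bbox_property_samples(image, annotations, prop):
    """Build one (property value, class) pair per annotated bounding box."""
    return [(prop(crop(image, bbox)), cls) for cls, bbox in annotations]


# A 3x4 image: dark on the left half, bright on the right half.
image = [
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [10, 10, 200, 200],
]
# Hypothetical annotations: (class, (x, y, width, height)).
annotations = [("dog", (0, 0, 2, 3)), ("wolf", (2, 0, 2, 3))]

samples = bbox_property_samples(image, annotations, mean_brightness)
```

The resulting pairs are what a PPS computation would consume: the property value of each crop as the feature, and the box class as the target.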
- Parameters
- image_properties: List[Dict[str, Any]], default: None
List of properties. Replaces the default deepchecks properties. Each property is a dictionary with keys 'name' (str), 'method' (Callable) and 'output_type' (str), representing attributes of said method. 'output_type' must be one of:
- 'numeric' - for continuous ordinal outputs.
- 'categorical' - for discrete, non-ordinal outputs. These can still be numbers, but these numbers do not have inherent value.
For more on image / label properties, see the guide about Data Properties.
- per_class: bool, default: True
Boolean that indicates whether the results of this check should be calculated for all classes or per class in label. If True, the conditions will be run per class as well.
- n_top_properties: int, default: 3
Number of properties to show, sorted by the magnitude of difference in PPS
- random_state: int, default: None
Random state for the ppscore.predictors function
- min_pps_to_show: float, default: 0.05
Minimum PPS to show a class in the graph
- ppscore_params: dict, default: None
Dictionary of additional parameters for the ppscore predictor function
- __init__(image_properties: Optional[List[Dict[str, Any]]] = None, n_top_properties: int = 3, per_class: bool = True, random_state: Optional[int] = None, min_pps_to_show: float = 0.05, ppscore_params: Optional[dict] = None, **kwargs)
- __new__(*args, **kwargs)
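As an illustration of the `image_properties` format described in the parameters, the sketch below builds a list with one numeric and one categorical property and checks it against the documented keys. The assumption that each `method` receives a batch of images and returns one value per image is not stated on this page, and the property functions and `validate_properties` helper are hypothetical.

```python
ALLOWED_OUTPUT_TYPES = {"numeric", "categorical"}
REQUIRED_KEYS = {"name", "method", "output_type"}


def mean_brightness(images):
    """Assumed contract: take a batch of grayscale images, return one float per image."""
    return [
        sum(p for row in img for p in row) / sum(len(row) for row in img)
        for img in images
    ]


def orientation(images):
    """Discrete, non-ordinal output: 'landscape' or 'portrait' per image."""
    return ["landscape" if len(img[0]) >= len(img) else "portrait" for img in images]


image_properties = [
    {"name": "mean_brightness", "method": mean_brightness, "output_type": "numeric"},
    {"name": "orientation", "method": orientation, "output_type": "categorical"},
]


def validate_properties(properties):
    """Hypothetical helper: check each property dict has the documented shape."""
    for prop in properties:
        missing = REQUIRED_KEYS - prop.keys()
        if missing:
            raise ValueError(f"property is missing keys: {sorted(missing)}")
        if not callable(prop["method"]):
            raise TypeError(f"'method' of {prop['name']!r} must be callable")
        if prop["output_type"] not in ALLOWED_OUTPUT_TYPES:
            raise ValueError(f"invalid output_type for {prop['name']!r}")


validate_properties(image_properties)

# Two tiny images: a 2x2 square and a 3-row, 1-column strip.
batch = [[[0, 100], [100, 200]], [[50], [50], [50]]]
values = {p["name"]: p["method"](batch) for p in image_properties}
```

A list shaped like this would be passed as the `image_properties` argument, replacing the default properties.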
Methods
- Add new condition function to the check.
- Add new condition.
- Add new condition.
- Remove all conditions from this check instance.
- Calculate the PPS between each property and the label.
- Run conditions on given result.
- Return check configuration (conditions' configuration not yet supported).
- Return check object from a CheckConfig object.
- Deserialize check instance from JSON string.
- Initialize run before starting updating on batches.
- Return check metadata.
- Name of class in split camel case.
- Return parameters to show when printing the check.
- Remove given condition by index.
- Run check.
- Serialize check instance to JSON string.
- Calculate image properties for train or test batches.