ImageDatasetDrift
- class ImageDatasetDrift
Calculate drift between the entire train and test datasets (based on image properties) using a trained model.
The check fits a new model, called a Domain Classifier, to distinguish between the train and test datasets. The Domain Classifier is a tabular model and cannot run on the images themselves. Therefore, the check calculates properties for each image (such as brightness and aspect ratio) and uses them as input features to the Domain Classifier. Once the Domain Classifier is fitted, the check calculates its feature importance. The result of the check is based on the AUC of the Domain Classifier: an AUC close to 0.5 means the classifier cannot tell the two datasets apart, while a higher AUC indicates drift. The check also displays the change in distribution between train and test for the top features according to the calculated feature importance.
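The procedure described above can be sketched end to end in dependency-free Python. This is a simplified illustration under stated assumptions, not the deepchecks implementation: images are hypothetical nested lists of grayscale values, only two properties (mean brightness and aspect ratio) are computed, the Domain Classifier is a logistic regression trained by gradient descent (this section does not name the model deepchecks uses), and feature importance is approximated by the normalized absolute weights on standardized features. All function names are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Image = List[List[float]]  # hypothetical: grayscale image as rows of pixel values in [0, 1]
PROPERTY_NAMES = ["brightness", "aspect_ratio"]


def image_properties(img: Image) -> List[float]:
    """Illustrative properties: mean brightness and width/height aspect ratio."""
    height, width = len(img), len(img[0])
    brightness = sum(sum(row) for row in img) / (height * width)
    return [brightness, width / height]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Chance that a random test row (label 1) outscores a random train row (label 0); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def domain_classifier_drift(train: List[Image], test: List[Image], test_size: float = 0.3,
                            epochs: int = 500, lr: float = 0.5,
                            seed: int = 0) -> Tuple[float, Dict[str, float]]:
    # 1. Turn every image into a tabular row of properties; label train=0, test=1.
    X = [image_properties(img) for img in train + test]
    y = [0] * len(train) + [1] * len(test)
    # 2. Shuffle and hold out test_size of the combined rows for evaluation.
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_eval = round(len(idx) * test_size)
    eval_idx, fit_idx = idx[:n_eval], idx[n_eval:]
    # 3. Standardize features with statistics from the fitting split (constant features -> 0).
    n_feat = len(X[0])
    means = [sum(X[i][j] for i in fit_idx) / len(fit_idx) for j in range(n_feat)]
    stds = [math.sqrt(sum((X[i][j] - means[j]) ** 2 for i in fit_idx) / len(fit_idx)) or 1.0
            for j in range(n_feat)]
    Z = [[(row[j] - means[j]) / stds[j] for j in range(n_feat)] for row in X]
    # 4. Fit a logistic-regression domain classifier by batch gradient descent.
    w, b = [0.0] * n_feat, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_feat, 0.0
        for i in fit_idx:
            err = sigmoid(b + sum(wj * zj for wj, zj in zip(w, Z[i]))) - y[i]
            grad_b += err
            grad_w = [g + err * zj for g, zj in zip(grad_w, Z[i])]
        b -= lr * grad_b / len(fit_idx)
        w = [wj - lr * g / len(fit_idx) for wj, g in zip(w, grad_w)]
    # 5. AUC on held-out rows (the linear score ranks rows exactly like the probability).
    scores = [b + sum(wj * zj for wj, zj in zip(w, Z[i])) for i in eval_idx]
    score = auc([y[i] for i in eval_idx], scores)
    # 6. Importance stand-in: |weight| on standardized features, normalized to sum to 1.
    total = sum(abs(wj) for wj in w) or 1.0
    return score, {name: abs(wj) / total for name, wj in zip(PROPERTY_NAMES, w)}
```

Logistic regression is used only to keep the sketch free of third-party dependencies; any classifier that outputs a ranking score would work with the same AUC step.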
- Parameters
- image_properties : List[Dict[str, Any]], default: None
List of properties that replaces the default deepchecks properties. Each property is a dictionary with the keys ‘name’ (str), ‘method’ (Callable) and ‘output_type’ (str), describing the property and the method that computes it. ‘output_type’ must be either ‘continuous’ or ‘discrete’.
- n_top_properties : int, default: 3
Number of properties to show, ordered by Domain Classifier feature importance. This limit is applied together (AND) with min_feature_importance, so fewer than n_top_properties properties may be displayed.
- min_feature_importance : float, default: 0.05
Minimum feature importance required for a property to be shown in the check display. The features are the image properties that are given to the Domain Classifier as input. Feature importances sum to 1, so, for example, the default value of 0.05 means that properties contributing less than 5% of the Domain Classifier's predictive power won't be displayed. This limit is applied together (AND) with n_top_properties, so properties more important than min_feature_importance can still be hidden.
- sample_size : int, default: 10_000
Maximum number of rows (images) to use from each dataset for training and evaluating the Domain Classifier.
- test_size : float, default: 0.3
Fraction of the combined datasets to use for the evaluation of the domain classifier.
- min_meaningful_drift_score : float, default: 0.05
Minimum drift score for displaying drift in the check. Below this score, the check displays “nothing found”.
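The display rules for n_top_properties, min_feature_importance and min_meaningful_drift_score can be sketched as plain functions. The helper names are hypothetical. The mapping from AUC to drift score is an assumption, since this section does not state it: the sketch uses max(2 * AUC - 1, 0), which maps a chance-level AUC of 0.5 to 0 and perfect separation to 1.

```python
from typing import Dict, List


def select_properties_to_display(importances: Dict[str, float],
                                 n_top_properties: int = 3,
                                 min_feature_importance: float = 0.05) -> List[str]:
    """Keep properties that are in the top n AND at or above the importance floor."""
    ranked = sorted(importances, key=importances.get, reverse=True)
    return [name for name in ranked[:n_top_properties]
            if importances[name] >= min_feature_importance]


def drift_score_from_auc(auc: float) -> float:
    # ASSUMPTION: rescale AUC so chance level (0.5) maps to 0 and perfect separation (1.0) to 1.
    return max(2 * auc - 1, 0.0)


def drift_is_meaningful(auc: float, min_meaningful_drift_score: float = 0.05) -> bool:
    """False corresponds to the check showing "nothing found"."""
    return drift_score_from_auc(auc) >= min_meaningful_drift_score


importances = {"brightness": 0.6, "aspect_ratio": 0.3, "contrast": 0.06, "area": 0.04}
print(select_properties_to_display(importances))                      # ['brightness', 'aspect_ratio', 'contrast']
print(select_properties_to_display(importances, n_top_properties=2))  # ['brightness', 'aspect_ratio']
```

The second call shows the AND behavior: "contrast" clears the 0.05 floor but is hidden because only two properties are allowed.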
- __init__(image_properties: Optional[List[Dict[str, Any]]] = None, n_top_properties: int = 3, min_feature_importance: float = 0.05, sample_size: int = 10000, test_size: float = 0.3, min_meaningful_drift_score: float = 0.05, **kwargs)
- __new__(*args, **kwargs)
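The image_properties argument expects a list of dictionaries with the keys ‘name’, ‘method’ and ‘output_type’. The sketch below builds such a list and sanity-checks it against the rules stated for that parameter. How deepchecks calls ‘method’ is not specified in this section; the sketch assumes, hypothetically, that it receives a batch (list) of images and returns one value per image, and represents each image as a nested list of grayscale values. `mean_brightness`, `aspect_ratio` and `validate_properties` are illustrative names, not deepchecks functions.

```python
from typing import Any, Dict, List

Image = List[List[float]]  # hypothetical: grayscale image as rows of pixel values


def mean_brightness(images: List[Image]) -> List[float]:
    # One value per image: the average pixel value.
    return [sum(map(sum, img)) / (len(img) * len(img[0])) for img in images]


def aspect_ratio(images: List[Image]) -> List[float]:
    # One value per image: width divided by height.
    return [len(img[0]) / len(img) for img in images]


custom_properties: List[Dict[str, Any]] = [
    {"name": "Mean Brightness", "method": mean_brightness, "output_type": "continuous"},
    {"name": "Aspect Ratio", "method": aspect_ratio, "output_type": "continuous"},
]


def validate_properties(properties: List[Dict[str, Any]]) -> None:
    """Check the documented keys and value types; raise on the first problem found."""
    for prop in properties:
        missing = {"name", "method", "output_type"} - set(prop)
        if missing:
            raise ValueError(f"property is missing keys: {sorted(missing)}")
        if not isinstance(prop["name"], str) or not callable(prop["method"]):
            raise TypeError("'name' must be a str and 'method' must be callable")
        if prop["output_type"] not in ("continuous", "discrete"):
            raise ValueError(
                f"output_type must be 'continuous' or 'discrete', got {prop['output_type']!r}")


validate_properties(custom_properties)  # passes silently
```

If the batch-in, list-out assumption matches what the library expects, a list like `custom_properties` is what would be passed as image_properties to replace the default properties.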
Methods
- Add a new condition function to the check.
- Remove all conditions from this check instance.
- Train a Domain Classifier on image property data that was collected during update() calls.
- Run conditions on the given result.
- Finalize the check result by adding the check instance and processing the conditions.
- Initialize the run before updating on batches.
- Return check metadata.
- Return the class name in split camel case.
- Return parameters to show when printing the check.
- Remove the given condition by index.
- Run the check.
- Calculate image properties for train or test batches.
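The methods above describe a three-phase life cycle: initialize the run, calculate image properties batch by batch for train or test, then train the Domain Classifier on everything collected. The hypothetical class below mirrors that flow; its method names follow those phases but it is not the deepchecks class, and applying sample_size inside `compute` is an assumption made for illustration. Property functions are assumed to take a batch and return one value per image.

```python
import random
from typing import Callable, Dict, List, Sequence

PropertyFn = Callable[[Sequence], List[float]]  # hypothetical: batch in, one value per image out


class HypotheticalPropertyDriftRun:
    """Illustrative initialize -> update (per batch) -> compute flow; not the deepchecks class."""

    def __init__(self, properties: Dict[str, PropertyFn], sample_size: int = 10_000, seed: int = 0):
        self.properties = properties
        self.sample_size = sample_size
        self.seed = seed
        self.rows: Dict[str, List[Dict[str, float]]] = {}

    def initialize_run(self) -> None:
        # Reset accumulated state before any batch is seen.
        self.rows = {"train": [], "test": []}

    def update(self, batch: Sequence, dataset_kind: str) -> None:
        # Compute each property for the whole batch, then store one row per image.
        values = {name: fn(batch) for name, fn in self.properties.items()}
        for i in range(len(batch)):
            self.rows[dataset_kind].append({name: vals[i] for name, vals in values.items()})

    def compute(self) -> Dict[str, List[Dict[str, float]]]:
        # Cap each dataset at sample_size rows. A real check would fit the
        # Domain Classifier on these rows at this point.
        rng = random.Random(self.seed)
        return {kind: rows if len(rows) <= self.sample_size else rng.sample(rows, self.sample_size)
                for kind, rows in self.rows.items()}


run = HypotheticalPropertyDriftRun(
    {"brightness": lambda batch: [sum(img) / len(img) for img in batch]}, sample_size=3)
run.initialize_run()
run.update([[0.25, 0.75], [0.5, 0.5]], "train")
run.update([[1.0, 1.0], [0.0, 0.5], [0.75, 0.25], [0.5, 1.0]], "test")
samples = run.compute()  # train keeps both rows; test is down-sampled to 3 rows
```

Accumulating lightweight property rows per batch, instead of keeping images, is what lets a tabular Domain Classifier be trained once at the end.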