ImageDatasetDrift

class ImageDatasetDrift

Calculate drift between the entire train and test datasets (based on image properties) using a trained model.

The check fits a new model, called a Domain Classifier, to distinguish between the train and test datasets. The Domain Classifier is a tabular model and cannot run on the images themselves. Therefore, the check calculates properties for each image (such as brightness and aspect ratio) and uses them as input features to the Domain Classifier. Once the Domain Classifier is fitted, the check calculates its feature importance. The result of the check is based on the AUC of the Domain Classifier, and the check displays the change in distribution between train and test for the top features according to the calculated feature importance.
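The domain-classifier idea can be illustrated with a small, dependency-free sketch. It is not the deepchecks implementation: a hand-written logistic regression stands in for whatever tabular model the library fits, normalized absolute weights stand in for its feature importance, and the data, function names and property columns are made up for illustration.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def roc_auc(labels, scores):
    # AUC through the rank-sum identity; tied scores share an average rank.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def standardize(rows):
    # Scale every property column to zero mean and unit variance.
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def fit_logistic(rows, labels, epochs=300, lr=0.5):
    # Batch gradient descent on the logistic loss.
    weights, bias = [0.0] * len(rows[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(bias + sum(w * v for w, v in zip(weights, x))) - y
            grad_w = [g + err * v for g, v in zip(grad_w, x)]
            grad_b += err
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias


def domain_classifier_auc(train_props, test_props, test_size=0.3, seed=0):
    # Label rows by origin: 0 = train dataset, 1 = test dataset.
    rows = standardize(train_props + test_props)
    labels = [0] * len(train_props) + [1] * len(test_props)
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    n_eval = int(len(idx) * test_size)
    eval_idx, fit_idx = idx[:n_eval], idx[n_eval:]
    weights, bias = fit_logistic([rows[i] for i in fit_idx], [labels[i] for i in fit_idx])
    scores = [bias + sum(w * v for w, v in zip(weights, rows[i])) for i in eval_idx]
    auc = roc_auc([labels[i] for i in eval_idx], scores)
    total = sum(abs(w) for w in weights) or 1.0
    importance = [abs(w) / total for w in weights]  # stand-in importance, sums to 1
    return auc, importance


rng = random.Random(42)
# Columns: [brightness, aspect_ratio]. Only brightness shifts between the datasets.
train = [[rng.gauss(0.5, 0.1), rng.gauss(1.3, 0.2)] for _ in range(200)]
test = [[rng.gauss(0.8, 0.1), rng.gauss(1.3, 0.2)] for _ in range(200)]
auc, importance = domain_classifier_auc(train, test)
print(f"AUC={auc:.3f} importance={importance}")
```

An AUC near 0.5 means the classifier cannot tell the two datasets apart, while an AUC near 1 means the property distributions differ strongly. In this synthetic setup the shifted brightness column should dominate the importance ranking.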

Parameters
image_properties: List[Dict[str, Any]], default: None

List of properties. Replaces the default deepchecks properties. Each property is a dictionary with keys ‘name’ (str), ‘method’ (Callable) and ‘output_type’ (str), representing attributes of said method. ‘output_type’ must be one of:

- ‘numeric’ - for continuous ordinal outputs.
- ‘categorical’ - for discrete, non-ordinal outputs. These can still be numbers, but these numbers do not have inherent value.

For more on image / label properties, see the property guide
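As an illustration of the dictionary layout described above, here is a minimal sketch of a custom property list. It assumes that each ‘method’ receives a batch of images and returns one value per image; the images are plain nested lists here to keep the sketch dependency-free (a real pipeline would most likely pass arrays). The property functions and the `validate_properties` helper are hypothetical and not part of deepchecks.

```python
from typing import Any, Dict, List

# Assumed image layout for this sketch: height x width x channels, as nested lists.
Image = List[List[List[float]]]


def mean_brightness(batch: List[Image]) -> List[float]:
    # Continuous output, one value per image: the average of all channel values.
    result = []
    for img in batch:
        values = [c for row in img for pixel in row for c in pixel]
        result.append(sum(values) / len(values))
    return result


def orientation(batch: List[Image]) -> List[str]:
    # Discrete, non-ordinal output, so it is declared as 'categorical' below.
    return ["landscape" if len(img[0]) >= len(img) else "portrait" for img in batch]


image_properties: List[Dict[str, Any]] = [
    {"name": "Mean Brightness", "method": mean_brightness, "output_type": "numeric"},
    {"name": "Orientation", "method": orientation, "output_type": "categorical"},
]


def validate_properties(props: List[Dict[str, Any]]) -> None:
    # Hypothetical helper: checks the three required keys and the allowed output types.
    for prop in props:
        missing = {"name", "method", "output_type"} - prop.keys()
        if missing:
            raise ValueError(f"property is missing keys: {sorted(missing)}")
        if not callable(prop["method"]):
            raise TypeError(f"'method' of {prop['name']!r} must be callable")
        if prop["output_type"] not in ("numeric", "categorical"):
            raise ValueError(f"invalid output_type for {prop['name']!r}")


validate_properties(image_properties)

wide = [[[0.0], [0.5], [1.0]], [[0.0], [0.5], [1.0]]]  # 2 rows x 3 columns
tall = [[[1.0]], [[1.0]], [[1.0]]]                      # 3 rows x 1 column
for prop in image_properties:
    print(prop["name"], prop["method"]([wide, tall]))
```

A list shaped like this would then be passed as `image_properties=...` when constructing the check.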

n_top_properties: int, default: 3

Number of properties to show, ordered by domain classifier feature importance. This limit is used together (AND) with min_feature_importance, so fewer than n_top_properties features can be displayed.

min_feature_importance: float, default: 0.05

Minimum feature importance to show in the check display. The features are the image properties that are given to the Domain Classifier as features to learn on. Feature importance sums to 1, so, for example, the default value of 0.05 means that all features contributing less than 5% to the predictive power of the Domain Classifier won’t be displayed. This limit is used together (AND) with n_top_properties, so features more important than min_feature_importance can be hidden.
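The way the two display limits combine can be sketched as follows. The function name and the importance values are illustrative only; the logic simply applies both limits as described above.

```python
def select_displayed_properties(importances, n_top_properties=3, min_feature_importance=0.05):
    # Rank properties by importance, keep at most n_top_properties of them,
    # then drop any that fall below the minimum importance (logical AND).
    ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
    return [name for name, value in ranked[:n_top_properties]
            if value >= min_feature_importance]


importances = {"Brightness": 0.55, "Aspect Ratio": 0.30, "Area": 0.11, "RMS Contrast": 0.04}

# Defaults: three properties pass both limits.
print(select_displayed_properties(importances))

# n_top_properties=2 hides "Area" even though it exceeds min_feature_importance.
print(select_displayed_properties(importances, n_top_properties=2))

# Only two properties reach 0.05 here, so fewer than n_top_properties are shown.
print(select_displayed_properties({"Brightness": 0.9, "Aspect Ratio": 0.07, "Area": 0.03}))
```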

max_num_categories_for_display: int, default: 10

Max number of categories to show in plot.

show_categories_by: str, default: ‘largest_difference’

Specify which categories to show for categorical features’ graphs, as the number of shown categories is limited by max_num_categories_for_display. Possible values:

- ‘train_largest’: Show the largest train categories.
- ‘test_largest’: Show the largest test categories.
- ‘largest_difference’: Show the largest difference between categories.
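A minimal sketch of how the three selection modes could pick categories, assuming each mode ranks categories by relative frequency. The function name, the tie-breaking rule and the counts are assumptions made for illustration, not the library's actual plotting code.

```python
def categories_to_show(train_counts, test_counts,
                       show_categories_by="largest_difference",
                       max_num_categories_for_display=10):
    categories = set(train_counts) | set(test_counts)
    train_total = sum(train_counts.values()) or 1
    test_total = sum(test_counts.values()) or 1
    # Relative frequency of every category in each dataset (0 if absent).
    train_freq = {c: train_counts.get(c, 0) / train_total for c in categories}
    test_freq = {c: test_counts.get(c, 0) / test_total for c in categories}

    if show_categories_by == "train_largest":
        score = train_freq
    elif show_categories_by == "test_largest":
        score = test_freq
    elif show_categories_by == "largest_difference":
        score = {c: abs(train_freq[c] - test_freq[c]) for c in categories}
    else:
        raise ValueError(f"unknown show_categories_by: {show_categories_by!r}")

    # Highest score first; ties are broken by name (an arbitrary choice for this sketch).
    ranked = sorted(categories, key=lambda c: (-score[c], str(c)))
    return ranked[:max_num_categories_for_display]


train_counts = {"landscape": 70, "portrait": 25, "square": 5}
test_counts = {"landscape": 45, "portrait": 20, "square": 35}

for mode in ("train_largest", "test_largest", "largest_difference"):
    print(mode, categories_to_show(train_counts, test_counts, mode, 2))
```

With these counts, ‘largest_difference’ surfaces the “square” category first, because its share changes the most between the two datasets.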

sample_size: int, default: 10_000

Max number of rows to use from each dataset for the training and evaluation of the domain classifier.

test_size: float, default: 0.3

Fraction of the combined datasets to use for the evaluation of the domain classifier.
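The interaction between sample_size and test_size can be sketched as below. This assumes simple random sampling and an unstratified split; the actual sampling strategy used by the check may differ, and the function name is hypothetical.

```python
import random


def build_domain_classifier_split(train_rows, test_rows,
                                  sample_size=10_000, test_size=0.3, seed=42):
    rng = random.Random(seed)
    # Cap each dataset at sample_size rows.
    train_sample = rng.sample(train_rows, min(sample_size, len(train_rows)))
    test_sample = rng.sample(test_rows, min(sample_size, len(test_rows)))
    # Label rows by origin (0 = train dataset, 1 = test dataset) and combine them.
    labeled = [(row, 0) for row in train_sample] + [(row, 1) for row in test_sample]
    rng.shuffle(labeled)
    # Hold out a test_size fraction of the combined rows for evaluation.
    n_eval = round(len(labeled) * test_size)
    return labeled[n_eval:], labeled[:n_eval]


train_rows = [[i, 0.5] for i in range(25_000)]  # more rows than sample_size
test_rows = [[i, 0.8] for i in range(4_000)]    # fewer rows than sample_size
fit_part, eval_part = build_domain_classifier_split(train_rows, test_rows)
print(len(fit_part), len(eval_part))
```

Here the larger dataset is capped at 10,000 rows, the smaller one is used in full, and 30% of the combined 14,000 rows are held out for evaluating the classifier.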

min_meaningful_drift_score: float, default: 0.05

Minimum drift score for displaying drift in the check. Below that score, the check displays “nothing found”.
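The thresholding can be sketched as follows. The text above only states that the result is based on the AUC of the Domain Classifier; the linear rescaling max(2 * AUC - 1, 0) used here is an assumption chosen so that an AUC of 0.5 (no separation) maps to a score of 0 and an AUC of 1 maps to 1. Both function names are hypothetical.

```python
def auc_to_drift_score(auc):
    # Assumed mapping: AUC 0.5 -> 0.0 (no drift), AUC 1.0 -> 1.0 (fully separable).
    return max(2 * auc - 1, 0.0)


def display_message(auc, min_meaningful_drift_score=0.05):
    score = auc_to_drift_score(auc)
    if score < min_meaningful_drift_score:
        return "nothing found"
    return f"drift score: {score:.2f}"


print(display_message(0.52))  # score is about 0.04, below the threshold
print(display_message(0.60))  # score is about 0.20, above the threshold
```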

__init__(image_properties: Optional[List[Dict[str, Any]]] = None, n_top_properties: int = 3, min_feature_importance: float = 0.05, sample_size: int = 10000, test_size: float = 0.3, min_meaningful_drift_score: float = 0.05, max_num_categories_for_display: int = 10, show_categories_by: str = 'largest_difference', **kwargs)
__new__(*args, **kwargs)

Methods

ImageDatasetDrift.add_condition(name, ...)

Add new condition function to the check.

ImageDatasetDrift.add_condition_drift_score_less_than([...])

Add condition - require drift score to be less than the threshold.

ImageDatasetDrift.clean_conditions()

Remove all conditions from this check instance.

ImageDatasetDrift.compute(context)

Train a Domain Classifier on image property data that was collected during update() calls.

ImageDatasetDrift.conditions_decision(result)

Run conditions on given result.

ImageDatasetDrift.config()

Return check configuration (conditions' configuration not yet supported).

ImageDatasetDrift.from_config(conf)

Return check object from a CheckConfig object.

ImageDatasetDrift.initialize_run(context)

Initialize self state, and validate the run context.

ImageDatasetDrift.metadata([with_doc_link])

Return check metadata.

ImageDatasetDrift.name()

Name of class in split camel case.

ImageDatasetDrift.params([show_defaults])

Return parameters to show when printing the check.

ImageDatasetDrift.remove_condition(index)

Remove given condition by index.

ImageDatasetDrift.run(train_dataset, ...[, ...])

Run check.

ImageDatasetDrift.update(context, batch, ...)

Calculate image properties for train or test batches.

Examples