TrainTestPredictionDrift#

class TrainTestPredictionDrift[source]#

Calculate prediction drift between the train and test datasets, using statistical measures.

The check calculates a drift score for the predictions in the test dataset by comparing their distribution to that of the train dataset. As the predictions may be complex, we calculate different properties of the predictions and check their distributions.

A prediction property is any function that takes predictions and returns a list of values. Each value represents a property of a single prediction, such as the number of objects in an image or the tilt of each bounding box in an image.

There are default properties per task.

For classification:

- distribution of classes

For object detection:

- distribution of classes
- distribution of bounding box areas
- distribution of number of bounding boxes per image

For numerical distributions, we use the Earth Mover's Distance (Wasserstein metric). See https://en.wikipedia.org/wiki/Wasserstein_metric
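
As a hedged illustration (not deepchecks' actual implementation): for two equal-size, equal-weight 1-D samples, the Earth Mover's Distance reduces to the mean absolute difference of their sorted values.

```python
import numpy as np

def emd_1d(a, b):
    """1-D Earth Mover's Distance between two equal-size samples.

    For equal-weight samples of the same size, this is the mean
    absolute difference of the sorted values.
    """
    a, b = np.sort(np.asarray(a, float)), np.sort(np.asarray(b, float))
    assert a.shape == b.shape, "this sketch assumes equal-size samples"
    return float(np.mean(np.abs(a - b)))

# Two samples shifted by a constant offset of 2.0 have EMD == 2.0.
print(emd_1d([1, 2, 3, 4], [3, 4, 5, 6]))  # → 2.0
```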

For categorical distributions, we use Cramer's V. See https://en.wikipedia.org/wiki/Cram%C3%A9r%27s_V We also support the Population Stability Index (PSI). See https://www.lexjansen.com/wuss/2017/47_Final_Paper_PDF.pdf.
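
The two categorical measures can be sketched with their textbook formulas; note that deepchecks may apply additional binning or corrections on top of these, so this is an illustration only, not the library's code:

```python
import numpy as np

def cramers_v(table):
    """Cramer's V for a 2-D contingency table (rows: datasets, cols: categories)."""
    table = np.asarray(table, float)
    n = table.sum()
    # Expected counts under independence, then the chi-squared statistic.
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
    chi2 = ((table - expected) ** 2 / expected).sum()
    r, c = table.shape
    return float(np.sqrt(chi2 / (n * (min(r, c) - 1))))

def psi(p, q):
    """Population Stability Index between two category distributions."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum((p - q) * np.log(p / q)))

print(round(cramers_v([[50, 50], [50, 50]]), 4))  # → 0.0 (identical distributions)
print(round(cramers_v([[90, 10], [10, 90]]), 4))  # → 0.8 (strong drift)
print(round(psi([0.5, 0.5], [0.5, 0.5]), 4))      # → 0.0
```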

For categorical prediction properties, it is recommended to use Cramer's V, unless your variable includes categories with a small number of samples (common practice is categories with fewer than 5 samples), in which case PSI may be preferable. However, for a variable with many categories that each have few samples, Cramer's V is still recommended.

Parameters
prediction_properties: List[Dict[str, Any]], default: None

List of properties. Replaces the default deepchecks properties. Each property is a dictionary with keys ‘name’ (str), ‘method’ (Callable) and ‘output_type’ (str), representing attributes of said method. ‘output_type’ must be one of:

- ‘numeric’ - for continuous ordinal outputs.
- ‘categorical’ - for discrete, non-ordinal outputs. These can still be numbers, but these numbers do not have inherent value.
- ‘class_id’ - for properties that return the class_id. This is used because these properties are later matched with the VisionData.label_map, if one was given.

For more on image / label properties, see the property guide.
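
A minimal sketch of a custom prediction_properties list, following the ‘name’ / ‘method’ / ‘output_type’ schema described above. The bbox_count function, the property name, and the assumed prediction format (one sequence of boxes per image) are hypothetical:

```python
# Hypothetical property: number of bounding boxes per image.
def bbox_count(predictions):
    # Assumes one prediction per image, each a sequence of bounding boxes.
    return [len(pred) for pred in predictions]

prediction_properties = [
    {'name': 'Number of Bounding Boxes', 'method': bbox_count, 'output_type': 'numeric'},
]

# Applying the property to dummy predictions (two images, 3 and 1 boxes):
print(prediction_properties[0]['method']([[1, 2, 3], [4]]))  # → [3, 1]
```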

margin_quantile_filter: float, default: 0.025

float in range [0, 0.5), representing which margins (high and low quantiles) of the distribution will be filtered out of the EMD calculation. This is done so that extreme values do not affect the calculation disproportionately. This filter is applied to both distributions, in both margins.
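
A sketch of the margin filtering described above (an illustration, not deepchecks' exact code): values outside the [q, 1 - q] quantile range are dropped before the EMD is computed.

```python
import numpy as np

def filter_margins(values, margin_quantile_filter=0.025):
    """Drop values below the q-quantile or above the (1 - q)-quantile."""
    values = np.asarray(values, float)
    lo = np.quantile(values, margin_quantile_filter)
    hi = np.quantile(values, 1 - margin_quantile_filter)
    return values[(values >= lo) & (values <= hi)]

data = np.concatenate([np.arange(100), [10_000]])  # one extreme outlier
filtered = filter_margins(data, 0.025)
print(filtered.max())  # → 97.0 (the 10_000 outlier is filtered out)
```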

max_num_categories_for_drift: int, default: 10

Only for categorical properties. Max number of allowed categories. If there are more, they are binned into an “Other” category. If None, there is no limit.
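
The “Other” binning can be sketched like this (an illustration, not deepchecks' code): keep the most frequent categories and relabel the rest.

```python
from collections import Counter

def bin_categories(labels, max_num_categories=10):
    """Keep the most frequent categories; bin the rest into 'Other'."""
    top = {c for c, _ in Counter(labels).most_common(max_num_categories)}
    return [c if c in top else 'Other' for c in labels]

labels = ['a'] * 5 + ['b'] * 3 + ['c', 'd']
# 'a' and 'b' are the two largest categories; 'c' and 'd' become 'Other'.
print(bin_categories(labels, max_num_categories=2))
```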

max_num_categories_for_display: int, default: 10

Max number of categories to show in plot.

show_categories_by: str, default: ‘largest_difference’

Specify which categories to show for categorical features’ graphs, as the number of shown categories is limited by max_num_categories_for_display. Possible values:

- ‘train_largest’: Show the largest train categories.
- ‘test_largest’: Show the largest test categories.
- ‘largest_difference’: Show the largest difference between categories.
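
The ‘largest_difference’ option can be sketched as follows, assuming per-category frequency dictionaries (an illustration only, not deepchecks' code):

```python
def largest_difference(train_freq, test_freq, max_num_categories_for_display=10):
    """Pick the categories with the largest |train - test| frequency gap."""
    cats = set(train_freq) | set(test_freq)
    gap = lambda c: abs(train_freq.get(c, 0) - test_freq.get(c, 0))
    return sorted(cats, key=gap, reverse=True)[:max_num_categories_for_display]

train = {'cat': 0.6, 'dog': 0.3, 'bird': 0.1}
test = {'cat': 0.1, 'dog': 0.4, 'bird': 0.5}
# Gaps: cat 0.5, bird 0.4, dog 0.1 → the top two are shown.
print(largest_difference(train, test, 2))  # → ['cat', 'bird']
```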

categorical_drift_method: str, default: “cramer_v”

Decides which method to use for categorical variables. Possible values are: “cramer_v” for Cramer’s V, “PSI” for Population Stability Index (PSI).

max_num_categories: int, default: None

Deprecated. Please use max_num_categories_for_drift and max_num_categories_for_display instead.

__init__(prediction_properties: Optional[List[Dict[str, Any]]] = None, margin_quantile_filter: float = 0.025, max_num_categories_for_drift: int = 10, max_num_categories_for_display: int = 10, show_categories_by: str = 'largest_difference', categorical_drift_method: str = 'cramer_v', max_num_categories: Optional[int] = None, **kwargs)[source]#
__new__(*args, **kwargs)#

Methods

TrainTestPredictionDrift.add_condition(name, ...)

Add new condition function to the check.

TrainTestPredictionDrift.add_condition_drift_score_less_than([...])

Add condition - require prediction properties drift score to be less than the threshold.
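
The logic of this condition can be sketched as follows; the 0.15 threshold and the property names here are illustrative, not deepchecks' documented defaults:

```python
def drift_score_condition(scores, threshold=0.15):
    """Pass only if every property's drift score is below the threshold.

    Returns (passed, failed_properties) for inspection.
    """
    failed = {name: s for name, s in scores.items() if s >= threshold}
    return (len(failed) == 0, failed)

scores = {'Samples Per Class': 0.05, 'Bounding Box Area': 0.31}
ok, failed = drift_score_condition(scores, threshold=0.15)
print(ok, failed)  # → False {'Bounding Box Area': 0.31}
```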

TrainTestPredictionDrift.clean_conditions()

Remove all conditions from this check instance.

TrainTestPredictionDrift.compute(context)

Calculate drift on prediction properties samples that were collected during update() calls.

TrainTestPredictionDrift.conditions_decision(result)

Run conditions on given result.

TrainTestPredictionDrift.config()

Return check configuration (conditions' configuration not yet supported).

TrainTestPredictionDrift.from_config(conf)

Return check object from a CheckConfig object.

TrainTestPredictionDrift.initialize_run(context)

Initialize run.

TrainTestPredictionDrift.metadata([...])

Return check metadata.

TrainTestPredictionDrift.name()

Name of class in split camel case.

TrainTestPredictionDrift.params([show_defaults])

Return parameters to show when printing the check.

TrainTestPredictionDrift.reduce_output(...)

Return prediction drift score per prediction property.

TrainTestPredictionDrift.remove_condition(index)

Remove given condition by index.

TrainTestPredictionDrift.run(train_dataset, ...)

Run check.

TrainTestPredictionDrift.update(context, ...)

Perform update on batch for train or test properties.

Examples#