MultivariateDrift

class MultivariateDrift

Calculate drift between the entire train and test datasets using a model trained to distinguish between them.

The check fits a new model, called a Domain Classifier, to distinguish between the train and test datasets. Once the Domain Classifier is fitted, the check calculates its feature importance. The result of the check is based on the AUC of the Domain Classifier, and the check displays the change in distribution between train and test for the top features according to the calculated feature importance.
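The idea can be illustrated without the library. The sketch below is not the check's implementation: it swaps in a small hand-written logistic regression as the domain classifier and uses normalised absolute coefficients as a stand-in for feature importance. The synthetic data, the 70/30 split, and the helper names (`auc`, `fit_logistic`) are assumptions made for the example.

```python
import math
import random


def auc(labels, scores):
    """ROC AUC via pairwise comparison of positives and negatives; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_logistic(rows, labels, epochs=300, lr=0.1):
    """Batch gradient descent for logistic regression (illustrative stand-in model)."""
    n_features = len(rows[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(rows, labels):
            z = b + sum(wi * xi for wi, xi in zip(w, x))
            err = 1.0 / (1.0 + math.exp(-z)) - y
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / len(rows) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(rows)
    return w, b


# Synthetic data (assumption): feature 0 is shifted in "test", feature 1 is not.
rng = random.Random(42)
train = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(300)]
test = [[rng.gauss(2, 1), rng.gauss(0, 1)] for _ in range(300)]

# Label each row by its origin: 0 = train, 1 = test.
rows = train + test
labels = [0] * len(train) + [1] * len(test)

# Hold out 30% of the combined rows to evaluate the domain classifier.
order = list(range(len(rows)))
rng.shuffle(order)
split = int(len(order) * 0.7)
fit_idx, eval_idx = order[:split], order[split:]

w, b = fit_logistic([rows[i] for i in fit_idx], [labels[i] for i in fit_idx])
scores = [b + sum(wi * xi for wi, xi in zip(w, rows[i])) for i in eval_idx]
domain_auc = auc([labels[i] for i in eval_idx], scores)

# Stand-in importance: absolute coefficients normalised to sum to 1.
total = sum(abs(wi) for wi in w)
importance = [abs(wi) / total for wi in w]

print(f"domain classifier AUC: {domain_auc:.3f}")
print(f"feature importance: {importance}")
```

An AUC close to 0.5 would mean the classifier cannot tell the two datasets apart; here the shifted feature pushes it well above that and also receives most of the importance.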

Parameters
n_top_columns: int, default: 3

Number of columns to show, ordered by domain classifier feature importance. This limit is applied together (AND) with min_feature_importance, so fewer than n_top_columns features may be displayed.

min_feature_importance: float, default: 0.05

Minimum feature importance to show in the check display. Feature importances sum to 1, so the default value of 0.05, for example, means that features contributing less than 5% of the predictive power of the Domain Classifier won’t be displayed. This limit is applied together (AND) with n_top_columns, so features more important than min_feature_importance can still be hidden.
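How the two display limits combine can be sketched as follows. This is an illustration of the AND rule described above, not the library's code; the function name `features_to_display` and the example importances are hypothetical.

```python
def features_to_display(importances, n_top_columns=3, min_feature_importance=0.05):
    """Keep a feature only if it is in the top n AND meets the importance floor."""
    ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
    return [
        name
        for name, value in ranked[:n_top_columns]
        if value >= min_feature_importance
    ]


# Hypothetical importances that sum to 1.
importances = {"age": 0.55, "income": 0.30, "city": 0.08, "zip": 0.04, "height": 0.03}

# Defaults: the top three all clear the 0.05 floor.
print(features_to_display(importances))

# Raising n_top_columns does not reveal "zip" or "height": both are below 0.05.
print(features_to_display(importances, n_top_columns=5))

# Lowering n_top_columns hides "city" even though its importance is above 0.05.
print(features_to_display(importances, n_top_columns=2))
```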

max_num_categories_for_display: int, default: 10

Max number of categories to show in plot.

show_categories_by: str, default: ‘largest_difference’

Specify which categories to show for categorical features’ graphs, as the number of shown categories is limited by max_num_categories_for_display. Possible values:

- ‘train_largest’: Show the largest train categories.
- ‘test_largest’: Show the largest test categories.
- ‘largest_difference’: Show the largest difference between categories.
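One plausible reading of the three options is sketched below. It assumes that "largest" refers to a category's share of rows and that "difference" is the absolute gap between the train and test shares; the library may measure these differently. The function name `categories_to_show` and the sample data are hypothetical.

```python
from collections import Counter


def categories_to_show(train_values, test_values,
                       show_categories_by="largest_difference",
                       max_num_categories_for_display=10):
    """Rank categories by the chosen criterion and keep the first few."""
    train_counts, test_counts = Counter(train_values), Counter(test_values)
    n_train, n_test = len(train_values), len(test_values)

    def train_share(cat):
        return train_counts[cat] / n_train

    def test_share(cat):
        return test_counts[cat] / n_test

    if show_categories_by == "train_largest":
        key = train_share
    elif show_categories_by == "test_largest":
        key = test_share
    elif show_categories_by == "largest_difference":
        def key(cat):
            return abs(train_share(cat) - test_share(cat))
    else:
        raise ValueError(f"unknown show_categories_by: {show_categories_by!r}")

    categories = set(train_counts) | set(test_counts)
    # Sort by descending score; break ties by name so the output is deterministic.
    ranked = sorted(categories, key=lambda cat: (-key(cat), cat))
    return ranked[:max_num_categories_for_display]


train_values = ["a"] * 5 + ["b"] * 4 + ["c"] * 1
test_values = ["a"] * 5 + ["b"] * 1 + ["c"] * 4

for mode in ("train_largest", "test_largest", "largest_difference"):
    print(mode, categories_to_show(train_values, test_values, mode, 2))
```

In this data, category "a" dominates both datasets but does not change, so it is dropped under ‘largest_difference’ while "b" and "c" are kept.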

n_samples: int, default: 10_000

Max number of rows to use from each dataset for the training and evaluation of the domain classifier.

random_state: int, default: 42

Random seed for the check.

test_size: float, default: 0.3

Fraction of the combined datasets to use for the evaluation of the domain classifier.
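The interplay of the row cap (`n_samples` in the signature), `random_state`, and `test_size` can be sketched roughly as follows. This is an assumption about the general shape of the procedure, not the check's actual code; the helper name `sample_and_split` is hypothetical.

```python
import random


def sample_and_split(train_rows, test_rows, n_samples=10_000,
                     test_size=0.3, random_state=42):
    """Cap each dataset, label rows by origin, then hold out a share for evaluation."""
    rng = random.Random(random_state)

    # Use at most n_samples rows from each dataset.
    train_part = rng.sample(train_rows, min(n_samples, len(train_rows)))
    test_part = rng.sample(test_rows, min(n_samples, len(test_rows)))

    # Label 0 = came from train, 1 = came from test, then shuffle the combined rows.
    labelled = [(row, 0) for row in train_part] + [(row, 1) for row in test_part]
    rng.shuffle(labelled)

    # test_size is a fraction of the combined rows, reserved for evaluation.
    n_eval = int(round(len(labelled) * test_size))
    return labelled[n_eval:], labelled[:n_eval]


train_rows = list(range(50))
test_rows = list(range(100, 120))

fit_rows, eval_rows = sample_and_split(train_rows, test_rows, n_samples=30)
print(len(fit_rows), len(eval_rows))
```

Because the generator is seeded from `random_state`, calling the function again with the same arguments yields the same sample and the same split.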

min_meaningful_drift_score: float, default: 0.05

Minimum drift score for displaying drift in the check. Below that score, the check displays “nothing found”.

__init__(n_top_columns: int = 3, min_feature_importance: float = 0.05, max_num_categories_for_display: int = 10, show_categories_by: str = 'largest_difference', n_samples: int = 10000, random_state: int = 42, test_size: float = 0.3, min_meaningful_drift_score: float = 0.05, **kwargs)
__new__(*args, **kwargs)

Methods

MultivariateDrift.add_condition(name, ...)

Add new condition function to the check.

MultivariateDrift.add_condition_overall_drift_value_less_than([...])

Add a condition that the overall drift value is less than a given threshold.

MultivariateDrift.clean_conditions()

Remove all conditions from this check instance.

MultivariateDrift.conditions_decision(result)

Run conditions on given result.

MultivariateDrift.config([include_version, ...])

Return check configuration (conditions' configuration not yet supported).

MultivariateDrift.from_config(conf[, ...])

Return check object from a CheckConfig object.

MultivariateDrift.from_json(conf[, ...])

Deserialize check instance from JSON string.

MultivariateDrift.metadata([with_doc_link])

Return check metadata.

MultivariateDrift.name()

Name of class in split camel case.

MultivariateDrift.params([show_defaults])

Return parameters to show when printing the check.

MultivariateDrift.remove_condition(index)

Remove given condition by index.

MultivariateDrift.run(train_dataset, ...[, ...])

Run check.

MultivariateDrift.run_logic(context)

Run check.

MultivariateDrift.to_json([indent, ...])

Serialize check instance to JSON string.

Examples