Quickstart in 5 minutes#

To run your first Deepchecks Suite, all you need is the data and model that you wish to validate. More specifically, you need:

  • Your train and test data (in Pandas DataFrames or Numpy Arrays)

  • (optional) A supported model (including XGBoost, scikit-learn models, and many more). Required only for checks that use the model’s predictions.

To run your first suite on your data and model, you need only a few lines of code, starting at Define a Dataset Object.

# If you don't have deepchecks installed yet:
import sys
!{sys.executable} -m pip install deepchecks -U --quiet #--user

Load Data, Split Train-Test, and Train a Simple Model#

For the purpose of this guide, we’ll use the iris dataset and train a simple random forest model for multiclass classification:

# General imports
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

from deepchecks.tabular.datasets.classification import iris

# Load Data
iris_df = iris.load_data(data_format='Dataframe', as_train_test=False)
label_col = 'target'
df_train, df_test = train_test_split(iris_df, stratify=iris_df[label_col], random_state=0)

# Train Model
rf_clf = RandomForestClassifier(random_state=0)
rf_clf.fit(df_train.drop(label_col, axis=1), df_train[label_col]);

Out:

RandomForestClassifier(random_state=0)
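
As an aside, the dataset helper can also return a ready-made train/test split instead of splitting manually as above. A quick sketch, assuming (as in recent deepchecks versions) that as_train_test=True makes load_data return a (train, test) tuple:

# Alternative: let the helper produce the split for us
# (assumption: as_train_test=True returns a (train, test) tuple)
train_df, test_df = iris.load_data(data_format='Dataframe', as_train_test=True)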

Define a Dataset Object#

Initialize the Dataset object, providing the relevant metadata about the dataset (e.g., the name of the label column).

Check out the Dataset’s attributes to see which additional special columns can be declared and used (e.g. date column, index column); a short sketch follows the code below.

from deepchecks.tabular import Dataset

# We explicitly state that this dataset has no categorical features, otherwise they will be automatically inferred
# If the dataset has categorical features, the best practice is to pass a list with their names

ds_train = Dataset(df_train, label=label_col, cat_features=[])
ds_test =  Dataset(df_test,  label=label_col, cat_features=[])
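
If your data also has an identifier or timestamp column, declaring it lets the relevant leakage checks (e.g. Index Train Test Leakage, Date Train Test Leakage Overlap) run. A minimal sketch with hypothetical column names, assuming the index_name and datetime_name parameters of recent deepchecks versions:

import pandas as pd

from deepchecks.tabular import Dataset

# Hypothetical data: 'sample_id' and 'timestamp' are illustrative columns,
# not part of the iris dataset used in this guide
df_demo = pd.DataFrame({
    'sample_id': [1, 2, 3],
    'timestamp': pd.to_datetime(['2021-01-01', '2021-01-02', '2021-01-03']),
    'some_feature': [0.1, 0.2, 0.3],
    'target': [0, 1, 0],
})
# Declare the special columns so deepchecks can use them
# (parameter names assumed from recent deepchecks versions)
ds_demo = Dataset(df_demo, label='target', cat_features=[],
                  index_name='sample_id', datetime_name='timestamp')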

Run a Deepchecks Suite#

Run the full suite#

Use full_suite, which is a collection of (most of) the prebuilt checks.

Check out the When Should You Use Deepchecks guide for more info about the existing suites and when to use them.

from deepchecks.tabular.suites import full_suite

suite = full_suite()
suite.run(train_dataset=ds_train, test_dataset=ds_test, model=rf_clf)

Out:

Full Suite:   0%|                                    | 0/36 [00:00<?, ? Check/s]
...
Full Suite:  97%|################################### | 35/36 [00:01<00:00, 46.21 Check/s, Check=Outlier Sample Detection]

Full Suite

The suite is composed of various checks such as: Model Info, Date Train Test Leakage Overlap, Columns Info, etc.
Each check may contain conditions (which will result in pass / fail / warning / error) as well as other outputs such as plots or tables.
Suites, checks and conditions can all be modified. Read more about custom suites.


Conditions Summary

Status Check Condition More Info
Performance Report Train-Test scores relative degradation is not greater than 0.1 Precision for class 1 (train=1 test=0.87) Recall for class 2 (train=1 test=0.83)
Single Feature Contribution Train-Test Train features' Predictive Power Score is not greater than 0.7 Features in train dataset with PPS above threshold: {'petal width (cm)': '0.91', 'petal length (cm)': '0.83'}
ROC Report - Test Dataset AUC score for all the classes is not less than 0.7
Special Characters - Test Dataset Ratio of entirely special character samples not greater than 0.1%
Special Characters - Train Dataset Ratio of entirely special character samples not greater than 0.1%
String Length Out Of Bounds - Test Dataset Ratio of outliers not greater than 0% string length outliers
String Length Out Of Bounds - Train Dataset Ratio of outliers not greater than 0% string length outliers
Data Duplicates - Test Dataset Duplicate data ratio is not greater than 0%
Data Duplicates - Train Dataset Duplicate data ratio is not greater than 0%
String Mismatch - Test Dataset No string variants
String Mismatch - Train Dataset No string variants
Mixed Data Types - Test Dataset Rare data types in column are either more than 10% or less than 1% of the data
Mixed Data Types - Train Dataset Rare data types in column are either more than 10% or less than 1% of the data
Mixed Nulls - Test Dataset Not more than 1 different null types
Mixed Nulls - Train Dataset Not more than 1 different null types
Single Value in Column - Test Dataset Does not contain only a single value
Single Value in Column - Train Dataset Does not contain only a single value
Train Test Samples Mix Percentage of test data samples that appear in train data not greater than 10%
Conflicting Labels - Train Dataset Ambiguous sample ratio is not greater than 0%
Single Feature Contribution Train-Test Train-Test features' Predictive Power Score difference is not greater than 0.2
Datasets Size Comparison Test-Train size ratio is not smaller than 0.01
String Mismatch Comparison No new variants allowed in test data
New Label Train Test Number of new label values is not greater than 0
Category Mismatch Train Test Ratio of samples with a new category is not greater than 0%
Dominant Frequency Change Change in ratio of dominant value in data is not greater than 25%
Whole Dataset Drift Drift value is not greater than 0.25
Train Test Label Drift PSI <= 0.2 and Earth Mover's Distance <= 0.1 for label drift
Train Test Drift PSI <= 0.2 and Earth Mover's Distance <= 0.1
Model Inference Time - Test Dataset Average model inference time for one sample is not greater than 0.001
Model Inference Time - Train Dataset Average model inference time for one sample is not greater than 0.001
Unused Features Number of high variance unused features is not greater than 5
Simple Model Comparison Model performance gain over simple model is not less than 10%
ROC Report - Train Dataset AUC score for all the classes is not less than 0.7
Conflicting Labels - Test Dataset Ambiguous sample ratio is not greater than 0%

Check With Conditions Output

Performance Report

Summarize given scores on a dataset and model.

Conditions Summary
Status Condition More Info
Train-Test scores relative degradation is not greater than 0.1 Precision for class 1 (train=1 test=0.87) Recall for class 2 (train=1 test=0.83)
Additional Outputs


ROC Report - Train Dataset

Calculate the ROC curve for each class.

Conditions Summary
Status Condition More Info
AUC score for all the classes is not less than 0.7
Additional Outputs
The marked points are the optimal threshold cut-off points. They are determined using Youden's index defined as sensitivity + specificity - 1


ROC Report - Test Dataset

Calculate the ROC curve for each class.

Conditions Summary
Status Condition More Info
AUC score for all the classes is not less than 0.7
Additional Outputs
The marked points are the optimal threshold cut-off points. They are determined using Youden's index defined as sensitivity + specificity - 1


Simple Model Comparison

Compare given model score to simple model score (according to given model type).

Conditions Summary
Status Condition More Info
Model performance gain over simple model is not less than 10%
Additional Outputs


Unused Features

Detect features that are nearly unused by the model.

Conditions Summary
Status Condition More Info
Number of high variance unused features is not greater than 5
Additional Outputs
Features above the line are a sample of the most important features, while the features below the line are the unused features with highest variance, as defined by check parameters


Model Inference Time - Train Dataset

Measure model average inference time (in seconds) per sample.

Conditions Summary
Status Condition More Info
Average model inference time for one sample is not greater than 0.001
Additional Outputs
Average model inference time for one sample (in seconds): 8.2e-05


Model Inference Time - Test Dataset

Measure model average inference time (in seconds) per sample.

Conditions Summary
Status Condition More Info
Average model inference time for one sample is not greater than 0.001
Additional Outputs
Average model inference time for one sample (in seconds): 0.00021871


Train Test Drift

Calculate drift between train dataset and test dataset per feature, using statistical measures.

Conditions Summary
Status Condition More Info
PSI <= 0.2 and Earth Mover's Distance <= 0.1
Additional Outputs
The Drift score is a measure for the difference between two distributions, in this check - the test and train distributions.
The check shows the drift score and distributions for the features, sorted by feature importance and showing only the top 5 features, according to feature importance.
If available, the plot titles also show the feature importance (FI) rank.


Train Test Label Drift

Calculate label drift between train dataset and test dataset, using statistical measures.

Conditions Summary
Status Condition More Info
PSI <= 0.2 and Earth Mover's Distance <= 0.1 for label drift
Additional Outputs
The Drift score is a measure for the difference between two distributions, in this check - the test and train distributions.
The check shows the drift score and distributions for the label.


Datasets Size Comparison

Verify test dataset size comparing it to the train dataset size.

Conditions Summary
Status Condition More Info
Test-Train size ratio is not smaller than 0.01
Additional Outputs
  Train Test
Size 112 38


Single Feature Contribution Train-Test

Return the Predictive Power Score of all features, in order to estimate each feature's ability to predict the label.

Conditions Summary
Status Condition More Info
Train features' Predictive Power Score is not greater than 0.7 Features in train dataset with PPS above threshold: {'petal width (cm)': '0.91', 'petal length (cm)': '0.83'}
Train-Test features' Predictive Power Score difference is not greater than 0.2
Additional Outputs
The Predictive Power Score (PPS) is used to estimate the ability of a feature to predict the label by itself. (Read more about Predictive Power Score)
In the graph above, we should suspect we have problems in our data if:
1. Train dataset PPS values are high:
Can indicate that this feature's success in predicting the label is actually due to data leakage,
meaning that the feature holds information that is based on the label to begin with.
2. Large difference between train and test PPS (train PPS is larger):
An even more powerful indication of data leakage, as a feature that was powerful in train but not in test
can be explained by leakage in train that is not relevant to a new dataset.
3. Large difference between test and train PPS (test PPS is larger):
An anomalous value, could indicate drift in test dataset that caused a coincidental correlation to the target label.


Train Test Samples Mix

Detect samples in the test data that appear also in training data.

Conditions Summary
Status Condition More Info
Percentage of test data samples that appear in train data not greater than 10%
Additional Outputs
2.63% (1 / 38) of test data samples appear in train data
  sepal length (cm) sepal width (cm) petal length (cm) petal width (cm) target
Train indices: 101 Test indices: 142 5.80 2.70 5.10 1.90 2


Check Without Conditions Output

Model Info

Summarize given model parameters.

Additional Outputs
Model Type: RandomForestClassifier
Parameter Value Default
bootstrap True True
ccp_alpha 0.00 0.00
class_weight None None
criterion gini gini
max_depth None None
max_features auto auto
max_leaf_nodes None None
max_samples None None
min_impurity_decrease 0.00 0.00
min_samples_leaf 1 1
min_samples_split 2 2
min_weight_fraction_leaf 0.00 0.00
n_estimators 100 100
n_jobs None None
oob_score False False
random_state 0 None
verbose 0 0
warm_start False False

Colored rows are parameters with non-default values



Columns Info - Train Dataset

Return the role and logical type of each column.

Additional Outputs
* showing only the top 10 columns, you can change it using n_top_columns param
  target petal width (cm) petal length (cm) sepal length (cm) sepal width (cm)
role label numerical feature numerical feature numerical feature numerical feature


Columns Info - Test Dataset

Return the role and logical type of each column.

Additional Outputs
* showing only the top 10 columns, you can change it using n_top_columns param
  target petal width (cm) petal length (cm) sepal length (cm) sepal width (cm)
role label numerical feature numerical feature numerical feature numerical feature


Confusion Matrix Report - Train Dataset

Calculate the confusion matrix of the model on the given dataset.

Additional Outputs


Confusion Matrix Report - Test Dataset

Calculate the confusion matrix of the model on the given dataset.

Additional Outputs


Calibration Metric - Train Dataset

Calculate the calibration curve with brier score for each class.

Additional Outputs
Calibration curves (also known as reliability diagrams) compare how well the probabilistic predictions of a binary classifier are calibrated. It plots the true frequency of the positive label against its predicted probability, for binned predictions.
The Brier score metric may be used to assess how well a classifier is calibrated. For more info, please visit https://en.wikipedia.org/wiki/Brier_score


Calibration Metric - Test Dataset

Calculate the calibration curve with brier score for each class.

Additional Outputs
Calibration curves (also known as reliability diagrams) compare how well the probabilistic predictions of a binary classifier are calibrated. It plots the true frequency of the positive label against its predicted probability, for binned predictions.
The Brier score metric may be used to assess how well a classifier is calibrated. For more info, please visit https://en.wikipedia.org/wiki/Brier_score


Outlier Sample Detection - Train Dataset

Detects outliers in a dataset using the LoOP algorithm.

Additional Outputs
The Outlier Probability Score is calculated by the LoOP algorithm which measures the local deviation of density of a given sample with respect to its neighbors. These outlier scores are directly interpretable as a probability of an object being an outlier (see link for more information).

  Outlier Probability Score sepal length (cm) sepal width (cm) petal length (cm) petal width (cm) target
41 0.89 4.50 2.30 1.30 0.30 0
106 0.72 4.90 2.50 4.50 1.70 2
56 0.57 6.30 3.30 4.70 1.60 1
114 0.56 5.80 2.80 5.10 2.40 2
22 0.56 4.60 3.60 1.00 0.20 0


Other Checks That Weren't Displayed

Check Reason
Model Error Analysis Unable to train meaningful error model (r^2 score: 0.14)
Index Train Test Leakage There is no index defined to use. Did you pass a DataFrame instead of a Dataset?
Identifier Leakage - Test Dataset Check is irrelevant for Datasets without index or date column
Identifier Leakage - Train Dataset Check is irrelevant for Datasets without index or date column
Date Train Test Leakage Overlap There is no datetime defined to use. Did you pass a DataFrame instead of a Dataset?
Date Train Test Leakage Duplicates There is no datetime defined to use. Did you pass a DataFrame instead of a Dataset?
Outlier Sample Detection - Test Dataset NotEnoughSamplesError: There are not enough samples to run this check, found only 38 samples.
Boosting Overfit Check is relevant for Boosting models of type ('AdaBoostClassifier', 'GradientBoostingClassifier', 'LGBMClassifier', 'XGBClassifier', 'CatBoostClassifier', 'AdaBoostRegressor', 'GradientBoostingRegressor', 'LGBMRegressor', 'XGBRegressor', 'CatBoostRegressor'), but received model of type RandomForestClassifier
Regression Error Distribution - Test Dataset Check is relevant for models of type ['regression'], but received model of type 'multiclass'
Regression Error Distribution - Train Dataset Check is relevant for models of type ['regression'], but received model of type 'multiclass'
Regression Systematic Error - Test Dataset Check is relevant for models of type ['regression'], but received model of type 'multiclass'
Regression Systematic Error - Train Dataset Check is relevant for models of type ['regression'], but received model of type 'multiclass'
Dominant Frequency Change Nothing found
Conflicting Labels - Train Dataset Nothing found
Special Characters - Test Dataset Nothing found
Special Characters - Train Dataset Nothing found
String Length Out Of Bounds - Test Dataset Nothing found
String Length Out Of Bounds - Train Dataset Nothing found
Data Duplicates - Test Dataset Nothing found
Data Duplicates - Train Dataset Nothing found
String Mismatch - Test Dataset Nothing found
String Mismatch - Train Dataset Nothing found
Mixed Nulls - Test Dataset Nothing found
Mixed Data Types - Train Dataset Nothing found
Whole Dataset Drift Nothing found
Mixed Nulls - Train Dataset Nothing found
Single Value in Column - Test Dataset Nothing found
Conflicting Labels - Test Dataset Nothing found
String Mismatch Comparison Nothing found
New Label Train Test Nothing found
Category Mismatch Train Test Nothing found
Mixed Data Types - Test Dataset Nothing found
Single Value in Column - Train Dataset Nothing found

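To keep or share the report outside of a notebook, the suite result can be written to a standalone HTML file. A minimal sketch, assuming the save_as_html method available on suite results in recent deepchecks versions:

# Capture the returned result object and persist it as an HTML report
# (assumption: save_as_html as provided by recent deepchecks versions)
suite_result = suite.run(train_dataset=ds_train, test_dataset=ds_test, model=rf_clf)
suite_result.save_as_html('full_suite_report.html')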


Run the integrity suite#

If you still haven’t started modeling and just have a single dataset, you can use the single_dataset_integrity suite:

from deepchecks.tabular.suites import single_dataset_integrity

integ_suite = single_dataset_integrity()
integ_suite.run(ds_train)

Out:

Single Dataset Integrity Suite:   0%|         | 0/9 [00:00<?, ? Check/s]
...
Single Dataset Integrity Suite:  89%|######## | 8/9 [00:00<00:00, 420.71 Check/s, Check=Outlier Sample Detection]

Single Dataset Integrity Suite

The suite is composed of various checks such as: String Length Out Of Bounds, Outlier Sample Detection, Mixed Nulls, etc.
Each check may contain conditions (which will result in pass / fail / warning / error) as well as other outputs such as plots or tables.
Suites, checks and conditions can all be modified. Read more about custom suites.


Conditions Summary

Status Check Condition More Info
Single Value in Column Does not contain only a single value
Mixed Nulls Not more than 1 different null types
Mixed Data Types Rare data types in column are either more than 10% or less than 1% of the data
String Mismatch No string variants
Data Duplicates Duplicate data ratio is not greater than 0%
String Length Out Of Bounds Ratio of outliers not greater than 0% string length outliers
Special Characters Ratio of entirely special character samples not greater than 0.1%
Conflicting Labels Ambiguous sample ratio is not greater than 0%

Check With Conditions Output


Check Without Conditions Output

Outlier Sample Detection

Detects outliers in a dataset using the LoOP algorithm.

Additional Outputs
The Outlier Probability Score is calculated by the LoOP algorithm which measures the local deviation of density of a given sample with respect to its neighbors. These outlier scores are directly interpretable as a probability of an object being an outlier (see link for more information).

  Outlier Probability Score sepal length (cm) sepal width (cm) petal length (cm) petal width (cm) target
41 0.89 4.50 2.30 1.30 0.30 0
106 0.72 4.90 2.50 4.50 1.70 2
56 0.57 6.30 3.30 4.70 1.60 1
114 0.56 5.80 2.80 5.10 2.40 2
22 0.56 4.60 3.60 1.00 0.20 0


Other Checks That Weren't Displayed

Check Reason
Single Value in Column Nothing found
Mixed Nulls Nothing found
Mixed Data Types Nothing found
String Mismatch Nothing found
Data Duplicates Nothing found
String Length Out Of Bounds Nothing found
Special Characters Nothing found
Conflicting Labels Nothing found

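The same integrity suite can be run on any single dataset, so you can repeat the scan for the test split as well:

# Run the identical integrity checks on the test data
integ_suite.run(ds_test)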


Run a Deepchecks Check#

If you want to run a specific check, you can just import it and run it directly.

Check out the tabular check examples or the API Reference for more info about the existing checks and their parameters.

from deepchecks.tabular.checks import TrainTestLabelDrift
check = TrainTestLabelDrift()
result = check.run(ds_train, ds_test)
result

Train Test Label Drift

Calculate label drift between train dataset and test dataset, using statistical measures.

Additional Outputs
The Drift score is a measure for the difference between two distributions, in this check - the test and train distributions.
The check shows the drift score and distributions for the label.
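
In a notebook, ending the cell with result renders the report inline. In a plain Python script you can display or save the output explicitly; a short sketch, assuming the show and save_as_html methods available on check results in recent deepchecks versions:

# Outside a notebook, render or persist the result explicitly
# (assumption: show / save_as_html as in recent deepchecks versions)
result.show()                             # render the HTML report
result.save_as_html('label_drift.html')  # or save it for later inspection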


You can also inspect the result value, which has a check-dependent structure:

result.value

Out:

{'Drift score': 0.002507306267756066, 'Method': 'PSI'}
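
Since result.value is a plain dictionary, it can also be used programmatically, for example to gate a CI pipeline on the drift score. A minimal sketch based on the value shown above (the 0.2 cutoff mirrors the PSI condition used by the suite):

# Fail fast when label drift exceeds a chosen threshold
drift_score = result.value['Drift score']
assert drift_score < 0.2, f'Label drift too high: {drift_score:.4f}'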

Edit an Existing Suite#

Inspect suite and remove condition#

We can see that the Single Feature Contribution Train-Test check’s conditions failed. Since this is a very simple dataset with few features, this behavior is not necessarily problematic, so we will remove the existing conditions for the PPS.

# Let's first print the suite to find the conditions that we want to change:

suite

Out:

Full Suite: [
    0: ModelInfo
    1: ColumnsInfo
    2: ConfusionMatrixReport
    3: PerformanceReport
            Conditions:
                    0: Train-Test scores relative degradation is not greater than 0.1
    4: RocReport(excluded_classes=[])
            Conditions:
                    0: AUC score for all the classes is not less than 0.7
    5: SimpleModelComparison
            Conditions:
                    0: Model performance gain over simple model is not less than 10%
    6: ModelErrorAnalysis
            Conditions:
                    0: The performance difference of the detected segments must not be greater than 5%
    7: CalibrationScore
    8: RegressionSystematicError
            Conditions:
                    0: Bias ratio is not greater than 0.01
    9: RegressionErrorDistribution
            Conditions:
                    0: Kurtosis value is not less than -0.1
    10: BoostingOverfit
            Conditions:
                    0: Test score over iterations doesn't decline by more than 5% from the best score
    11: UnusedFeatures
            Conditions:
                    0: Number of high variance unused features is not greater than 5
    12: ModelInferenceTime
            Conditions:
                    0: Average model inference time for one sample is not greater than 0.001
    13: TrainTestFeatureDrift
            Conditions:
                    0: PSI <= 0.2 and Earth Mover's Distance <= 0.1
    14: TrainTestLabelDrift
            Conditions:
                    0: PSI <= 0.2 and Earth Mover's Distance <= 0.1 for label drift
    15: WholeDatasetDrift
            Conditions:
                    0: Drift value is not greater than 0.25
    16: DominantFrequencyChange
            Conditions:
                    0: Change in ratio of dominant value in data is not greater than 25%
    17: CategoryMismatchTrainTest
            Conditions:
                    0: Ratio of samples with a new category is not greater than 0%
    18: NewLabelTrainTest
            Conditions:
                    0: Number of new label values is not greater than 0
    19: StringMismatchComparison
            Conditions:
                    0: No new variants allowed in test data
    20: DatasetsSizeComparison
            Conditions:
                    0: Test-Train size ratio is not smaller than 0.01
    21: DateTrainTestLeakageDuplicates
            Conditions:
                    0: Date leakage ratio is not greater than 0%
    22: DateTrainTestLeakageOverlap
            Conditions:
                    0: Date leakage ratio is not greater than 0%
    23: SingleFeatureContributionTrainTest(ppscore_params={})
            Conditions:
                    0: Train-Test features' Predictive Power Score difference is not greater than 0.2
                    1: Train features' Predictive Power Score is not greater than 0.7
    24: TrainTestSamplesMix
            Conditions:
                    0: Percentage of test data samples that appear in train data not greater than 10%
    25: IdentifierLeakage(ppscore_params={})
            Conditions:
                    0: Identifier columns PPS is not greater than 0
    26: IndexTrainTestLeakage
            Conditions:
                    0: Ratio of leaking indices is not greater than 0%
    27: IsSingleValue
            Conditions:
                    0: Does not contain only a single value
    28: MixedNulls
            Conditions:
                    0: Not more than 1 different null types
    29: MixedDataTypes
            Conditions:
                    0: Rare data types in column are either more than 10% or less than 1% of the data
    30: StringMismatch
            Conditions:
                    0: No string variants
    31: DataDuplicates
            Conditions:
                    0: Duplicate data ratio is not greater than 0%
    32: StringLengthOutOfBounds
            Conditions:
                    0: Ratio of outliers not greater than 0% string length outliers
    33: SpecialCharacters
            Conditions:
                    0: Ratio of entirely special character samples not greater than 0.1%
    34: ConflictingLabels
            Conditions:
                    0: Ambiguous sample ratio is not greater than 0%
    35: OutlierSampleDetection
]
# Now we can use the check's index and the condition's number to remove it.
# The PPS conditions belong to check 23 (SingleFeatureContributionTrainTest):
print(suite[23])
suite[23].remove_condition(1)
suite[23].remove_condition(0)

Out:

SingleFeatureContributionTrainTest(ppscore_params={})
        Conditions:
                0: Train-Test features' Predictive Power Score difference is not greater than 0.2
                1: Train features' Predictive Power Score is not greater than 0.7
# Print and see that the conditions were removed:
suite[23]

Out:

SingleFeatureContributionTrainTest(ppscore_params={})

If we now re-run the suite, these conditions will no longer fail.

Note: the check we manipulated will still run as part of the suite; however, it won’t appear in the Conditions Summary, since it no longer has any conditions defined on it. You can still see its display results in the Additional Outputs section.

For more info about working with conditions, see the detailed configuring conditions guide.
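
Conditions are not the only editable part: a whole check can also be dropped from a suite. A short sketch, assuming the suite’s remove method as in recent deepchecks versions, with the index taken from the printout above:

# Drop an entire check from the suite by its index
# (assumption: a remove method as in recent deepchecks versions)
suite.remove(35)  # removes OutlierSampleDetection, per the printed suite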

Total running time of the script: ( 0 minutes 2.264 seconds)

Gallery generated by Sphinx-Gallery