Create a Custom Suite#

A suite is a list of checks that run one after the other and whose results are displayed together.

To customize a suite, we can either:

Create new custom suites, by choosing the checks (and the optional conditions) that we want the suite to have.
Modify a built-in suite by adding and/or removing checks and conditions, to adapt it to our needs.
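
To make the idea concrete, here is a minimal, library-free sketch of what "a list of checks that run one after the other" means. ToyCheck and ToySuite are hypothetical names invented for this illustration and are not part of deepchecks.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ToyCheck:
    """Hypothetical stand-in for a check: a name plus a function of the data."""
    name: str
    compute: Callable[[List[float]], float]


@dataclass
class ToySuite:
    """Hypothetical stand-in for a suite: an ordered list of checks."""
    name: str
    checks: List[ToyCheck] = field(default_factory=list)

    def run(self, data: List[float]) -> Dict[str, float]:
        # Run the checks one after the other and collect the results together.
        return {check.name: check.compute(data) for check in self.checks}


suite = ToySuite(
    "Toy Suite",
    [
        ToyCheck("Row Count", lambda rows: float(len(rows))),
        ToyCheck("Mean Value", lambda rows: sum(rows) / len(rows)),
    ],
)
results = suite.run([1.0, 2.0, 3.0])
print(results)
```

The real Suite class does considerably more (conditions, display, error handling), but the ordering and the joint presentation of results follow the same basic shape.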

Create a New Suite#

Let’s say we want to create a custom suite made up mainly of performance checks, including TrainTestPerformance(), SimpleModelComparison() and several more.

For assistance in understanding which checks are implemented and can be included, we suggest using any of:

API Reference
Tabular checks
Vision checks
NLP checks
Built-in suites (by printing them to see which checks they include)

from sklearn.metrics import make_scorer, precision_score, recall_score

from deepchecks.tabular import Suite
# importing all existing checks for demonstration simplicity
from deepchecks.tabular.checks import *

# The Suite's first argument is its name, followed by all of the check objects.
# Some checks can receive arguments when initialized (all check arguments have default values).
# Each check can have one or more optional conditions.
# Multiple conditions can be chained one after the other.
new_custom_suite = Suite('Simple Suite For Model Performance',
                         ModelInfo(),
                         # chain two conditions on the performance check:
                         TrainTestPerformance()
                         .add_condition_train_test_relative_degradation_less_than(threshold=0.15)
                         .add_condition_test_performance_greater_than(0.8),
                         ConfusionMatrixReport(),
                         # use custom scorers for the simple model comparison:
                         SimpleModelComparison(strategy='most_frequent',
                                               scorers={'Recall (Multiclass)': make_scorer(recall_score, average=None),
                                                        'Precision (Multiclass)': make_scorer(precision_score, average=None)}
                                               ).add_condition_gain_greater_than(0.3)
                         )

# The scorers parameter can also be passed to the suite in order to override the scorers of all the checks
# in the suite. See the Metrics user guide for further details.

Let’s see the suite:

new_custom_suite

Simple Suite For Model Performance: [
    0: ModelInfo
    1: TrainTestPerformance
            Conditions:
                    0: Train-Test scores relative degradation is less than 0.15
                    1: Scores are greater than 0.8
    2: ConfusionMatrixReport
    3: SimpleModelComparison(alternative_scorers={'Recall (Multiclass)': make_scorer(recall_score, average=None), 'Precision (Multiclass)': make_scorer(precision_score, average=None)})
            Conditions:
                    0: Model performance gain over simple model is greater than 30%
]
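
The custom scorers above are built with average=None, which in scikit-learn yields one score per class instead of a single aggregate. The following plain-Python sketch shows what such per-class recall and precision values look like. The helper names and the toy labels are made up for illustration, and the 0.0 fallback for a class that is never predicted mirrors scikit-learn's default behaviour.

```python
from typing import Dict, List


def per_class_recall(y_true: List[int], y_pred: List[int]) -> Dict[int, float]:
    """Recall per class: correctly predicted samples of a class / true samples of that class."""
    scores = {}
    for cls in sorted(set(y_true)):
        true_idx = [i for i, y in enumerate(y_true) if y == cls]
        hits = sum(1 for i in true_idx if y_pred[i] == cls)
        scores[cls] = hits / len(true_idx)
    return scores


def per_class_precision(y_true: List[int], y_pred: List[int]) -> Dict[int, float]:
    """Precision per class: correct predictions of a class / all predictions of that class."""
    scores = {}
    for cls in sorted(set(y_true)):
        pred_idx = [i for i, y in enumerate(y_pred) if y == cls]
        hits = sum(1 for i in pred_idx if y_true[i] == cls)
        # A class that is never predicted has undefined precision; fall back to 0.0.
        scores[cls] = hits / len(pred_idx) if pred_idx else 0.0
    return scores


y_true = [0, 0, 1, 1, 2, 2]
# A "most frequent"-style baseline that always predicts class 0 (toy data).
y_pred = [0, 0, 0, 0, 0, 0]

recall = per_class_recall(y_true, y_pred)
precision = per_class_precision(y_true, y_pred)
print(recall)
print(precision)
```

Because the scores are per class, conditions on these checks can fail for a single class even when the other classes score well.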

TIP: auto-complete may not work inside a new suite definition, so if you want to use it to see the arguments a check receives or the built-in conditions it has, try doing it outside of the suite’s initialization.

For example, to see a check’s built-in conditions, type in a new cell: NameOfDesiredCheck().add_condition_ and then check the auto-complete suggestions (using Shift + Tab) to discover the built-in conditions.

Additional Notes about Conditions in a Suite#

Checks in the built-in suites come with pre-defined conditions, and when building your custom suite you should choose which conditions to add.
Most check classes have built-in methods for adding conditions. These follow the naming convention add_condition_..., and each one attaches condition logic that evaluates the check’s results.
Each check instance can have several conditions or none. Each condition will be evaluated separately.
The pass (✓) / fail (✖) / insight (!) status of the conditions, along with the condition’s name and extra info will be displayed in the suite’s Conditions Summary.
Most conditions have configurable arguments that can be passed to the condition while adding it.
For more info about conditions, check out Configure a Condition.
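
The notes above can be illustrated with a small, library-free sketch: each add_condition_... method returns the check itself, which is why conditions can be chained, and each condition is then evaluated separately against the same result. ToyCheck and its two condition methods are hypothetical and only imitate the naming convention.

```python
from typing import Callable, List, Tuple


class ToyCheck:
    """Hypothetical check whose result is a single score."""

    def __init__(self) -> None:
        self.conditions: List[Tuple[str, Callable[[float], bool]]] = []

    def add_condition_score_greater_than(self, threshold: float) -> "ToyCheck":
        name = f"Score is greater than {threshold}"
        self.conditions.append((name, lambda score: score > threshold))
        return self  # returning self is what makes chaining possible

    def add_condition_score_less_than(self, threshold: float) -> "ToyCheck":
        name = f"Score is less than {threshold}"
        self.conditions.append((name, lambda score: score < threshold))
        return self

    def evaluate(self, score: float) -> List[Tuple[str, bool]]:
        # Each condition is evaluated separately against the same result.
        return [(name, predicate(score)) for name, predicate in self.conditions]


check = (
    ToyCheck()
    .add_condition_score_greater_than(0.8)
    .add_condition_score_less_than(0.9)
)
summary = check.evaluate(0.95)
for name, passed in summary:
    print("✓" if passed else "✖", name)
```

A check with no conditions simply produces an empty summary, which matches the statement that a check instance can have several conditions or none.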

Run the Suite#

This is simply done by calling the run() method of the suite.

To see that in action, we’ll need datasets and a model.

Let’s quickly load a dataset and train a simple model for the sake of this demo.

Load Datasets and Train a Simple Model#

# General imports
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

from deepchecks.tabular.datasets.classification import iris

np.random.seed(22)

# Load pre-split Datasets
train_dataset, test_dataset = iris.load_data(as_train_test=True)
label_col = 'target'

# Train Model
rf_clf = RandomForestClassifier()
rf_clf.fit(train_dataset.data[train_dataset.features],
           train_dataset.data[train_dataset.label_name])

RandomForestClassifier()

Run Suite#

new_custom_suite.run(model=rf_clf, train_dataset=train_dataset, test_dataset=test_dataset)

/home/runner/work/deepchecks/deepchecks/venv/lib/python3.8/site-packages/sklearn/metrics/_classification.py:1318: UndefinedMetricWarning:

Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.

Simple Suite For Model Performance

Status	Check	Condition	More Info
✖	Train Test Performance	Train-Test scores relative degradation is less than 0.15	1 scores failed. Found max degradation of 16.67% for metric Recall and class 2.
✖	Simple Model Comparison	Model performance gain over simple model is greater than 30%	Found classes with failed metric's gain: {2: {'Recall (Multiclass)': '-5000%'}}
✓	Train Test Performance	Scores are greater than 0.8	Found minimum score for Recall metric of value 0.83 for class 2.

Train Test Performance

Conditions Summary

Status	Condition	More Info
✖	Train-Test scores relative degradation is less than 0.15	1 scores failed. Found max degradation of 16.67% for metric Recall and class 2.
✓	Scores are greater than 0.8	Found minimum score for Recall metric of value 0.83 for class 2.

Simple Model Comparison

Conditions Summary

Status	Condition	More Info
✖	Model performance gain over simple model is greater than 30%	Found classes with failed metric's gain: {2: {'Recall (Multiclass)': '-5000%'}}

Check	Summary
Model Info	Summarize given model parameters. Read More...
Confusion Matrix Report - Train Dataset	Calculate the confusion matrix of the model on the given dataset. Read More...
Confusion Matrix Report - Test Dataset	Calculate the confusion matrix of the model on the given dataset. Read More...

Parameter	Value	Default
bootstrap	True	True
ccp_alpha	0.00	0.00
class_weight	None	None
criterion	gini	gini
max_depth	None	None
max_features	auto	auto
max_leaf_nodes	None	None
max_samples	None	None
min_impurity_decrease	0.00	0.00
min_samples_leaf	1	1
min_samples_split	2	2
min_weight_fraction_leaf	0.00	0.00
n_estimators	100	100
n_jobs	None	None
oob_score	False	False
random_state	None	None
verbose	0	0
warm_start	False	False
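
The two failed conditions in the report above can be reproduced by hand under a few assumptions. The sketch below assumes that relative degradation is (train - test) / train, that gain is the improvement over the baseline as a share of the baseline's distance from a perfect score, and that gain is capped at ±50 (an assumption chosen to match the -5000% shown). The class 2 scores are also assumed: perfect recall on train and for the baseline, and 5/6 on test, picked to match the reported 0.83. None of this is taken from the deepchecks implementation.

```python
def relative_degradation(train_score: float, test_score: float) -> float:
    """Share of the train score that is lost on the test set (assumed definition)."""
    return (train_score - test_score) / train_score


def gain_over_baseline(baseline: float, model: float,
                       perfect: float = 1.0, max_gain: float = 50.0) -> float:
    """Improvement over the baseline relative to its headroom, capped to +/- max_gain (assumed)."""
    headroom = perfect - baseline
    if headroom == 0:
        # Baseline is already perfect: no gain if the model matches it,
        # otherwise the largest possible loss.
        return 0.0 if model == baseline else -max_gain
    return max(-max_gain, min(max_gain, (model - baseline) / headroom))


# Assumed class 2 recall values, chosen to match the figures in the report.
train_recall, test_recall = 1.0, 5 / 6

degradation = relative_degradation(train_recall, test_recall)
print(f"{degradation:.2%}", degradation < 0.15)   # degradation condition
print(round(test_recall, 2), test_recall > 0.8)   # minimum score condition

# Assumed: the 'most_frequent' baseline always predicts class 2, so its recall there is 1.0.
gain = gain_over_baseline(baseline=1.0, model=test_recall)
print(f"{gain:.0%}", gain > 0.3)                  # gain condition
```

Under these assumptions the degradation is about 16.67% (above the 0.15 threshold), the minimum score of 0.83 clears 0.8, and the capped gain prints as -5000%, which lines up with the three rows of the summary table.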

Modify an Existing Suite#

from deepchecks.tabular.suites import train_test_validation

customized_suite = train_test_validation()

# let's check what it has:
customized_suite

Train Test Validation Suite: [
    0: DatasetsSizeComparison
            Conditions:
                    0: Test-Train size ratio is greater than 0.01
    1: NewLabelTrainTest
            Conditions:
                    0: Number of new label values is less or equal to 0
    2: NewCategoryTrainTest
            Conditions:
                    0: Ratio of samples with a new category is less or equal to 0%
    3: StringMismatchComparison
            Conditions:
                    0: No new variants allowed in test data
    4: DateTrainTestLeakageDuplicates
            Conditions:
                    0: Date leakage ratio is less or equal to 0%
    5: DateTrainTestLeakageOverlap
            Conditions:
                    0: Date leakage ratio is less or equal to 0%
    6: IndexTrainTestLeakage
            Conditions:
                    0: Ratio of leaking indices is less or equal to 0%
    7: TrainTestSamplesMix(n_to_show=5)
            Conditions:
                    0: Percentage of test data samples that appear in train data is less or equal to 5%
    8: FeatureLabelCorrelationChange(ppscore_params={}, random_state=42)
            Conditions:
                    0: Train-Test features' Predictive Power Score difference is less than 0.2
                    1: Train features' Predictive Power Score is less than 0.7
    9: FeatureDrift
            Conditions:
                    0: categorical drift score < 0.2 and numerical drift score < 0.2
    10: LabelDrift
            Conditions:
                    0: Label drift score < 0.15
    11: MultivariateDrift
            Conditions:
                    0: Drift value is less than 0.25
]

# and modify it by removing a check by index:
customized_suite.remove(1)

Train Test Validation Suite: [
    0: DatasetsSizeComparison
            Conditions:
                    0: Test-Train size ratio is greater than 0.01
    2: NewCategoryTrainTest
            Conditions:
                    0: Ratio of samples with a new category is less or equal to 0%
    3: StringMismatchComparison
            Conditions:
                    0: No new variants allowed in test data
    4: DateTrainTestLeakageDuplicates
            Conditions:
                    0: Date leakage ratio is less or equal to 0%
    5: DateTrainTestLeakageOverlap
            Conditions:
                    0: Date leakage ratio is less or equal to 0%
    6: IndexTrainTestLeakage
            Conditions:
                    0: Ratio of leaking indices is less or equal to 0%
    7: TrainTestSamplesMix(n_to_show=5)
            Conditions:
                    0: Percentage of test data samples that appear in train data is less or equal to 5%
    8: FeatureLabelCorrelationChange(ppscore_params={}, random_state=42)
            Conditions:
                    0: Train-Test features' Predictive Power Score difference is less than 0.2
                    1: Train features' Predictive Power Score is less than 0.7
    9: FeatureDrift
            Conditions:
                    0: categorical drift score < 0.2 and numerical drift score < 0.2
    10: LabelDrift
            Conditions:
                    0: Label drift score < 0.15
    11: MultivariateDrift
            Conditions:
                    0: Drift value is less than 0.25
]

from deepchecks.tabular.checks import UnusedFeatures

# and add a new check with a condition:
customized_suite.add(
    UnusedFeatures().add_condition_number_of_high_variance_unused_features_less_or_equal())

Train Test Validation Suite: [
    0: DatasetsSizeComparison
            Conditions:
                    0: Test-Train size ratio is greater than 0.01
    2: NewCategoryTrainTest
            Conditions:
                    0: Ratio of samples with a new category is less or equal to 0%
    3: StringMismatchComparison
            Conditions:
                    0: No new variants allowed in test data
    4: DateTrainTestLeakageDuplicates
            Conditions:
                    0: Date leakage ratio is less or equal to 0%
    5: DateTrainTestLeakageOverlap
            Conditions:
                    0: Date leakage ratio is less or equal to 0%
    6: IndexTrainTestLeakage
            Conditions:
                    0: Ratio of leaking indices is less or equal to 0%
    7: TrainTestSamplesMix(n_to_show=5)
            Conditions:
                    0: Percentage of test data samples that appear in train data is less or equal to 5%
    8: FeatureLabelCorrelationChange(ppscore_params={}, random_state=42)
            Conditions:
                    0: Train-Test features' Predictive Power Score difference is less than 0.2
                    1: Train features' Predictive Power Score is less than 0.7
    9: FeatureDrift
            Conditions:
                    0: categorical drift score < 0.2 and numerical drift score < 0.2
    10: LabelDrift
            Conditions:
                    0: Label drift score < 0.15
    11: MultivariateDrift
            Conditions:
                    0: Drift value is less than 0.25
    12: UnusedFeatures
            Conditions:
                    0: Number of high variance unused features is less or equal to 5
]
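
The printed suites above suggest that check indices are stable: after remove(1) the listing goes 0, 2, 3, ..., and the newly added check receives index 12 rather than filling the gap. Here is a minimal sketch of a container with that behaviour. ToySuite is hypothetical and is not how deepchecks is necessarily implemented.

```python
from collections import OrderedDict


class ToySuite:
    """Hypothetical suite that stores checks under stable integer indices."""

    def __init__(self, *checks: str) -> None:
        self.checks = OrderedDict()
        self._next_index = 0
        for check in checks:
            self.add(check)

    def add(self, check: str) -> "ToySuite":
        # New checks always get the next unused index.
        self.checks[self._next_index] = check
        self._next_index += 1
        return self

    def remove(self, index: int) -> "ToySuite":
        # Remaining checks keep their original indices.
        del self.checks[index]
        return self

    def __getitem__(self, index: int) -> str:
        return self.checks[index]


suite = ToySuite("DatasetsSizeComparison", "NewLabelTrainTest",
                 "NewCategoryTrainTest", "StringMismatchComparison")
suite.remove(1)
suite.add("UnusedFeatures")
print(list(suite.checks.items()))
print(suite[3])
```

With stable indices, an expression such as suite[3] keeps pointing at the same check before and after a removal, so it is worth re-reading the printed suite before indexing into it.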

# let's remove all conditions from the check at index 3 (StringMismatchComparison):
customized_suite[3].clean_conditions()

# and update the suite's name:
customized_suite.name = 'New Data Leakage Suite'

# and now we can run our modified suite:
customized_suite.run(train_dataset, test_dataset, rf_clf)

New Data Leakage Suite

Status	Check	Condition	More Info
✖	Feature Label Correlation Change	Train features' Predictive Power Score is less than 0.7	Found 2 out of 4 features in train dataset with PPS above threshold: {'petal width (cm)': '0.93', 'petal length (cm)': '0.86'}
✓	Feature Label Correlation Change	Train-Test features' Predictive Power Score difference is less than 0.2	Passed for 4 relevant columns

Feature Label Correlation Change

Conditions Summary

Status	Condition	More Info
✖	Train features' Predictive Power Score is less than 0.7	Found 2 out of 4 features in train dataset with PPS above threshold: {'petal width (cm)': '0.93', 'petal length (cm)': '0.86'}
✓	Train-Test features' Predictive Power Score difference is less than 0.2	Passed for 4 relevant columns

Status	Check	Condition	More Info
✓	Datasets Size Comparison	Test-Train size ratio is greater than 0.01	Test-Train size ratio is 0.34
✓	New Category Train Test	Ratio of samples with a new category is less or equal to 0%	No relevant features to check were found
✓	Train Test Samples Mix	Percentage of test data samples that appear in train data is less or equal to 5%	Percent of test data samples that appear in train data: 2.63%
✓	Feature Drift	categorical drift score < 0.2 and numerical drift score < 0.2	Passed for 4 columns out of 4 columns. Found column "sepal width (cm)" has the highest numerical drift score: 0.14
✓	Label Drift	Label drift score < 0.15	Label's drift score Cramer's V is 0
✓	Unused Features - Train Dataset	Number of high variance unused features is less or equal to 5	Found 1 high variance unused features
✓	Unused Features - Test Dataset	Number of high variance unused features is less or equal to 5	Found 1 high variance unused features
✓	Multivariate Drift	Drift value is less than 0.25	Found drift value of: 0, corresponding to a domain classifier AUC of: 0.45

Datasets Size Comparison

Conditions Summary

Status	Condition	More Info
✓	Test-Train size ratio is greater than 0.01	Test-Train size ratio is 0.34

	Train	Test
Size	112	38

New Category Train Test

Conditions Summary

Status	Condition	More Info
✓	Ratio of samples with a new category is less or equal to 0%	No relevant features to check were found

	# New Categories	Ratio of New Categories	Feature importance	New Categories Names
Feature Name

Train Test Samples Mix

Conditions Summary

Status	Condition	More Info
✓	Percentage of test data samples that appear in train data is less or equal to 5%	Percent of test data samples that appear in train data: 2.63%

	sepal length (cm)	sepal width (cm)	petal length (cm)	petal width (cm)	target
Train indices: 30 Test indices: 28	5.80	2.70	5.10	1.90	2.00

Feature Drift

Conditions Summary

Status	Condition	More Info
✓	categorical drift score < 0.2 and numerical drift score < 0.2	Passed for 4 columns out of 4 columns. Found column "sepal width (cm)" has the highest numerical drift score: 0.14

Label Drift

Conditions Summary

Status	Condition	More Info
✓	Label drift score < 0.15	Label's drift score Cramer's V is 0

Unused Features - Train Dataset

Conditions Summary

Status	Condition	More Info
✓	Number of high variance unused features is less or equal to 5	Found 1 high variance unused features

Unused Features - Test Dataset

Conditions Summary

Status	Condition	More Info
✓	Number of high variance unused features is less or equal to 5	Found 1 high variance unused features

Check	Summary
String Mismatch Comparison	Detect different variants of string categories between the same categorical column in two datasets. Read More...

Check	Reason
Date Train Test Leakage Duplicates	DatasetValidationError: Dataset does not contain a datetime. see Dataset docs
Date Train Test Leakage Overlap	DatasetValidationError: Dataset does not contain a datetime. see Dataset docs
Index Train Test Leakage	DatasetValidationError: Dataset does not contain an index. see Dataset docs
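
Two of the figures in the report above can be checked with simple arithmetic. The sizes 112 and 38 come from the Datasets Size Comparison output. The count of one shared sample is inferred from the single row listed under Train Test Samples Mix and is an assumption.

```python
train_size, test_size = 112, 38

# Datasets Size Comparison: the condition requires a test/train ratio above 0.01.
size_ratio = test_size / train_size
print(round(size_ratio, 2), size_ratio > 0.01)

# Train Test Samples Mix: share of test samples that also appear in train.
shared_test_samples = 1  # assumed from the single row shown in the output
mix_ratio = shared_test_samples / test_size
print(f"{mix_ratio:.2%}", mix_ratio <= 0.05)
```

Both values agree with the reported 0.34 ratio and 2.63% mix, so both conditions pass.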
