# Create a Custom Suite

A suite is a list of checks that will run one after the other, and its results will be displayed together.
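Conceptually, the sequential execution can be sketched like this. This is a simplified stand-in, not deepchecks' actual implementation; `MiniSuite`, `DoubleCheck`, and `FailingCheck` are hypothetical names used only for illustration:

```python
# Minimal sketch of a suite: run checks one after the other and gather
# their results (the real deepchecks Suite also handles conditions,
# progress display, and richer result objects).
class MiniSuite:
    def __init__(self, name, *checks):
        self.name = name
        self.checks = list(checks)

    def run(self, **context):
        results = []
        for check in self.checks:
            try:
                results.append((type(check).__name__, check.run(**context)))
            except Exception as err:  # a failing check should not stop the suite
                results.append((type(check).__name__, err))
        return results

class DoubleCheck:
    def run(self, value):
        return value * 2

class FailingCheck:
    def run(self, value):
        raise ValueError('boom')

suite = MiniSuite('Demo Suite', DoubleCheck(), FailingCheck())
print(suite.run(value=21))
```

Note that a failing check is recorded alongside the successful ones, so one error does not abort the rest of the suite.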

To customize a suite, we can either create a new suite from scratch or modify one of the existing built-in suites. Both approaches are demonstrated below.

## Create a New Suite

Let's say we want to create a custom suite composed mainly of various performance checks, including TrainTestPerformance(), SimpleModelComparison() and several more.

To see which checks are implemented and can be included, we suggest browsing the deepchecks API reference and the checks gallery in the documentation.

```python
from sklearn.metrics import make_scorer, precision_score, recall_score

from deepchecks.tabular import Suite
# importing all existing checks for demonstration simplicity
from deepchecks.tabular.checks import *

# The Suite's first argument is its name; the rest are the check instances.
# Some checks can receive arguments when initialized (all check arguments have default values).
# Each check can have one or more optional conditions,
# and multiple conditions can be chained sequentially.
new_custom_suite = Suite(
    'Simple Suite For Model Performance',
    ModelInfo(),
    TrainTestPerformance()
        .add_condition_train_test_relative_degradation_less_than(0.15)
        .add_condition_test_performance_greater_than(0.8),
    ConfusionMatrixReport(),
    # use custom scorers for the simple model comparison:
    SimpleModelComparison(
        strategy='most_frequent',
        alternative_scorers={
            'Recall (Multiclass)': make_scorer(recall_score, average=None),
            'Precision (Multiclass)': make_scorer(precision_score, average=None),
        }
    ).add_condition_gain_greater_than(0.3),
)

# Let's see the suite:
new_custom_suite
```

```
Simple Suite For Model Performance: [
0: ModelInfo
1: TrainTestPerformance
Conditions:
0: Train-Test scores relative degradation is less than 0.15
1: Scores are greater than 0.8
2: ConfusionMatrixReport
3: SimpleModelComparison(alternative_scorers={'Recall (Multiclass)': make_scorer(recall_score, average=None), 'Precision (Multiclass)': make_scorer(precision_score, average=None)})
Conditions:
0: Model performance gain over simple model is greater than 30%
]
```


TIP: auto-complete may not work from inside a new suite definition, so if you want to use auto-complete to see the arguments a check receives or the built-in conditions it has, try doing so outside of the suite's initialization.

For example, to see a check's built-in conditions, type in a new cell: `NameOfDesiredCheck().add_condition_` and then check the auto-complete suggestions (using Shift + Tab) to discover the built-in conditions.
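A programmatic alternative to auto-complete is listing a check class's condition-adding methods with `dir()`. Illustrated here on a hypothetical stand-in class (`ExampleCheck` is not a deepchecks class), since the same pattern works on any check class:

```python
# Stand-in class mimicking the add_condition_* naming convention
# used by deepchecks check classes.
class ExampleCheck:
    def add_condition_score_greater_than(self, threshold): ...
    def add_condition_ratio_less_than(self, ratio): ...
    def run(self): ...

# dir() returns all attribute names; filter for the condition methods
condition_methods = [m for m in dir(ExampleCheck) if m.startswith('add_condition_')]
print(condition_methods)
```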

- Checks in the built-in suites come with pre-defined conditions; when building your custom suite, you choose which conditions to add.

- Most check classes have built-in methods for adding conditions. These follow the naming convention `add_condition_...` and attach logic that parses the check's result to decide pass or fail.

- Each check instance can have several conditions or none. Each condition is evaluated separately.

- The pass (✓) / fail (✖) / insight (!) status of each condition, along with its name and extra info, is displayed in the suite's Conditions Summary.

- Most conditions have configurable arguments that can be passed when adding the condition.
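The mechanics described above can be sketched as follows. This is a simplified illustration, not the real deepchecks classes; `ToyCheck` and its method names are hypothetical:

```python
# Sketch of condition mechanics: add_condition_* methods return self so
# they can be chained, the configurable argument is captured in both the
# condition's name and its logic, and each condition is evaluated separately.
class ToyCheck:
    def __init__(self):
        self.conditions = []

    def add_condition_greater_than(self, threshold):
        self.conditions.append((f'Score is greater than {threshold}',
                                lambda score: score > threshold))
        return self  # returning self enables chaining

    def evaluate_conditions(self, score):
        # every condition gets its own pass/fail status
        return [(name, 'PASS' if passes(score) else 'FAIL')
                for name, passes in self.conditions]

check = ToyCheck().add_condition_greater_than(0.5).add_condition_greater_than(0.9)
print(check.evaluate_conditions(0.7))
```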

## Run the Suite

This is simply done by calling the run() method of the suite.

To see that in action, we’ll need datasets and a model.

Let's quickly load a dataset and train a simple model for the sake of this demo.

### Load Datasets and Train a Simple Model

```python
# General imports
import numpy as np
import pandas as pd

np.random.seed(22)

from sklearn.ensemble import RandomForestClassifier

from deepchecks.tabular.datasets.classification import iris

label_col = 'target'

# Load Data (pre-split into train and test Datasets)
train_dataset, test_dataset = iris.load_data(data_format='Dataset', as_train_test=True)

# Train Model
rf_clf = RandomForestClassifier()
rf_clf.fit(train_dataset.data[train_dataset.features],
           train_dataset.data[train_dataset.label_name]);
```

```
RandomForestClassifier()
```


### Run Suite

```python
new_custom_suite.run(model=rf_clf, train_dataset=train_dataset, test_dataset=test_dataset)
```

```
Simple Suite For Model Performance:
|     | 0/4 [Time: 00:00]
Simple Suite For Model Performance:
|##5  | 2/4 [Time: 00:00, Check=Train Test Performance]Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use zero_division parameter to control this behavior.

Simple Suite For Model Performance:
|#####| 4/4 [Time: 00:00, Check=Simple Model Comparison]

Simple Suite For Model Performance
```

## Modify an Existing Suite

```python
from deepchecks.tabular.suites import train_test_validation

customized_suite = train_test_validation()

# let's check what it has:
customized_suite
```

```
Train Test Validation Suite: [
0: DatasetsSizeComparison
Conditions:
0: Test-Train size ratio is greater than 0.01
1: NewLabelTrainTest
Conditions:
0: Number of new label values is less or equal to 0
2: CategoryMismatchTrainTest
Conditions:
0: Ratio of samples with a new category is less or equal to 0%
3: StringMismatchComparison
Conditions:
0: No new variants allowed in test data
4: DateTrainTestLeakageDuplicates
Conditions:
0: Date leakage ratio is less or equal to 0%
5: DateTrainTestLeakageOverlap
Conditions:
0: Date leakage ratio is less or equal to 0%
6: IndexTrainTestLeakage
Conditions:
0: Ratio of leaking indices is less or equal to 0%
7: TrainTestSamplesMix
Conditions:
0: Percentage of test data samples that appear in train data is less or equal to 10%
8: FeatureLabelCorrelationChange(ppscore_params={}, random_state=42)
Conditions:
0: Train-Test features' Predictive Power Score difference is less than 0.2
1: Train features' Predictive Power Score is less than 0.7
9: TrainTestFeatureDrift
Conditions:
0: categorical drift score < 0.2 and numerical drift score < 0.1
10: TrainTestLabelDrift
Conditions:
0: categorical drift score < 0.2 and numerical drift score < 0.1 for label drift
11: WholeDatasetDrift
Conditions:
0: Drift value is less than 0.25
]
```

```python
# and modify it by removing a check by index:
customized_suite.remove(1)
```

```
Train Test Validation Suite: [
0: DatasetsSizeComparison
Conditions:
0: Test-Train size ratio is greater than 0.01
2: CategoryMismatchTrainTest
Conditions:
0: Ratio of samples with a new category is less or equal to 0%
3: StringMismatchComparison
Conditions:
0: No new variants allowed in test data
4: DateTrainTestLeakageDuplicates
Conditions:
0: Date leakage ratio is less or equal to 0%
5: DateTrainTestLeakageOverlap
Conditions:
0: Date leakage ratio is less or equal to 0%
6: IndexTrainTestLeakage
Conditions:
0: Ratio of leaking indices is less or equal to 0%
7: TrainTestSamplesMix
Conditions:
0: Percentage of test data samples that appear in train data is less or equal to 10%
8: FeatureLabelCorrelationChange(ppscore_params={}, random_state=42)
Conditions:
0: Train-Test features' Predictive Power Score difference is less than 0.2
1: Train features' Predictive Power Score is less than 0.7
9: TrainTestFeatureDrift
Conditions:
0: categorical drift score < 0.2 and numerical drift score < 0.1
10: TrainTestLabelDrift
Conditions:
0: categorical drift score < 0.2 and numerical drift score < 0.1 for label drift
11: WholeDatasetDrift
Conditions:
0: Drift value is less than 0.25
]
```

```python
from deepchecks.tabular.checks import UnusedFeatures

# and add a new check with a condition:
customized_suite.add(
    UnusedFeatures().add_condition_number_of_high_variance_unused_features_less_or_equal(5))
```
```
Train Test Validation Suite: [
0: DatasetsSizeComparison
Conditions:
0: Test-Train size ratio is greater than 0.01
2: CategoryMismatchTrainTest
Conditions:
0: Ratio of samples with a new category is less or equal to 0%
3: StringMismatchComparison
Conditions:
0: No new variants allowed in test data
4: DateTrainTestLeakageDuplicates
Conditions:
0: Date leakage ratio is less or equal to 0%
5: DateTrainTestLeakageOverlap
Conditions:
0: Date leakage ratio is less or equal to 0%
6: IndexTrainTestLeakage
Conditions:
0: Ratio of leaking indices is less or equal to 0%
7: TrainTestSamplesMix
Conditions:
0: Percentage of test data samples that appear in train data is less or equal to 10%
8: FeatureLabelCorrelationChange(ppscore_params={}, random_state=42)
Conditions:
0: Train-Test features' Predictive Power Score difference is less than 0.2
1: Train features' Predictive Power Score is less than 0.7
9: TrainTestFeatureDrift
Conditions:
0: categorical drift score < 0.2 and numerical drift score < 0.1
10: TrainTestLabelDrift
Conditions:
0: categorical drift score < 0.2 and numerical drift score < 0.1 for label drift
11: WholeDatasetDrift
Conditions:
0: Drift value is less than 0.25
12: UnusedFeatures
Conditions:
0: Number of high variance unused features is less or equal to 5
]
```

```python
# let's remove all conditions from the FeatureLabelCorrelationChange check (index 8 above):
customized_suite[8].clean_conditions()

# and update the suite's name:
customized_suite.name = 'New Data Leakage Suite'

# and now we can run our modified suite:
customized_suite.run(train_dataset, test_dataset, rf_clf)
```

```
New Data Leakage Suite:
|            | 0/12 [Time: 00:00]
New Data Leakage Suite:
|#########   | 9/12 [Time: 00:00, Check=Train Test Feature Drift]

New Data Leakage Suite
```

Total running time of the script: ( 0 minutes 2.884 seconds)

Gallery generated by Sphinx-Gallery