
Full Suite Quickstart

To run your first Deepchecks suite, all you need is the data and model that you wish to validate. More specifically, you need:

Your train and test data (as pandas DataFrames or NumPy arrays)
(optional) A supported model (including XGBoost, scikit-learn models, and many more; see Working with Models and Predictions). It is required for running checks that need the model's predictions.

Running your first suite on your data and model takes only a few lines of code, starting with Define a Dataset Object.

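# If you don't have deepchecks installed yet.
# The "!" prefix is IPython/Jupyter syntax; from a regular shell, run:
#   python -m pip install -U deepchecks
import sys
!{sys.executable} -m pip install deepchecks -U --quiet #--user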

Load Data, Split Train-Test, and Train a Simple Model

For the purpose of this guide, we'll use the classic iris dataset and train a simple random forest model for multiclass classification:

# General imports
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

from deepchecks.tabular.datasets.classification import iris

# Load Data
iris_df = iris.load_data(data_format='Dataframe', as_train_test=False)
label_col = 'target'
df_train, df_test = train_test_split(iris_df, stratify=iris_df[label_col], random_state=0)

# Train Model
rf_clf = RandomForestClassifier(random_state=0)
rf_clf.fit(df_train.drop(label_col, axis=1), df_train[label_col]);

RandomForestClassifier(random_state=0)
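As a quick sanity check of the split, the sketch below uses scikit-learn's bundled iris (`sklearn.datasets.load_iris`) as a stand-in for deepchecks' `iris.load_data`, assuming both contain the same 150 rows with 50 per class. It shows what the default `test_size=0.25` combined with `stratify` produces: 38 test rows, 112 train rows, and classes that stay balanced in both parts. These sizes match the ones reported by the Datasets Size Comparison check later in this guide.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# scikit-learn's copy of iris: 150 rows (50 per class), 4 features plus 'target'
sk_iris = load_iris(as_frame=True).frame

# Same call as in the guide: default test_size=0.25, stratified on the label
train_part, test_part = train_test_split(sk_iris, stratify=sk_iris['target'], random_state=0)

# ceil(0.25 * 150) = 38 test rows, leaving 150 - 38 = 112 for training
print(len(train_part), len(test_part))  # 112 38

# Stratification keeps the classes balanced: 12 or 13 test rows per class
print(test_part['target'].value_counts().sort_index().tolist())

# Test-train size ratio, the quantity behind the Datasets Size Comparison check
size_ratio = len(test_part) / len(train_part)
print(round(size_ratio, 2))  # 0.34
```

Without `stratify`, the per-class counts in a 38-row test set could drift noticeably from one third each, which would show up as label drift between train and test.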

Define a Dataset Object

Initialize the Dataset object, stating the relevant metadata about the dataset (e.g. the name of the label column).

Check out the Dataset’s attributes to see which additional special columns can be declared and used (e.g. date column, index column).

from deepchecks.tabular import Dataset

# We explicitly state that this dataset has no categorical features; otherwise they would be inferred automatically.
# If the dataset has categorical features, the best practice is to pass a list with their names.

ds_train = Dataset(df_train, label=label_col, cat_features=[])
ds_test =  Dataset(df_test,  label=label_col, cat_features=[])
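Declaring an index column (and likewise a date column) is what enables checks such as Index Train Test Leakage, which is skipped in the full suite results because these datasets don't declare one. deepchecks' Dataset takes parameters for this, such as `index_name` and `datetime_name` (check the Dataset API reference for the exact options). As a rough illustration of the quantity behind that check's condition ("Ratio of leaking indices"), the pandas sketch below computes the share of test index values that also appear in train. `index_leakage_ratio` is a hypothetical helper, not part of deepchecks, and the real check may differ in details.

```python
import pandas as pd

def index_leakage_ratio(train_index: pd.Index, test_index: pd.Index) -> float:
    """Hypothetical helper: fraction of test index values also present in train."""
    if len(test_index) == 0:
        return 0.0
    # Boolean array with one entry per test row: True if that index value is in train
    leaking = test_index.isin(train_index)
    return float(leaking.mean())

# Toy example: ids 103 and 104 appear in both splits
train_ids = pd.Index([101, 102, 103, 104])
test_ids = pd.Index([103, 104, 105, 106])
print(index_leakage_ratio(train_ids, test_ids))  # 0.5
```

On the iris split above, `index_leakage_ratio(df_train.index, df_test.index)` is 0.0, since `train_test_split` partitions the rows and keeps their original index.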

Run a Deepchecks Suite

Run the full suite

Use the full_suite, which is a collection of (most of) the prebuilt checks.

Check out the When Should You Use Deepchecks guide for more info about the existing suites and when to use them.

from deepchecks.tabular.suites import full_suite

suite = full_suite()

suite.run(train_dataset=ds_train, test_dataset=ds_test, model=rf_clf)

Full Suite

Status	Check	Condition	More Info
✖	Train Test Performance	Train-Test scores relative degradation is less than 0.1	2 scores failed. Found max degradation of 16.67% for metric Recall and class 2.
✖	Feature Label Correlation Change	Train features' Predictive Power Score is less than 0.7	Found 2 out of 4 features in train dataset with PPS above threshold: {'petal width (cm)': '0.93', 'petal length (cm)': '0.86'}
✖	Feature Label Correlation - Train Dataset	Features' Predictive Power Score is less than 0.8	Found 2 out of 4 features with PPS above threshold: {'petal width (cm)': '0.93', 'petal length (cm)': '0.86'}
✖	Feature Label Correlation - Test Dataset	Features' Predictive Power Score is less than 0.8	Found 2 out of 4 features with PPS above threshold: {'petal length (cm)': '1', 'petal width (cm)': '0.9'}
✖	Feature-Feature Correlation - Train Dataset	Not more than 0 pairs are correlated above 0.9	Correlation is greater than 0.9 for pairs [('petal length (cm)', 'sepal length (cm)'), ('petal length (cm)', 'petal width (cm)')]
✖	Feature-Feature Correlation - Test Dataset	Not more than 0 pairs are correlated above 0.9	Correlation is greater than 0.9 for pairs [('petal length (cm)', 'petal width (cm)')]
!	Weak Segments Performance - Test Dataset	The relative performance of weakest segment is greater than 80% of average model performance.	Found a segment with accuracy score of 0.667 in comparison to an average score of 0.947 in sampled data.
✓	Feature Label Correlation Change	Train-Test features' Predictive Power Score difference is less than 0.2	Passed for 4 relevant columns
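Two of the results in this table can be reproduced with simple arithmetic. The Weak Segments Performance warning follows from 0.667 / 0.947 ≈ 0.704, which is below the 80% threshold. For Train Test Performance, the sketch assumes relative degradation is defined as (train score − test score) / train score for a higher-is-better metric such as recall; the 1.0 and 10/12 recall values are hypothetical, chosen only to show how a 16.67% degradation can arise, and are not the actual scores from this run.

```python
def relative_degradation(train_score: float, test_score: float) -> float:
    """Assumed definition: relative drop from train to test for a higher-is-better metric."""
    return (train_score - test_score) / train_score

def weak_segment_ratio(segment_score: float, average_score: float) -> float:
    """Weakest segment's score as a fraction of the average score."""
    return segment_score / average_score

# Hypothetical recall values: perfect on train, 10 of 12 correct on test
degradation = relative_degradation(1.0, 10 / 12)
print(f"{degradation:.2%}")  # 16.67%
print(degradation < 0.1)     # False, so the condition fails

# Values reported by Weak Segments Performance in the table above
ratio = weak_segment_ratio(0.667, 0.947)
print(round(ratio, 3))       # 0.704
print(ratio > 0.8)           # False, so the condition is not met and a warning is shown
```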

Status	Check	Condition	More Info
✓	ROC Report - Train Dataset	AUC score for all the classes is greater than 0.7	All classes passed, minimum AUC found is 1 for class 0
✓	Label Drift	Label drift score < 0.15	Label's drift score Cramer's V is 0
✓	Feature Drift	categorical drift score < 0.2 and numerical drift score < 0.2	Passed for 4 columns out of 4 columns. Found column "sepal width (cm)" has the highest numerical drift score: 0.14
✓	Train Test Samples Mix	Percentage of test data samples that appear in train data is less or equal to 5%	Percent of test data samples that appear in train data: 2.63%
✓	Datasets Size Comparison	Test-Train size ratio is greater than 0.01	Test-Train size ratio is 0.34
✓	Model Inference Time - Test Dataset	Average model inference time for one sample is less than 0.001	Found average inference time (seconds): 0.00011958
✓	New Category Train Test	Ratio of samples with a new category is less or equal to 0%	No relevant features to check were found
✓	Unused Features - Test Dataset	Number of high variance unused features is less or equal to 5	Found 1 high variance unused features
✓	Unused Features - Train Dataset	Number of high variance unused features is less or equal to 5	Found 1 high variance unused features
✓	Simple Model Comparison	Model performance gain over simple model is greater than 10%	All classes passed, average gain for metrics: {'F1': '87.69%'}
✓	Prediction Drift	Prediction drift score < 0.15	Found model prediction Cramer's V drift score of 0
✓	ROC Report - Test Dataset	AUC score for all the classes is greater than 0.7	All classes passed, minimum AUC found is 1 for class 0
✓	Model Inference Time - Train Dataset	Average model inference time for one sample is less than 0.001	Found average inference time (seconds): 4.511e-05
✓	String Length Out Of Bounds - Test Dataset	Ratio of string length outliers is less or equal to 0%	No relevant columns to check were found
✓	Mixed Data Types - Test Dataset	Rare data types in column are either more than 10% or less than 1% of the data	5 columns passed: found 0 columns with negligible types mix, and 5 columns without any types mix
✓	String Length Out Of Bounds - Train Dataset	Ratio of string length outliers is less or equal to 0%	No relevant columns to check were found
✓	Data Duplicates - Test Dataset	Duplicate data ratio is less or equal to 5%	Found 0% duplicate data
✓	Data Duplicates - Train Dataset	Duplicate data ratio is less or equal to 5%	Found 0% duplicate data
✓	String Mismatch - Test Dataset	No string variants	Passed for 1 relevant column
✓	String Mismatch - Train Dataset	No string variants	Passed for 1 relevant column
✓	Mixed Data Types - Train Dataset	Rare data types in column are either more than 10% or less than 1% of the data	5 columns passed: found 0 columns with negligible types mix, and 5 columns without any types mix
✓	Multivariate Drift	Drift value is less than 0.25	Found drift value of: 0, corresponding to a domain classifier AUC of: 0.45
✓	Mixed Nulls - Train Dataset	Number of different null types is less or equal to 1	Passed for 5 relevant columns
✓	Special Characters - Test Dataset	Ratio of samples containing solely special character is less or equal to 0.1%	Passed for 5 relevant columns
✓	Special Characters - Train Dataset	Ratio of samples containing solely special character is less or equal to 0.1%	Passed for 5 relevant columns
✓	Single Value in Column - Test Dataset	Does not contain only a single value	Passed for 5 relevant columns
✓	Single Value in Column - Train Dataset	Does not contain only a single value	Passed for 5 relevant columns
✓	Conflicting Labels - Train Dataset	Ambiguous sample ratio is less or equal to 0%	Ratio of samples with conflicting labels: 0%
✓	String Mismatch Comparison	No new variants allowed in test data	No relevant columns to check were found
✓	New Label Train Test	Number of new label values is less or equal to 0	Found 0 new labels in test data: []
✓	Mixed Nulls - Test Dataset	Number of different null types is less or equal to 1	Passed for 5 relevant columns
✓	Conflicting Labels - Test Dataset	Ambiguous sample ratio is less or equal to 0%	Ratio of samples with conflicting labels: 0%

Datasets Size Comparison

	Train	Test
Size	112	38

Train Test Samples Mix

	sepal length (cm)	sepal width (cm)	petal length (cm)	petal width (cm)	target
Train indices: 101 Test indices: 142	5.80	2.70	5.10	1.90	2.00

Check	Summary
Confusion Matrix Report - Train Dataset	Calculate the confusion matrix of the model on the given dataset.
Confusion Matrix Report - Test Dataset	Calculate the confusion matrix of the model on the given dataset.
Calibration Metric - Train Dataset	Calculate the calibration curve with brier score for each class.
Calibration Metric - Test Dataset	Calculate the calibration curve with brier score for each class.
Outlier Sample Detection - Train Dataset	Detects outliers in a dataset using the LoOP algorithm.

	Outlier Probability Score	sepal length (cm)	sepal width (cm)	petal length (cm)	petal width (cm)	target
41	0.89	4.50	2.30	1.30	0.30	0
106	0.72	4.90	2.50	4.50	1.70	2
56	0.57	6.30	3.30	4.70	1.60	1
114	0.56	5.80	2.80	5.10	2.40	2
22	0.56	4.60	3.60	1.00	0.20	0

Check	Reason
Weak Segments Performance - Train Dataset	DeepchecksProcessError: WeakSegmentsPerformance was unable to train an error model to find weak segments. Try increasing n_samples or supply additional features.
Regression Error Distribution - Train Dataset	Check is irrelevant for classification tasks
Regression Error Distribution - Test Dataset	Check is irrelevant for classification tasks
Boosting Overfit	Check is relevant for Boosting models of type ('AdaBoostClassifier', 'GradientBoostingClassifier', 'LGBMClassifier', 'XGBClassifier', 'CatBoostClassifier', 'AdaBoostRegressor', 'GradientBoostingRegressor', 'LGBMRegressor', 'XGBRegressor', 'CatBoostRegressor'), but received model of type RandomForestClassifier
Date Train Test Leakage Duplicates	DatasetValidationError: Dataset does not contain a datetime. see Dataset docs
Date Train Test Leakage Overlap	DatasetValidationError: Dataset does not contain a datetime. see Dataset docs
Index Train Test Leakage	DatasetValidationError: Dataset does not contain an index. see Dataset docs
Outlier Sample Detection - Test Dataset	There are not enough samples to run this check, found only 38 samples.
Identifier Label Correlation - Train Dataset	DatasetValidationError: Dataset does not contain an index or a datetime. see Dataset docs
Identifier Label Correlation - Test Dataset	DatasetValidationError: Dataset does not contain an index or a datetime. see Dataset docs
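The Train Test Samples Mix result in the full suite output (2.63%) corresponds to a single test row, index 142, that is identical to train row 101: 1 / 38 ≈ 2.63%. A minimal pandas sketch of that percentage is shown below, assuming a test sample counts as "mixed" when all of its values, label included, exactly match some train row (the check itself may normalize values differently). `samples_mix_ratio` is a hypothetical helper and the toy frames are made up apart from the duplicated row taken from the output.

```python
import pandas as pd

def samples_mix_ratio(train: pd.DataFrame, test: pd.DataFrame) -> float:
    """Hypothetical helper: fraction of test rows that appear verbatim in train."""
    # Hash every train row (all columns, label included) for fast lookup
    train_rows = set(train.itertuples(index=False, name=None))
    hits = sum(row in train_rows for row in test.itertuples(index=False, name=None))
    return hits / len(test)

cols = ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)', 'target']
train = pd.DataFrame([[5.8, 2.7, 5.1, 1.9, 2], [5.1, 3.5, 1.4, 0.2, 0]], columns=cols)
test = pd.DataFrame([[5.8, 2.7, 5.1, 1.9, 2], [6.3, 3.3, 4.7, 1.6, 1]], columns=cols)
print(samples_mix_ratio(train, test))  # 0.5

# In the real run, 1 of the 38 test rows was found in train
print(f"{1 / 38:.2%}")  # 2.63%
```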

Run the integrity suite

If you haven't started modeling yet and have only a single dataset, you can use the data_integrity suite:

from deepchecks.tabular.suites import data_integrity

integ_suite = data_integrity()
integ_suite.run(ds_train)

Data Integrity Suite

Status	Check	Condition	More Info
✖	Feature Label Correlation	Features' Predictive Power Score is less than 0.8	Found 2 out of 4 features with PPS above threshold: {'petal width (cm)': '0.93', 'petal length (cm)': '0.86'}
✖	Feature-Feature Correlation	Not more than 0 pairs are correlated above 0.9	Correlation is greater than 0.9 for pairs [('petal length (cm)', 'sepal length (cm)'), ('petal length (cm)', 'petal width (cm)')]

Status	Check	Condition	More Info
✓	Single Value in Column	Does not contain only a single value	Passed for 5 relevant columns
✓	Special Characters	Ratio of samples containing solely special character is less or equal to 0.1%	Passed for 5 relevant columns
✓	Mixed Nulls	Number of different null types is less or equal to 1	Passed for 5 relevant columns
✓	Mixed Data Types	Rare data types in column are either more than 10% or less than 1% of the data	5 columns passed: found 0 columns with negligible types mix, and 5 columns without any types mix
✓	String Mismatch	No string variants	Passed for 1 relevant column
✓	Data Duplicates	Duplicate data ratio is less or equal to 5%	Found 0% duplicate data
✓	String Length Out Of Bounds	Ratio of string length outliers is less or equal to 0%	No relevant columns to check were found
✓	Conflicting Labels	Ambiguous sample ratio is less or equal to 0%	Ratio of samples with conflicting labels: 0%

Check	Summary
Outlier Sample Detection	Detects outliers in a dataset using the LoOP algorithm.

	Outlier Probability Score	sepal length (cm)	sepal width (cm)	petal length (cm)	petal width (cm)	target
41	0.89	4.50	2.30	1.30	0.30	0
106	0.72	4.90	2.50	4.50	1.70	2
56	0.57	6.30	3.30	4.70	1.60	1
114	0.56	5.80	2.80	5.10	2.40	2
22	0.56	4.60	3.60	1.00	0.20	0

Check	Reason
Identifier Label Correlation - Train Dataset	DatasetValidationError: Dataset does not contain an index or a datetime. see Dataset docs
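Several integrity checks boil down to simple column statistics. For example, the Data Duplicates condition caps the ratio of duplicate samples at 5%, and the integrity suite found 0% duplicate data in the train set. Assuming "duplicate" means a row identical to an earlier row across all columns, pandas' `DataFrame.duplicated` gives a quick approximation of that ratio. `duplicate_ratio` below is a hypothetical helper on a made-up frame; deepchecks may count duplicates differently.

```python
import pandas as pd

def duplicate_ratio(df: pd.DataFrame) -> float:
    """Hypothetical helper: share of rows that repeat an earlier row exactly."""
    # duplicated(keep='first') marks every occurrence after the first as True
    return float(df.duplicated(keep='first').mean())

df = pd.DataFrame({
    'petal length (cm)': [1.4, 1.4, 4.7, 5.1],
    'petal width (cm)': [0.2, 0.2, 1.4, 1.9],
    'target': [0, 0, 1, 2],
})
ratio = duplicate_ratio(df)
print(f"{ratio:.0%}")  # 25%
print(ratio <= 0.05)   # False: this toy frame would fail the 5% condition
```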

Run a Deepchecks Check

If you want to run a specific check, you can just import it and run it directly.

Check out the Check Gallery or the API Reference for more info about the existing checks and their parameters.

from deepchecks.tabular.checks import LabelDrift

check = LabelDrift()
result = check.run(ds_train, ds_test)
result

Label Drift

You can also inspect the result's value, whose structure depends on the check:

result.value

{'Drift score': 0.0, 'Method': "Cramer's V"}
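The label drift score here is Cramér's V, computed on the contingency table of dataset (train or test) against label value. The numpy sketch below implements the textbook, uncorrected formula V = sqrt(chi² / (n · (min(r, c) − 1))). deepchecks may apply a bias correction, which would explain why it reports exactly 0 where the plain formula gives a small positive value. The class counts are the ones a stratified 112/38 split of 150 balanced samples produces (37/37/38 and 13/13/12), with the order of classes chosen arbitrarily.

```python
import numpy as np

def cramers_v(table: np.ndarray) -> float:
    """Uncorrected Cramér's V for an r x c contingency table of counts."""
    table = np.asarray(table, dtype=float)
    n = table.sum()
    # Expected counts under independence: outer product of the marginals / n
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
    chi2 = ((table - expected) ** 2 / expected).sum()
    k = min(table.shape) - 1
    return float(np.sqrt(chi2 / (n * k)))

# Rows: train, test; columns: label classes
counts = np.array([[37, 37, 38],
                   [13, 13, 12]])
v = cramers_v(counts)
print(round(v, 3))  # 0.022
print(v < 0.15)     # True: well under the Label Drift threshold
```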

Edit an Existing Suite

Inspect suite and remove condition

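We can see that the Feature Label Correlation check failed, both for train and for test, and that the Weak Segments Performance check raised a warning on the test dataset. Since this is a very simple dataset with few features, this behavior is not necessarily problematic. Below we remove the condition of the Weak Segments Performance check; the PPS conditions of the Feature Label Correlation checks can be removed in the same way.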

# Let's first print the suite to find the conditions that we want to change:

suite

Full Suite: [
    0: TrainTestPerformance
            Conditions:
                    0: Train-Test scores relative degradation is less than 0.1
    1: RocReport
            Conditions:
                    0: AUC score for all the classes is greater than 0.7
    2: ConfusionMatrixReport
    3: PredictionDrift
            Conditions:
                    0: Prediction drift score < 0.15
    4: SimpleModelComparison
            Conditions:
                    0: Model performance gain over simple model is greater than 10%
    5: WeakSegmentsPerformance(n_to_show=5)
            Conditions:
                    0: The relative performance of weakest segment is greater than 80% of average model performance.
    6: CalibrationScore
    7: RegressionErrorDistribution
            Conditions:
                    0: Kurtosis value higher than -0.1
                    1: Systematic error ratio lower than 0.01
    8: UnusedFeatures
            Conditions:
                    0: Number of high variance unused features is less or equal to 5
    9: BoostingOverfit
            Conditions:
                    0: Test score over iterations is less than 5% from the best score
    10: ModelInferenceTime
            Conditions:
                    0: Average model inference time for one sample is less than 0.001
    11: DatasetsSizeComparison
            Conditions:
                    0: Test-Train size ratio is greater than 0.01
    12: NewLabelTrainTest
            Conditions:
                    0: Number of new label values is less or equal to 0
    13: NewCategoryTrainTest
            Conditions:
                    0: Ratio of samples with a new category is less or equal to 0%
    14: StringMismatchComparison
            Conditions:
                    0: No new variants allowed in test data
    15: DateTrainTestLeakageDuplicates
            Conditions:
                    0: Date leakage ratio is less or equal to 0%
    16: DateTrainTestLeakageOverlap
            Conditions:
                    0: Date leakage ratio is less or equal to 0%
    17: IndexTrainTestLeakage
            Conditions:
                    0: Ratio of leaking indices is less or equal to 0%
    18: TrainTestSamplesMix(n_to_show=5)
            Conditions:
                    0: Percentage of test data samples that appear in train data is less or equal to 5%
    19: FeatureLabelCorrelationChange(ppscore_params={}, random_state=42)
            Conditions:
                    0: Train-Test features' Predictive Power Score difference is less than 0.2
                    1: Train features' Predictive Power Score is less than 0.7
    20: FeatureDrift
            Conditions:
                    0: categorical drift score < 0.2 and numerical drift score < 0.2
    21: LabelDrift
            Conditions:
                    0: Label drift score < 0.15
    22: MultivariateDrift
            Conditions:
                    0: Drift value is less than 0.25
    23: IsSingleValue
            Conditions:
                    0: Does not contain only a single value
    24: SpecialCharacters
            Conditions:
                    0: Ratio of samples containing solely special character is less or equal to 0.1%
    25: MixedNulls
            Conditions:
                    0: Number of different null types is less or equal to 1
    26: MixedDataTypes
            Conditions:
                    0: Rare data types in column are either more than 10% or less than 1% of the data
    27: StringMismatch
            Conditions:
                    0: No string variants
    28: DataDuplicates
            Conditions:
                    0: Duplicate data ratio is less or equal to 5%
    29: StringLengthOutOfBounds
            Conditions:
                    0: Ratio of string length outliers is less or equal to 0%
    30: ConflictingLabels
            Conditions:
                    0: Ambiguous sample ratio is less or equal to 0%
    31: OutlierSampleDetection
    32: FeatureLabelCorrelation(ppscore_params={}, random_state=42)
            Conditions:
                    0: Features' Predictive Power Score is less than 0.8
    33: FeatureFeatureCorrelation
            Conditions:
                    0: Not more than 0 pairs are correlated above 0.9
    34: IdentifierLabelCorrelation(ppscore_params={})
            Conditions:
                    0: Identifier columns PPS is less or equal to 0
]
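In this listing, the PPS-based conditions belong to FeatureLabelCorrelationChange (index 19) and FeatureLabelCorrelation (index 32). For intuition about what they measure: the Predictive Power Score (PPS) rates how well a single feature predicts the label, normalized against a naive baseline. The scikit-learn sketch below is a simplified approximation, not deepchecks' implementation (which, judging by the `ppscore_params` argument, follows the ppscore approach with its own defaults). It assumes a classification PPS of (F1_model − F1_naive) / (1 − F1_naive), with a cross-validated decision tree as the model and "always predict the most frequent class" as the naive baseline; `simple_pps` is a hypothetical helper.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import f1_score
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

def simple_pps(x: np.ndarray, y: np.ndarray, cv: int = 4) -> float:
    """Simplified PPS of one feature x for class labels y (an approximation)."""
    model = DecisionTreeClassifier(random_state=0)
    # Cross-validated weighted F1 of a tree trained on this single feature
    model_f1 = cross_val_score(model, x.reshape(-1, 1), y, cv=cv, scoring='f1_weighted').mean()
    # Naive baseline: always predict the most frequent class
    values, counts = np.unique(y, return_counts=True)
    naive_pred = np.full_like(y, values[np.argmax(counts)])
    naive_f1 = f1_score(y, naive_pred, average='weighted', zero_division=0)
    # Normalize against the baseline and clip negative scores to 0
    return max(0.0, (model_f1 - naive_f1) / (1 - naive_f1))

iris_bunch = load_iris()
scores = {
    name: simple_pps(iris_bunch.data[:, i], iris_bunch.target)
    for i, name in enumerate(iris_bunch.feature_names)
}
for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```

On the full iris data, the two petal measurements come out well above the 0.8 threshold of the Feature Label Correlation condition, consistent with the failures reported above; the exact values differ from deepchecks' numbers.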

# now we can use the check's index and the condition's number to remove it:
print(suite[5])
suite[5].remove_condition(0)

WeakSegmentsPerformance(n_to_show=5)
        Conditions:
                0: The relative performance of weakest segment is greater than 80% of average model performance.

# print and see that the condition was removed
suite[5]

WeakSegmentsPerformance(n_to_show=5)

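If we now re-run the suite, the Weak Segments Performance check will no longer be evaluated against a condition, so it won't raise a warning. The other failing conditions (such as those on the PPS and on feature-feature correlation) will keep failing until they are removed or adjusted in the same way.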

Note: the check we manipulated will still run as part of the suite; however, it won't appear in the Conditions Summary, since it no longer has any conditions defined on it. You can still see its display results in the Additional Outputs section.

For more info about working with conditions, see the detailed configuring conditions guide.
