Quickstart in 5 minutes#
To run your first Deepchecks Suite, all you need is the data and model that you wish to validate. More specifically, you need:
Your train and test data (as Pandas DataFrames or Numpy Arrays)
(optional) A supported model (including XGBoost, scikit-learn models, and many more). Required only for checks that use the model's predictions.
Running your first suite on your data and model takes only a few lines of code, starting at Define a Dataset Object.
# If you don't have deepchecks installed yet:
import sys
!{sys.executable} -m pip install deepchecks -U --quiet #--user
Load Data, Split Train-Test, and Train a Simple Model#
For the purpose of this guide, we'll use the well-known iris dataset and train a simple random forest model for multiclass classification:
# General imports
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from deepchecks.tabular.datasets.classification import iris
# Load Data
iris_df = iris.load_data(data_format='Dataframe', as_train_test=False)
label_col = 'target'
df_train, df_test = train_test_split(iris_df, stratify=iris_df[label_col], random_state=0)
# Train Model
rf_clf = RandomForestClassifier(random_state=0)
rf_clf.fit(df_train.drop(label_col, axis=1), df_train[label_col]);
Out:
RandomForestClassifier(random_state=0)
Define a Dataset Object#
Initialize the Dataset object, stating the relevant metadata about the dataset (e.g. the name of the label column).
Check out the Dataset's attributes to see which additional special columns can be declared and used (e.g. date column, index column); a short sketch follows the code below.
from deepchecks.tabular import Dataset
# We explicitly state that this dataset has no categorical features, otherwise they will be automatically inferred
# If the dataset has categorical features, the best practice is to pass a list with their names
ds_train = Dataset(df_train, label=label_col, cat_features=[])
ds_test = Dataset(df_test, label=label_col, cat_features=[])
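For illustration, here is a hedged sketch of declaring such special columns. The 'sample_id' and 'date' columns are hypothetical (we add them to a copy of the data just for the demo), and the index_name/datetime_name keyword names should be verified against your installed deepchecks version:
# Hypothetical example: declare an index column and a datetime column
df_demo = df_train.copy()
df_demo['sample_id'] = range(len(df_demo))     # made-up identifier column
df_demo['date'] = pd.Timestamp('2021-01-01')   # made-up timestamp column
ds_demo = Dataset(df_demo, label=label_col, cat_features=[],
                  index_name='sample_id', datetime_name='date')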
Run a Deepchecks Suite#
Run the full suite#
Use the full_suite, which is a collection of (most of) the prebuilt checks.
Check out the when should you use deepchecks guide for more info about the existing suites and when to use them.
from deepchecks.tabular.suites import full_suite
suite = full_suite()
suite.run(train_dataset=ds_train, test_dataset=ds_test, model=rf_clf)
Out:
Full Suite: 0%| | 0/36 [00:00<?, ? Check/s]
...
Full Suite: 97%|################################### | 35/36 [00:01<00:00, 46.21 Check/s, Check=Outlier Sample Detection]
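The run returns a SuiteResult object, which renders inline in a notebook. As a minimal sketch (the save_as_html method exists in recent deepchecks versions; the file name here is just an example), you can also keep the result and export the full report:
suite_result = suite.run(train_dataset=ds_train, test_dataset=ds_test, model=rf_clf)
# Save the interactive report as a standalone HTML file
suite_result.save_as_html('full_suite_report.html')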
Run the integrity suite#
If you haven't started modeling yet and just have a single dataset, you can use the single_dataset_integrity suite:
from deepchecks.tabular.suites import single_dataset_integrity
integ_suite = single_dataset_integrity()
integ_suite.run(ds_train)
Out:
Single Dataset Integrity Suite: 0%| | 0/9 [00:00<?, ? Check/s]
...
Single Dataset Integrity Suite: 89%|######## | 8/9 [00:00<00:00, 420.71 Check/s, Check=Outlier Sample Detection]
Run a Deepchecks Check#
If you want to run a specific check, you can just import it and run it directly.
Check out the tabular checks examples or the API Reference for more info about the existing checks and their parameters.
from deepchecks.tabular.checks import TrainTestLabelDrift
check = TrainTestLabelDrift()
result = check.run(ds_train, ds_test)
result
You can also inspect the result value, which has a check-dependent structure:
result.value
Out:
{'Drift score': 0.002507306267756066, 'Method': 'PSI'}
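A standalone check can also have conditions attached, so that running it reports pass/fail. As a hedged sketch (add_condition_drift_score_not_greater_than follows the add_condition_* naming used by the drift checks in this deepchecks version; verify it against your installation):
# Attach a condition to the check before running it
check = TrainTestLabelDrift().add_condition_drift_score_not_greater_than()
result = check.run(ds_train, ds_test)
result.passed_conditions()  # True if all attached conditions passed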
Edit an Existing Suite#
Inspect suite and remove condition#
Suppose that after inspecting the results we decide that one of the conditions is not relevant for this very simple dataset with few features, and we want to remove it; as an example, we will remove the condition of the Model Error Analysis check.
# Let's first print the suite to find the conditions that we want to change:
suite
Out:
Full Suite: [
0: ModelInfo
1: ColumnsInfo
2: ConfusionMatrixReport
3: PerformanceReport
Conditions:
0: Train-Test scores relative degradation is not greater than 0.1
4: RocReport(excluded_classes=[])
Conditions:
0: AUC score for all the classes is not less than 0.7
5: SimpleModelComparison
Conditions:
0: Model performance gain over simple model is not less than 10%
6: ModelErrorAnalysis
Conditions:
0: The performance difference of the detected segments must not be greater than 5%
7: CalibrationScore
8: RegressionSystematicError
Conditions:
0: Bias ratio is not greater than 0.01
9: RegressionErrorDistribution
Conditions:
0: Kurtosis value is not less than -0.1
10: BoostingOverfit
Conditions:
0: Test score over iterations doesn't decline by more than 5% from the best score
11: UnusedFeatures
Conditions:
0: Number of high variance unused features is not greater than 5
12: ModelInferenceTime
Conditions:
0: Average model inference time for one sample is not greater than 0.001
13: TrainTestFeatureDrift
Conditions:
0: PSI <= 0.2 and Earth Mover's Distance <= 0.1
14: TrainTestLabelDrift
Conditions:
0: PSI <= 0.2 and Earth Mover's Distance <= 0.1 for label drift
15: WholeDatasetDrift
Conditions:
0: Drift value is not greater than 0.25
16: DominantFrequencyChange
Conditions:
0: Change in ratio of dominant value in data is not greater than 25%
17: CategoryMismatchTrainTest
Conditions:
0: Ratio of samples with a new category is not greater than 0%
18: NewLabelTrainTest
Conditions:
0: Number of new label values is not greater than 0
19: StringMismatchComparison
Conditions:
0: No new variants allowed in test data
20: DatasetsSizeComparison
Conditions:
0: Test-Train size ratio is not smaller than 0.01
21: DateTrainTestLeakageDuplicates
Conditions:
0: Date leakage ratio is not greater than 0%
22: DateTrainTestLeakageOverlap
Conditions:
0: Date leakage ratio is not greater than 0%
23: SingleFeatureContributionTrainTest(ppscore_params={})
Conditions:
0: Train-Test features' Predictive Power Score difference is not greater than 0.2
1: Train features' Predictive Power Score is not greater than 0.7
24: TrainTestSamplesMix
Conditions:
0: Percentage of test data samples that appear in train data not greater than 10%
25: IdentifierLeakage(ppscore_params={})
Conditions:
0: Identifier columns PPS is not greater than 0
26: IndexTrainTestLeakage
Conditions:
0: Ratio of leaking indices is not greater than 0%
27: IsSingleValue
Conditions:
0: Does not contain only a single value
28: MixedNulls
Conditions:
0: Not more than 1 different null types
29: MixedDataTypes
Conditions:
0: Rare data types in column are either more than 10% or less than 1% of the data
30: StringMismatch
Conditions:
0: No string variants
31: DataDuplicates
Conditions:
0: Duplicate data ratio is not greater than 0%
32: StringLengthOutOfBounds
Conditions:
0: Ratio of outliers not greater than 0% string length outliers
33: SpecialCharacters
Conditions:
0: Ratio of entirely special character samples not greater than 0.1%
34: ConflictingLabels
Conditions:
0: Ambiguous sample ratio is not greater than 0%
35: OutlierSampleDetection
]
# Now we can use the check's index and the condition's number to remove it:
print(suite[6])
suite[6].remove_condition(0)
Out:
ModelErrorAnalysis
Conditions:
0: The performance difference of the detected segments must not be greater than 5%
# Print the check again and see that the condition was removed
suite[6]
Out:
ModelErrorAnalysis
If we now re-run the suite, all of the existing conditions will pass.
Note: the check we manipulated will still run as part of the Suite; however, it won't appear in the Conditions Summary since it no longer has any conditions defined on it. You can still see its display results in the Additional Outputs section.
For more info about working with conditions, see the detailed configuring conditions guide.
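Instead of removing a condition outright, you can also attach a custom one. The following is a rough sketch, assuming the ConditionResult(category, details) signature and the deepchecks.core imports of recent versions, with a made-up threshold; it adds a looser condition to the Train Test Label Drift check (index 14 in the suite printed above):
from deepchecks.core import ConditionCategory, ConditionResult

# Custom condition over the label-drift check's result value,
# whose structure was shown earlier: {'Drift score': ..., 'Method': ...}
def label_drift_is_small(value):
    if value['Drift score'] < 0.1:  # 0.1 is an arbitrary illustrative threshold
        return ConditionResult(ConditionCategory.PASS)
    return ConditionResult(ConditionCategory.FAIL, f"Found drift score of {value['Drift score']}")

suite[14].add_condition('Label drift score is less than 0.1', label_drift_is_small)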
Total running time of the script: (0 minutes 2.264 seconds)