Export Suite Output to an HTML Report#

In this guide, we will demonstrate how to export a suite's output as an HTML report. This makes it easy to share the results and to use deepchecks outside of a notebook environment.

Structure:

- Load Data
- Run Suite
- Save Suite Result to an HTML Report
- View Suite Output

Load Data#

Let’s fetch the iris train and test datasets:

from deepchecks.tabular.datasets.classification import iris

train_dataset, test_dataset = iris.load_data()

Run Suite#

from deepchecks.tabular.suites import full_suite

suite = full_suite()
suite_result = suite.run(train_dataset=train_dataset, test_dataset=test_dataset)

Out:

Full Suite:   0%|                                    | 0/36 [00:00<?, ? Check/s]
Full Suite:  97%|################################### | 35/36 [00:00<00:00, 65.91 Check/s, Check=Outlier Sample Detection]

Save Suite Result to an HTML Report#

Exporting the suite’s output to an HTML file is done with the save_as_html function. This function accepts either a file path (a file name or a full path to the destination) or a file-like object.

suite_result.save_as_html('my_suite.html')

# or
suite_result.save_as_html() # will save the result in output.html

Out:

'output.html'
# Removing the outputs created. This cell should be hidden in nbsphinx using "nbsphinx: hidden" in the metadata
import os

os.remove('output.html')
os.remove('my_suite.html')
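Since save_as_html accepts a full path, the report can be written to any folder. A small stdlib-only sketch of building such a destination path (the reports folder here is a temporary directory, purely for illustration):

```python
from pathlib import Path
import tempfile

# A hypothetical destination folder for reports (a temp dir for illustration)
reports_dir = Path(tempfile.mkdtemp())
report_path = reports_dir / 'my_suite.html'

# With a suite result in hand, the report would be saved with the full path:
# suite_result.save_as_html(str(report_path))
print(report_path.name)
```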

Working with In-Memory Buffers#

The suite output can also be written into an in-memory file buffer. This is done by passing a StringIO or BytesIO buffer object as the file argument.

import io

html_out = io.StringIO()
suite_result.save_as_html(file=html_out)
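To make the path-or-buffer behavior concrete, here is a stdlib-only sketch of that dispatch (save_html is a hypothetical stand-in for illustration, not deepchecks' actual implementation):

```python
import io

def save_html(html, file=None):
    # A minimal sketch of path-or-buffer dispatch, assuming the behavior
    # described above: a string is treated as a path, a file-like object
    # is written to directly, and the default file name is output.html.
    if file is None:
        file = 'output.html'
    if isinstance(file, str):
        with open(file, 'w', encoding='utf-8') as f:
            f.write(html)
        return file
    file.write(html)  # file-like object: write into the buffer
    return file

buf = io.StringIO()
save_html('<html><body>report</body></html>', buf)
```

After the call, buf.getvalue() holds the full HTML string, ready to be embedded in another page or uploaded elsewhere.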

View Suite Output#

The suite’s output can still be viewed within the notebook:

suite_result

Full Suite

The suite is composed of various checks such as: Model Info, Date Train Test Leakage Overlap, Columns Info, etc...
Each check may contain conditions (which will result in pass / fail / warning / error) as well as other outputs such as plots or tables.
Suites, checks and conditions can all be modified. Read more about custom suites.


Conditions Summary

Status Check Condition More Info
Single Feature Contribution Train-Test Train features' Predictive Power Score is not greater than 0.7 Features in train dataset with PPS above threshold: {'petal width (cm)': '0.91', 'petal length (cm)': '0.86'}
Train Test Drift PSI <= 0.2 and Earth Mover's Distance <= 0.1
Special Characters - Test Dataset Ratio of entirely special character samples not greater than 0.1%
Special Characters - Train Dataset Ratio of entirely special character samples not greater than 0.1%
String Length Out Of Bounds - Test Dataset Ratio of outliers not greater than 0% string length outliers
String Length Out Of Bounds - Train Dataset Ratio of outliers not greater than 0% string length outliers
Data Duplicates - Test Dataset Duplicate data ratio is not greater than 0%
Data Duplicates - Train Dataset Duplicate data ratio is not greater than 0%
String Mismatch - Test Dataset No string variants
String Mismatch - Train Dataset No string variants
Mixed Data Types - Test Dataset Rare data types in column are either more than 10% or less than 1% of the data
Mixed Data Types - Train Dataset Rare data types in column are either more than 10% or less than 1% of the data
Mixed Nulls - Test Dataset Not more than 1 different null types
Mixed Nulls - Train Dataset Not more than 1 different null types
Single Value in Column - Test Dataset Does not contain only a single value
Single Value in Column - Train Dataset Does not contain only a single value
Train Test Samples Mix Percentage of test data samples that appear in train data not greater than 10%
Single Feature Contribution Train-Test Train-Test features' Predictive Power Score difference is not greater than 0.2
Datasets Size Comparison Test-Train size ratio is not smaller than 0.01
String Mismatch Comparison No new variants allowed in test data
New Label Train Test Number of new label values is not greater than 0
Category Mismatch Train Test Ratio of samples with a new category is not greater than 0%
Dominant Frequency Change Change in ratio of dominant value in data is not greater than 25%
Whole Dataset Drift Drift value is not greater than 0.25
Train Test Label Drift PSI <= 0.2 and Earth Mover's Distance <= 0.1 for label drift
Conflicting Labels - Train Dataset Ambiguous sample ratio is not greater than 0%
Conflicting Labels - Test Dataset Ambiguous sample ratio is not greater than 0%

Check With Conditions Output

Train Test Drift

Calculate drift between train dataset and test dataset per feature, using statistical measures.

Conditions Summary
Status Condition More Info
PSI <= 0.2 and Earth Mover's Distance <= 0.1
Additional Outputs
The Drift score is a measure for the difference between two distributions, in this check - the test and train distributions.
The check shows the drift score and distributions for the features, sorted by drift score and showing only the top 5 features, according to drift score.
If available, the plot titles also show the feature importance (FI) rank.

Go to top

Train Test Label Drift

Calculate label drift between train dataset and test dataset, using statistical measures.

Conditions Summary
Status Condition More Info
PSI <= 0.2 and Earth Mover's Distance <= 0.1 for label drift
Additional Outputs
The Drift score is a measure for the difference between two distributions, in this check - the test and train distributions.
The check shows the drift score and distributions for the label.

Go to top

Datasets Size Comparison

Verify test dataset size comparing it to the train dataset size.

Conditions Summary
Status Condition More Info
Test-Train size ratio is not smaller than 0.01
Additional Outputs
  Train Test
Size 112 38

Go to top

Single Feature Contribution Train-Test

Return the Predictive Power Score of all features, in order to estimate each feature's ability to predict the label.

Conditions Summary
Status Condition More Info
Train features' Predictive Power Score is not greater than 0.7 Features in train dataset with PPS above threshold: {'petal width (cm)': '0.91', 'petal length (cm)': '0.86'}
Train-Test features' Predictive Power Score difference is not greater than 0.2
Additional Outputs
The Predictive Power Score (PPS) is used to estimate the ability of a feature to predict the label by itself. (Read more about Predictive Power Score)
In the graph above, we should suspect we have problems in our data if:
1. Train dataset PPS values are high:
Can indicate that this feature's success in predicting the label is actually due to data leakage,
meaning that the feature holds information that is based on the label to begin with.
2. Large difference between train and test PPS (train PPS is larger):
An even more powerful indication of data leakage, as a feature that was powerful in train but not in test
can be explained by leakage in train that is not relevant to a new dataset.
3. Large difference between test and train PPS (test PPS is larger):
An anomalous value, could indicate drift in test dataset that caused a coincidental correlation to the target label.

Go to top

Train Test Samples Mix

Detect samples in the test data that appear also in training data.

Conditions Summary
Status Condition More Info
Percentage of test data samples that appear in train data not greater than 10%
Additional Outputs
2.63% (1 / 38) of test data samples appear in train data
  sepal length (cm) sepal width (cm) petal length (cm) petal width (cm) target
Train indices: 30 Test indices: 28 5.80 2.70 5.10 1.90 2

Go to top

Check Without Conditions Output

Columns Info - Train Dataset

Return the role and logical type of each column.

Additional Outputs
* showing only the top 10 columns, you can change it using n_top_columns param
  target sepal length (cm) sepal width (cm) petal length (cm) petal width (cm)
role label numerical feature numerical feature numerical feature numerical feature

Go to top

Columns Info - Test Dataset

Return the role and logical type of each column.

Additional Outputs
* showing only the top 10 columns, you can change it using n_top_columns param
  target sepal length (cm) sepal width (cm) petal length (cm) petal width (cm)
role label numerical feature numerical feature numerical feature numerical feature

Go to top

Outlier Sample Detection - Train Dataset

Detects outliers in a dataset using the LoOP algorithm.

Additional Outputs
The Outlier Probability Score is calculated by the LoOP algorithm which measures the local deviation of density of a given sample with respect to its neighbors. These outlier scores are directly interpretable as a probability of an object being an outlier (see link for more information).

  Outlier Probability Score sepal length (cm) sepal width (cm) petal length (cm) petal width (cm) target
56 0.89 4.50 2.30 1.30 0.30 0
4 0.72 4.90 2.50 4.50 1.70 2
82 0.57 6.30 3.30 4.70 1.60 1
13 0.56 5.80 2.80 5.10 2.40 2
93 0.56 4.60 3.60 1.00 0.20 0

Go to top

Other Checks That Weren't Displayed

Check Reason
Model Info DeepchecksNotSupportedError: Check is irrelevant if model is not supplied
Index Train Test Leakage There is no index defined to use. Did you pass a DataFrame instead of a Dataset?
Identifier Leakage - Test Dataset Check is irrelevant for Datasets without index or date column
Identifier Leakage - Train Dataset Check is irrelevant for Datasets without index or date column
Date Train Test Leakage Overlap There is no datetime defined to use. Did you pass a DataFrame instead of a Dataset?
Date Train Test Leakage Duplicates There is no datetime defined to use. Did you pass a DataFrame instead of a Dataset?
Model Inference Time - Test Dataset DeepchecksNotSupportedError: Check is irrelevant for Datasets without model
Model Inference Time - Train Dataset DeepchecksNotSupportedError: Check is irrelevant for Datasets without model
Unused Features DeepchecksNotSupportedError: Check is irrelevant for Datasets without model
Boosting Overfit DeepchecksNotSupportedError: Check is irrelevant for Datasets without model
Regression Error Distribution - Test Dataset Check is irrelevant for classification tasks
Regression Error Distribution - Train Dataset Check is irrelevant for classification tasks
Outlier Sample Detection - Test Dataset NotEnoughSamplesError: There are not enough samples to run this check, found only 38 samples.
Regression Systematic Error - Train Dataset Check is irrelevant for classification tasks
Calibration Score - Test Dataset DeepchecksNotSupportedError: Check is irrelevant for Datasets without model
Calibration Score - Train Dataset DeepchecksNotSupportedError: Check is irrelevant for Datasets without model
Model Error Analysis DeepchecksNotSupportedError: Check is irrelevant for Datasets without model
Simple Model Comparison DeepchecksNotSupportedError: Check is irrelevant for Datasets without model
Regression Systematic Error - Test Dataset Check is irrelevant for classification tasks
Confusion Matrix Report - Train Dataset DeepchecksNotSupportedError: Check is irrelevant for Datasets without model
Confusion Matrix Report - Test Dataset DeepchecksNotSupportedError: Check is irrelevant for Datasets without model
Roc Report - Test Dataset DeepchecksNotSupportedError: Check is irrelevant for Datasets without model
Roc Report - Train Dataset DeepchecksNotSupportedError: Check is irrelevant for Datasets without model
Performance Report DeepchecksNotSupportedError: Check is irrelevant for Datasets without model
Data Duplicates - Train Dataset Nothing found
Data Duplicates - Test Dataset Nothing found
String Length Out Of Bounds - Train Dataset Nothing found
String Length Out Of Bounds - Test Dataset Nothing found
Special Characters - Train Dataset Nothing found
Special Characters - Test Dataset Nothing found
Conflicting Labels - Train Dataset Nothing found
String Mismatch - Test Dataset Nothing found
String Mismatch - Train Dataset Nothing found
Single Value in Column - Train Dataset Nothing found
Mixed Data Types - Train Dataset Nothing found
Mixed Nulls - Test Dataset Nothing found
Mixed Nulls - Train Dataset Nothing found
Single Value in Column - Test Dataset Nothing found
Conflicting Labels - Test Dataset Nothing found
New Label Train Test Nothing found
Category Mismatch Train Test Nothing found
Dominant Frequency Change Nothing found
Whole Dataset Drift Nothing found
Mixed Data Types - Test Dataset Nothing found
String Mismatch Comparison Nothing found

Go to top


Total running time of the script: ( 0 minutes 4.412 seconds)

Gallery generated by Sphinx-Gallery