Export Suite Output to an HTML Report#

In this guide, we will demonstrate how to export a suite's output as an HTML report. This makes it easy to share the results and to use deepchecks outside of a notebook environment.

Structure:

- Load Data
- Run Suite
- Save Suite Result to an HTML Report
- View Suite Output

Load Data#

Let’s fetch the iris train and test datasets:

from deepchecks.tabular.datasets.classification import iris

train_dataset, test_dataset = iris.load_data()

Run Suite#

from deepchecks.tabular.suites import full_suite

suite = full_suite()
suite_result = suite.run(train_dataset=train_dataset, test_dataset=test_dataset)

Out:

Full Suite:   0%|                                    | 0/36 [00:00<?, ? Check/s]
Full Suite:  97%|################################### | 35/36 [00:00<00:00, 65.91 Check/s, Check=Outlier Sample Detection]

Save Suite Result to an HTML Report#

Exporting the suite’s output to an HTML file is done with the save_as_html function. This function accepts either a file path (a file name or a full path to the destination) or a file-like object.

suite_result.save_as_html('my_suite.html')

# or
suite_result.save_as_html() # will save the result in output.html

Out:

'output.html'
# Removing the outputs created. This cell should be hidden in nbsphinx using "nbsphinx: hidden" in the metadata
import os

os.remove('output.html')
os.remove('my_suite.html')
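Since save_as_html accepts a full path, the report can be written to any folder. A small stdlib-only sketch of building such a destination path (the reports folder here is a temporary directory, purely for illustration):

```python
from pathlib import Path
import tempfile

# A hypothetical destination folder for reports (a temp dir for illustration)
reports_dir = Path(tempfile.mkdtemp())
report_path = reports_dir / 'my_suite.html'

# With a suite result in hand, the report would be saved with the full path:
# suite_result.save_as_html(str(report_path))
print(report_path.name)
```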

Working with In-Memory Buffers#

The suite output can also be written into an in-memory file buffer. This is done by passing a StringIO or BytesIO buffer object as the file argument.

import io

html_out = io.StringIO()
suite_result.save_as_html(file=html_out)
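To make the path-or-buffer behavior concrete, here is a stdlib-only sketch of that dispatch (save_html is a hypothetical stand-in for illustration, not deepchecks' actual implementation):

```python
import io

def save_html(html, file=None):
    # A minimal sketch of path-or-buffer dispatch, assuming the behavior
    # described above: a string is treated as a path, a file-like object
    # is written to directly, and the default file name is output.html.
    if file is None:
        file = 'output.html'
    if isinstance(file, str):
        with open(file, 'w', encoding='utf-8') as f:
            f.write(html)
        return file
    file.write(html)  # file-like object: write into the buffer
    return file

buf = io.StringIO()
save_html('<html><body>report</body></html>', buf)
```

After the call, buf.getvalue() holds the full HTML string, ready to be embedded in another page or uploaded elsewhere.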

View Suite Output#

The suite’s output can still be viewed within the notebook:

suite_result

Full Suite

The suite is composed of various checks such as: Model Info, Date Train Test Leakage Overlap, Columns Info, etc...
Each check may contain conditions (which will result in pass / fail / warning / error) as well as other outputs such as plots or tables.
Suites, checks and conditions can all be modified. Read more about custom suites.


Conditions Summary

Status Check Condition More Info
Single Feature Contribution Train-Test Train features' Predictive Power Score is not greater than 0.7 Features in train dataset with PPS above threshold: {'petal width (cm)': '0.91', 'petal length (cm)': '0.86'}
Train Test Drift PSI <= 0.2 and Earth Mover's Distance <= 0.1
Special Characters - Test Dataset Ratio of entirely special character samples not greater than 0.1%
Special Characters - Train Dataset Ratio of entirely special character samples not greater than 0.1%
String Length Out Of Bounds - Test Dataset Ratio of outliers not greater than 0% string length outliers
String Length Out Of Bounds - Train Dataset Ratio of outliers not greater than 0% string length outliers
Data Duplicates - Test Dataset Duplicate data ratio is not greater than 0%
Data Duplicates - Train Dataset Duplicate data ratio is not greater than 0%
String Mismatch - Test Dataset No string variants
String Mismatch - Train Dataset No string variants
Mixed Data Types - Test Dataset Rare data types in column are either more than 10% or less than 1% of the data
Mixed Data Types - Train Dataset Rare data types in column are either more than 10% or less than 1% of the data
Mixed Nulls - Test Dataset Not more than 1 different null types
Mixed Nulls - Train Dataset Not more than 1 different null types
Single Value in Column - Test Dataset Does not contain only a single value
Single Value in Column - Train Dataset Does not contain only a single value
Train Test Samples Mix Percentage of test data samples that appear in train data not greater than 10%
Single Feature Contribution Train-Test Train-Test features' Predictive Power Score difference is not greater than 0.2
Datasets Size Comparison Test-Train size ratio is not smaller than 0.01
String Mismatch Comparison No new variants allowed in test data
New Label Train Test Number of new label values is not greater than 0
Category Mismatch Train Test Ratio of samples with a new category is not greater than 0%
Dominant Frequency Change Change in ratio of dominant value in data is not greater than 25%
Whole Dataset Drift Drift value is not greater than 0.25
Train Test Label Drift PSI <= 0.2 and Earth Mover's Distance <= 0.1 for label drift
Conflicting Labels - Train Dataset Ambiguous sample ratio is not greater than 0%
Conflicting Labels - Test Dataset Ambiguous sample ratio is not greater than 0%

Check With Conditions Output

Train Test Drift

Calculate drift between train dataset and test dataset per feature, using statistical measures.

Conditions Summary
Status Condition More Info
PSI <= 0.2 and Earth Mover's Distance <= 0.1
Additional Outputs
The Drift score is a measure for the difference between two distributions, in this check - the test and train distributions.
The check shows the drift score and distributions for the features, sorted by drift score and showing only the top 5 features, according to drift score.
If available, the plot titles also show the feature importance (FI) rank.

Go to top

Train Test Label Drift

Calculate label drift between train dataset and test dataset, using statistical measures.

Conditions Summary
Status Condition More Info
PSI <= 0.2 and Earth Mover's Distance <= 0.1 for label drift
Additional Outputs
The Drift score is a measure for the difference between two distributions, in this check - the test and train distributions.
The check shows the drift score and distributions for the label.

Go to top

Datasets Size Comparison

Verify test dataset size comparing it to the train dataset size.

Conditions Summary
Status Condition More Info
Test-Train size ratio is not smaller than 0.01
Additional Outputs
  Train Test
Size 112 38

Go to top

Single Feature Contribution Train-Test

Return the Predictive Power Score of all features, in order to estimate each feature's ability to predict the label.

Conditions Summary
Status Condition More Info
Train features' Predictive Power Score is not greater than 0.7 Features in train dataset with PPS above threshold: {'petal width (cm)': '0.91', 'petal length (cm)': '0.86'}
Train-Test features' Predictive Power Score difference is not greater than 0.2
Additional Outputs
The Predictive Power Score (PPS) is used to estimate the ability of a feature to predict the label by itself. (Read more about Predictive Power Score)
In the graph above, we should suspect we have problems in our data if:
1. Train dataset PPS values are high:
Can indicate that this feature's success in predicting the label is actually due to data leakage,
meaning that the feature holds information that is based on the label to begin with.
2. Large difference between train and test PPS (train PPS is larger):
An even more powerful indication of data leakage, as a feature that was powerful in train but not in test
can be explained by leakage in train that is not relevant to a new dataset.
3. Large difference between test and train PPS (test PPS is larger):
An anomalous value, could indicate drift in test dataset that caused a coincidental correlation to the target label.

Go to top

Train Test Samples Mix

Detect samples in the test data that appear also in training data.

Conditions Summary
Status Condition More Info
Percentage of test data samples that appear in train data not greater than 10%
Additional Outputs
2.63% (1 / 38) of test data samples appear in train data
  sepal length (cm) sepal width (cm) petal length (cm) petal width (cm) target
Train indices: 30 Test indices: 28 5.80 2.70 5.10 1.90 2

Go to top

Check Without Conditions Output

Columns Info - Train Dataset

Return the role and logical type of each column.

Additional Outputs
* showing only the top 10 columns, you can change it using n_top_columns param
  target sepal length (cm) sepal width (cm) petal length (cm) petal width (cm)
role label numerical feature numerical feature numerical feature numerical feature

Go to top

Columns Info - Test Dataset

Return the role and logical type of each column.

Additional Outputs
* showing only the top 10 columns, you can change it using n_top_columns param
  target sepal length (cm) sepal width (cm) petal length (cm) petal width (cm)
role label numerical feature numerical feature numerical feature numerical feature

Go to top

Outlier Sample Detection - Train Dataset

Detects outliers in a dataset using the LoOP algorithm.

Additional Outputs
The Outlier Probability Score is calculated by the LoOP algorithm which measures the local deviation of density of a given sample with respect to its neighbors. These outlier scores are directly interpretable as a probability of an object being an outlier (see link for more information).

  Outlier Probability Score sepal length (cm) sepal width (cm) petal length (cm) petal width (cm) target
56 0.89 4.50 2.30 1.30 0.30 0
4 0.72 4.90 2.50 4.50 1.70 2
82 0.57 6.30 3.30 4.70 1.60 1
13 0.56 5.80 2.80 5.10 2.40 2
93 0.56 4.60 3.60 1.00 0.20 0

Go to top

Other Checks That Weren't Displayed

Check Reason
Model Info DeepchecksNotSupportedError: Check is irrelevant if model is not supplied
Index Train Test Leakage There is no index defined to use. Did you pass a DataFrame instead of a Dataset?
Identifier Leakage - Test Dataset Check is irrelevant for Datasets without index or date column
Identifier Leakage - Train Dataset Check is irrelevant for Datasets without index or date column
Date Train Test Leakage Overlap There is no datetime defined to use. Did you pass a DataFrame instead of a Dataset?
Date Train Test Leakage Duplicates There is no datetime defined to use. Did you pass a DataFrame instead of a Dataset?
Model Inference Time - Test Dataset DeepchecksNotSupportedError: Check is irrelevant for Datasets without model
Model Inference Time - Train Dataset DeepchecksNotSupportedError: Check is irrelevant for Datasets without model
Unused Features DeepchecksNotSupportedError: Check is irrelevant for Datasets without model
Boosting Overfit DeepchecksNotSupportedError: Check is irrelevant for Datasets without model
Regression Error Distribution - Test Dataset Check is irrelevant for classification tasks
Regression Error Distribution - Train Dataset Check is irrelevant for classification tasks
Outlier Sample Detection - Test Dataset NotEnoughSamplesError: There are not enough samples to run this check, found only 38 samples.
Regression Systematic Error - Train Dataset Check is irrelevant for classification tasks
Calibration Score - Test Dataset DeepchecksNotSupportedError: Check is irrelevant for Datasets without model
Calibration Score - Train Dataset DeepchecksNotSupportedError: Check is irrelevant for Datasets without model
Model Error Analysis DeepchecksNotSupportedError: Check is irrelevant for Datasets without model
Simple Model Comparison DeepchecksNotSupportedError: Check is irrelevant for Datasets without model
Regression Systematic Error - Test Dataset Check is irrelevant for classification tasks
Confusion Matrix Report - Train Dataset DeepchecksNotSupportedError: Check is irrelevant for Datasets without model
Confusion Matrix Report - Test Dataset DeepchecksNotSupportedError: Check is irrelevant for Datasets without model
Roc Report - Test Dataset DeepchecksNotSupportedError: Check is irrelevant for Datasets without model
Roc Report - Train Dataset DeepchecksNotSupportedError: Check is irrelevant for Datasets without model
Performance Report DeepchecksNotSupportedError: Check is irrelevant for Datasets without model
Data Duplicates - Train Dataset Nothing found
Data Duplicates - Test Dataset Nothing found
String Length Out Of Bounds - Train Dataset Nothing found
String Length Out Of Bounds - Test Dataset Nothing found
Special Characters - Train Dataset Nothing found
Special Characters - Test Dataset Nothing found
Conflicting Labels - Train Dataset Nothing found
String Mismatch - Test Dataset Nothing found
String Mismatch - Train Dataset Nothing found
Single Value in Column - Train Dataset Nothing found
Mixed Data Types - Train Dataset Nothing found
Mixed Nulls - Test Dataset Nothing found
Mixed Nulls - Train Dataset Nothing found
Single Value in Column - Test Dataset Nothing found
Conflicting Labels - Test Dataset Nothing found
New Label Train Test Nothing found
Category Mismatch Train Test Nothing found
Dominant Frequency Change Nothing found
Whole Dataset Drift Nothing found
Mixed Data Types - Test Dataset Nothing found
String Mismatch Comparison Nothing found

Go to top


Total running time of the script: ( 0 minutes 4.412 seconds)

Gallery generated by Sphinx-Gallery