Model Error Analysis check#

This notebook provides an overview of using and understanding the model error analysis check.


What is the purpose of the check?#

The check finds the properties that best split the data into segments of high and low model error, helping to identify data segments on which the model underperforms.
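As a rough illustration of the idea (a simplified sketch, not the deepchecks internals — the `brightness` property and the quantile threshold search below are invented for this example): the check scores each sample's error, then looks for image-property thresholds that separate high-error from low-error samples.

```python
import numpy as np

# Hypothetical setup: the model's error is high on dark images (brightness < 0.3).
rng = np.random.default_rng(0)
brightness = rng.uniform(0.0, 1.0, 1000)          # per-image property
error = np.where(brightness < 0.3, 0.8, 0.1)      # per-sample model error
error = error + rng.normal(0.0, 0.02, 1000)       # measurement noise

def best_split(prop, err):
    # Try candidate thresholds at the property's quantiles and keep the one
    # that maximizes the gap in mean error between the two sides.
    candidates = np.quantile(prop, np.linspace(0.05, 0.95, 19))
    gaps = [abs(err[prop < t].mean() - err[prop >= t].mean()) for t in candidates]
    return candidates[int(np.argmax(gaps))]

threshold = best_split(brightness, error)
print(f"weak segment: brightness < {threshold:.2f}")
```

A real error model is richer than a single-threshold search, but the output has the same flavor: a property and a split that isolate a high-error segment.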

Imports#

from deepchecks.vision.checks.performance import ModelErrorAnalysis

Classification#

Generate data and model:#

from deepchecks.vision.datasets.classification import mnist

mnist_model = mnist.load_model()
train_ds = mnist.load_dataset(train=True, object_type='VisionData')
test_ds = mnist.load_dataset(train=False, object_type='VisionData')

Run the check:#

check = ModelErrorAnalysis(min_error_model_score=-0.1)
check.run(train_ds, test_ds, mnist_model)

Out (progress output abridged):

Validating Input: 100%|#| 1/1 [00:00<00:00,  8.64 /s]
Ingesting Batches - Train Dataset: 154/157 [00:01<00:00, 105.94 Batch/s]
Ingesting Batches - Test Dataset: 10/157 [00:01<00:18,  8.05 Batch/s]
/home/runner/work/deepchecks/deepchecks/deepchecks/utils/features.py:180: UserWarning: Cannot use model's built-in feature importance on a Scikit-learn Pipeline, using permutation feature importance calculation instead
/home/runner/work/deepchecks/deepchecks/deepchecks/utils/features.py:290: UserWarning: Calculating permutation feature importance without time limit. Expected to finish in 4 seconds
Computing Check: 100%|#| 1/1 [00:02<00:00,  2.93s/ Check]

Model Error Analysis

Find the properties that best split the data into segments of high and low model error.

Additional Outputs
The following graphs show the distribution of error for top properties that are most useful for distinguishing high error samples from low error samples.

Note - data sampling: Running on 10000 train data samples out of 60000. Sample size can be controlled with the "n_samples" parameter.
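The sampling described in the note can be pictured as a simple reproducible random subset (a sketch only — the `sample_indices` function below is illustrative, not the deepchecks implementation):

```python
import random

def sample_indices(dataset_size, n_samples, seed=42):
    # Draw a reproducible random subset of at most n_samples indices.
    rng = random.Random(seed)
    if n_samples >= dataset_size:
        return list(range(dataset_size))
    return sorted(rng.sample(range(dataset_size), n_samples))

# As in the note above: 10000 of the 60000 MNIST train samples.
idx = sample_indices(60000, 10000)
print(len(idx))  # 10000
```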



Object Detection#

For object detection tasks, the default metric calculated is Average Precision. Its definition is identical to the COCO dataset's: the mean of the average precision per class, averaged over IoU thresholds from 0.5 to 0.95 in steps of 0.05.
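To make the metric concrete, here is a small sketch (the `iou` helper is written for this example, not taken from deepchecks or pycocotools): the COCO-style range corresponds to ten IoU thresholds, 0.50, 0.55, ..., 0.95.

```python
import numpy as np

def iou(a, b):
    # Intersection-over-union of two axis-aligned boxes given as (x1, y1, x2, y2).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union

thresholds = np.arange(0.5, 1.0, 0.05)   # the ten COCO thresholds 0.50 .. 0.95
print(len(thresholds))                   # 10
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))   # overlap area 1, union area 7
```

At each threshold, a detection counts as correct only if its IoU with a ground-truth box meets the threshold; averaging over all ten thresholds rewards tighter localization.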

import numpy as np

from deepchecks.vision.datasets.detection import coco

Generate Data and Model#

We generate a sample dataset of 128 images from the COCO dataset and use the pretrained YOLOv5 model.

yolo = coco.load_model(pretrained=True)

train_ds = coco.load_dataset(train=True, object_type='VisionData')
test_ds = coco.load_dataset(train=False, object_type='VisionData')

Run the check:#

check = ModelErrorAnalysis(min_error_model_score=-1)
check.run(train_ds, test_ds, yolo)

Out (progress output abridged):

Validating Input: 100%|#| 1/1 [00:11<00:00, 11.12s/ ]
Ingesting Batches - Train Dataset: 100%|##| 2/2 [00:13<00:00,  6.74s/ Batch]
Ingesting Batches - Test Dataset: 100%|##| 2/2 [00:13<00:00,  6.69s/ Batch]
/home/runner/work/deepchecks/deepchecks/deepchecks/utils/features.py:180: UserWarning: Cannot use model's built-in feature importance on a Scikit-learn Pipeline, using permutation feature importance calculation instead
/home/runner/work/deepchecks/deepchecks/deepchecks/utils/features.py:290: UserWarning: Calculating permutation feature importance without time limit. Expected to finish in 4 seconds
Computing Check: 100%|#| 1/1 [00:02<00:00,  2.39s/ Check]

Model Error Analysis

Find the properties that best split the data into segments of high and low model error.

Additional Outputs
The following graphs show the distribution of error for top properties that are most useful for distinguishing high error samples from low error samples.


Total running time of the script: (0 minutes 46.869 seconds)

Gallery generated by Sphinx-Gallery