Class Performance#

This notebook provides an overview of how to use and understand the class performance check.

What Is the Purpose of the Check?#

The class performance check evaluates several metrics on the given model and data and returns all of the results in a single check. The check uses the following default metrics:

Task Type        | Metric name
Classification   | Precision
Classification   | Recall
Object Detection | Average Precision
Object Detection | Average Recall

In addition to the default metrics, the check supports custom metrics, which should be implemented using the PyTorch Ignite Metric API (ignite.metrics.Metric). These can be passed as a list using the alternative_metrics parameter of the check, in which case they override the default metrics.
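To make the shape of a custom metric more concrete, here is a minimal sketch in plain Python. It only mimics the reset / update / compute life cycle that Ignite metrics follow; it does not import Ignite, and the class name PerClassRecall and the toy batches are assumptions made for illustration. A real custom metric would subclass ignite.metrics.Metric and typically work on tensors.

```python
from collections import Counter


class PerClassRecall:
    """Plain-Python stand-in that follows the reset/update/compute life cycle.

    Hypothetical illustration only: it is not an Ignite metric and cannot be
    passed to the check as-is.
    """

    def __init__(self, num_classes):
        self.num_classes = num_classes
        self.reset()

    def reset(self):
        # Clear the counts accumulated so far.
        self._true_positives = Counter()
        self._actual = Counter()

    def update(self, output):
        # `output` is a (predicted_labels, true_labels) pair for one batch.
        y_pred, y_true = output
        for pred, true in zip(y_pred, y_true):
            self._actual[true] += 1
            if pred == true:
                self._true_positives[true] += 1

    def compute(self):
        # Recall per class; a class with no samples is reported as 0.0.
        return [
            self._true_positives[c] / self._actual[c] if self._actual[c] else 0.0
            for c in range(self.num_classes)
        ]


# Two toy batches of (predictions, labels) for a 3-class problem.
metric = PerClassRecall(num_classes=3)
metric.update(([0, 1, 2, 2], [0, 1, 1, 2]))
metric.update(([0, 0], [0, 2]))
recall_per_class = metric.compute()
print(recall_per_class)
```

The metric accumulates counts batch by batch and only produces per-class scores in compute, which is what lets a check report results per class after ingesting all batches.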

Imports#

from deepchecks.vision.checks.performance import ClassPerformance
from deepchecks.vision.datasets.classification import mnist

Classification Performance Report#

Generate Data and Model#

mnist_model = mnist.load_model()
train_ds = mnist.load_dataset(train=True, object_type='VisionData')
test_ds = mnist.load_dataset(train=False, object_type='VisionData')

Run the check#

check = ClassPerformance()
check.run(train_ds, test_ds, mnist_model)

Out:

Validating Input: 100%|#| 1/1 [00:00<00:00,  8.14 /s]
Ingesting Batches - Train Dataset:  92% | 144/157 [00:00<00:00, 155.58 Batch/s]
Ingesting Batches - Test Dataset:   6% | 10/157 [00:00<00:13, 11.10 Batch/s]
Computing Check:   0%| | 0/1 [00:00<?, ? Check/s]

Class Performance

Summarize given metrics on a dataset and model.

Additional Outputs

Note - data sampling: Running on 10000 train data samples out of 60000. Sample size can be controlled with the "n_samples" parameter.



Object Detection Class Performance#

For object detection tasks, the default metrics are the Average Precision and the Average Recall. The Average Precision is defined as in the COCO dataset: the mean of the per-class average precision over IoU thresholds ranging from 0.5 to 0.95 in steps of 0.05.
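The threshold range above can be made concrete with a short sketch. It is not the deepchecks implementation: the helper names iou and mean_over_thresholds, the (x_min, y_min, x_max, y_max) box format, and the toy boxes are all assumptions for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# The ten COCO-style IoU thresholds: 0.5, 0.55, ..., 0.95.
iou_thresholds = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def mean_over_thresholds(ap_at_threshold):
    """Average a {threshold: average precision} mapping over all thresholds."""
    return sum(ap_at_threshold[t] for t in iou_thresholds) / len(iou_thresholds)


# A prediction shifted by one pixel against a 10x10 ground-truth box.
overlap = iou((0, 0, 10, 10), (1, 0, 11, 10))
# Thresholds at which this prediction would still count as a match.
matched = [t for t in iou_thresholds if overlap >= t]
print(round(overlap, 3), matched)
```

A detection counts as correct at a given threshold only if its IoU with the ground-truth box reaches that threshold, so stricter thresholds reward tighter localization; averaging over all ten thresholds folds this into a single score per class.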

from deepchecks.vision.datasets.detection import coco

Generate Data and Model#

We generate a sample dataset of 128 images from the COCO dataset and use the YOLOv5 model.

yolo = coco.load_model(pretrained=True)

train_ds = coco.load_dataset(train=True, object_type='VisionData')
test_ds = coco.load_dataset(train=False, object_type='VisionData')

Run the check#

check = ClassPerformance(show_only='best')
check.run(train_ds, test_ds, yolo)

Out:

Validating Input: 100%|#| 1/1 [00:11<00:00, 11.16s/ ]
Ingesting Batches - Train Dataset: 100%|##| 2/2 [00:11<00:00,  5.70s/ Batch]
Ingesting Batches - Test Dataset: 100%|##| 2/2 [00:11<00:00,  5.67s/ Batch]
Computing Check: 100%|#| 1/1 [00:00<00:00,  2.46 Check/s]

Class Performance

Summarize given metrics on a dataset and model.

Additional Outputs


Define a Condition#

We can also define a condition to validate that our model performance is above a certain threshold. The condition is defined as a function that takes the results of the check as input and returns a ConditionResult object.

check = ClassPerformance(show_only='worst')
check.add_condition_test_performance_not_less_than(0.2)
result = check.run(train_ds, test_ds, yolo)
result

Out:

Validating Input: 100%|#| 1/1 [00:11<00:00, 11.13s/ ]
Ingesting Batches - Train Dataset: 100%|##| 2/2 [00:11<00:00,  5.74s/ Batch]
Ingesting Batches - Test Dataset: 100%|##| 2/2 [00:11<00:00,  5.72s/ Batch]
Computing Check: 100%|#| 1/1 [00:00<00:00,  2.46 Check/s]

Class Performance

Summarize given metrics on a dataset and model.

Conditions Summary
Status | Condition | More Info
 | Scores are not less than 0.2 | Found metrics with scores below threshold: [{'Class Name': 'sink', 'Metric': 'Average Recall', 'Value': 0.3}, {'Class Name': 'kite', 'Metric': 'Average Recall', 'Value': 0.275}, {'Class Name': 'spoon', 'Metric': 'Average Precision', 'Value': 0.2524752475247524}, {'Class Name': 'spoon', 'Metric': 'Average Recall', 'Value': 0.25}, {'Class Name': 'sports ball', 'Metric': 'Average Recall', 'Value': 0.22000000000000003}, {'Class Name': 'bottle', 'Metric': 'Average Precision', 'Value': 0.20792079207920783}, {'Class Name': 'refrigerator', 'Metric': 'Average Precision', 'Value': 0.20198019801980194}, {'Class Name': 'boat', 'Metric': 'Average Recall', 'Value': 0.2}, {'Class Name': 'refrigerator', 'Metric': 'Average Recall', 'Value': 0.2}, {'Class Name': 'bottle', 'Metric': 'Average Recall', 'Value': 0.19999999999999998}, {'Class Name': 'boat', 'Metric': 'Average Precision', 'Value': 0.18514851485148517}, {'Class Name': 'sports ball', 'Metric': 'Average Precision', 'Value': 0.15577557755775578}, {'Class Name': 'sink', 'Metric': 'Average Precision', 'Value': 0.15148514851485145}, {'Class Name': 'car', 'Metric': 'Average Recall', 'Value': 0.14705882352941177}, {'Class Name': 'book', 'Metric': 'Average Recall', 'Value': 0.12916666666666665}, {'Class Name': 'kite', 'Metric': 'Average Precision', 'Value': 0.1267326732673267}, {'Class Name': 'car', 'Metric': 'Average Precision', 'Value': 0.11146628948609147}, {'Class Name': 'bicycle', 'Metric': 'Average Precision', 'Value': 0.10396039603960391}, {'Class Name': 'cell phone', 'Metric': 'Average Precision', 'Value': 0.10198019801980196}, {'Class Name': 'bicycle', 'Metric': 'Average Recall', 'Value': 0.1}, {'Class Name': 'cell phone', 'Metric': 'Average Recall', 'Value': 0.09999999999999998}, {'Class Name': 'book', 'Metric': 'Average Precision', 'Value': 0.09405940594059405}, {'Class Name': 'handbag', 'Metric': 'Average Precision', 'Value': 0.038613861386138607}, {'Class Name': 'handbag', 'Metric': 'Average Recall', 'Value': 0.0375}, {'Class Name': 'truck', 'Metric': 'Average Precision', 'Value': 0.0}, {'Class Name': 'traffic light', 'Metric': 'Average Precision', 'Value': 0.0}, {'Class Name': 'baseball bat', 'Metric': 'Average Precision', 'Value': 0.0}, {'Class Name': 'fork', 'Metric': 'Average Precision', 'Value': 0.0}, {'Class Name': 'knife', 'Metric': 'Average Precision', 'Value': 0.0}, {'Class Name': 'laptop', 'Metric': 'Average Precision', 'Value': 0.0}, {'Class Name': 'mouse', 'Metric': 'Average Precision', 'Value': 0.0}, {'Class Name': 'oven', 'Metric': 'Average Precision', 'Value': 0.0}, {'Class Name': 'truck', 'Metric': 'Average Recall', 'Value': 0.0}, {'Class Name': 'traffic light', 'Metric': 'Average Recall', 'Value': 0.0}, {'Class Name': 'baseball bat', 'Metric': 'Average Recall', 'Value': 0.0}, {'Class Name': 'fork', 'Metric': 'Average Recall', 'Value': 0.0}, {'Class Name': 'knife', 'Metric': 'Average Recall', 'Value': 0.0}, {'Class Name': 'laptop', 'Metric': 'Average Recall', 'Value': 0.0}, {'Class Na...
Additional Outputs


We detected that for several classes our model performance is below the threshold.
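The idea of a condition as a function from check results to a pass/fail object can be sketched in a few lines. This is only an illustration of the concept, not the deepchecks implementation: SketchConditionResult, scores_not_less_than, and the toy rows are hypothetical names and values, and the real ConditionResult class has its own interface.

```python
from typing import NamedTuple


class SketchConditionResult(NamedTuple):
    """Hypothetical stand-in for a condition result: a verdict plus details."""
    passed: bool
    details: str


def scores_not_less_than(rows, min_score):
    """Fail if any row's score is below min_score, and report the failing rows."""
    failing = [row for row in rows if row["Value"] < min_score]
    if failing:
        return SketchConditionResult(
            False, f"Found metrics with scores below threshold: {failing}"
        )
    return SketchConditionResult(True, "")


# Toy per-class results, shaped like the rows reported in the output above.
rows = [
    {"Class Name": "cat", "Metric": "Average Precision", "Value": 0.9},
    {"Class Name": "dog", "Metric": "Average Precision", "Value": 0.1},
]
result_sketch = scores_not_less_than(rows, min_score=0.2)
print(result_sketch.passed)
print(result_sketch.details)
```

Keeping the verdict and the explanatory details together in one object is what allows a summary table to show both a status and a "More Info" column for each condition.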

Total running time of the script: (1 minute 11.675 seconds)
