Robustness Report

This notebook provides an overview of how to use and interpret the RobustnessReport check.


How Does the RobustnessReport Check Work?

This check applies augmentations to the images in the dataset and measures the change in model performance for each augmentation. The size of that change estimates how well the model generalizes beyond the exact images it was evaluated on.
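The core idea can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the deepchecks implementation: the toy `predict` model, the `flip` augmentation, and the use of accuracy as the metric (instead of the precision and recall the check reports) are all hypothetical. The sketch evaluates the same metric on the original images and on their augmented versions, then reports the relative change `(augmented - original) / original`.

```python
from typing import Callable, List, Sequence

Image = List[List[int]]  # a tiny grayscale image as rows of pixel values


def accuracy(predict: Callable[[Image], int], images: Sequence[Image], labels: Sequence[int]) -> float:
    """Fraction of images whose predicted label matches the true label."""
    correct = sum(1 for img, y in zip(images, labels) if predict(img) == y)
    return correct / len(labels)


def robustness_diff(predict, images, labels, augment: Callable[[Image], Image]) -> dict:
    """Score on augmented images and its relative change from the original score."""
    base = accuracy(predict, images, labels)
    augmented = accuracy(predict, [augment(img) for img in images], labels)
    return {"score": augmented, "diff": (augmented - base) / base}


# Hypothetical model: predicts 1 if the left half of the image is brighter than the right half.
def predict(img: Image) -> int:
    half = len(img[0]) // 2
    left = sum(sum(row[:half]) for row in img)
    right = sum(sum(row[half:]) for row in img)
    return 1 if left > right else 0


# Hypothetical augmentation: mirror each image horizontally.
def flip(img: Image) -> Image:
    return [list(reversed(row)) for row in img]


images = [[[9, 0], [9, 0]], [[0, 9], [0, 9]], [[8, 1], [7, 2]], [[5, 5], [5, 5]]]
labels = [1, 0, 1, 0]
print(robustness_diff(predict, images, labels, flip))  # {'score': 0.25, 'diff': -0.75}
```

The toy model is perfect on the original images but relies on a left/right cue that a horizontal flip reverses, so its score collapses. This is the kind of fragility the check is designed to surface.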

What Are Image Augmentations?

An image augmentation is any transformation applied to an image, such as a change in brightness or scale. Augmentations are used during model training for two reasons:

  • Training data is limited, so augmentations give the model more samples to learn from, including variations that may not appear in the training set but could be encountered in out-of-sample data.

  • Because the model sees the same images in every epoch, augmenting them forces it to learn a more general representation of each image instead of overfitting to the specific images.
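To make this concrete, here is a hypothetical brightness/contrast augmentation written in plain Python. It is loosely in the spirit of Albumentations' RandomBrightnessContrast but is not its implementation: each pixel is scaled by a contrast factor `alpha`, shifted by a brightness offset `beta`, and clipped to the 0 to 255 range. The `limit` default is an assumption. Sampling `alpha` and `beta` at random yields a slightly different version of the same image each time it is loaded.

```python
import random
from typing import List, Optional

Image = List[List[int]]  # grayscale image, pixel values in [0, 255]


def brightness_contrast(img: Image, alpha: float, beta: float) -> Image:
    """Apply out = clip(alpha * pixel + beta) to every pixel."""
    return [[max(0, min(255, round(alpha * p + beta))) for p in row] for row in img]


def random_brightness_contrast(img: Image, rng: Optional[random.Random] = None,
                               limit: float = 0.2) -> Image:
    """Randomly perturb contrast and brightness by up to +/- `limit` (hypothetical default)."""
    rng = rng or random.Random()
    alpha = 1.0 + rng.uniform(-limit, limit)  # contrast factor
    beta = 255 * rng.uniform(-limit, limit)   # brightness offset in pixel units
    return brightness_contrast(img, alpha, beta)


img = [[0, 64], [128, 255]]
print(brightness_contrast(img, alpha=1.2, beta=10))  # [[10, 87], [164, 255]]
print(random_brightness_contrast(img, rng=random.Random(0)))
```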

If Performance Decreases Significantly on Augmented Images, This Could Mean That:

  • The training dataset was not diverse enough for the model to learn generalizable features.

  • Augmentations were not applied to the training dataset, or were not applied extensively enough.

When Is It OK for the Model’s Performance to Decrease Due to Augmentations?

  • If out-of-sample data is not expected to be transformed in these ways, the drop in performance may not be a concern. However, it could still indicate that the model does not generalize well, and that its performance may degrade under other types of data shift.

  • If augmentations are too extreme, the image may be changed beyond recognition. When even a human or a domain expert could not perform the task on the augmented image, the model cannot be expected to infer correctly either.
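A quick way to see why extreme augmentations are not a fair test: with a large enough brightness shift every pixel saturates, and two completely different images become identical, so no model (and no human) could tell them apart afterwards. The `shift_brightness` helper below is hypothetical and only illustrates the clipping effect.

```python
from typing import List

Image = List[List[int]]


def shift_brightness(img: Image, beta: int) -> Image:
    """Add `beta` to every pixel and clip to the valid [0, 255] range."""
    return [[max(0, min(255, p + beta)) for p in row] for row in img]


digit_like = [[0, 255, 0], [0, 255, 0], [0, 255, 0]]  # a vertical stroke
blank = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]

mild = (shift_brightness(digit_like, 40), shift_brightness(blank, 40))
extreme = (shift_brightness(digit_like, 255), shift_brightness(blank, 255))

print(mild[0] == mild[1])        # False: the stroke is still visible
print(extreme[0] == extreme[1])  # True: both images are now all white
```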

Check Requirements

Augmentations are usually performed in the Dataset.__getitem__ method using a transformations object. To run the check, each augmentation must be inserted as the first step of that transformations object. Therefore you need to:

  1. Specify in VisionData the name of your transformations field. The default field name is “transforms”.

  2. Use either the imgaug or the Albumentations library as the transformation mechanism.

  3. For object detection, use a single transformation object for both the data and the labels (use “transforms” instead of “transform” + “target_transform”).
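The need for a named transformations field can be illustrated with a hypothetical PyTorch-style dataset. In the sketch below, `Compose`, `ToyDataset`, and `prepend_augmentation` are illustrative names, not deepchecks, imgaug, or Albumentations APIs. The point is only that when `__getitem__` routes every sample through `self.transforms`, code that knows the field name can place an extra augmentation at the front of the pipeline. `copy.copy` leaves the original dataset untouched.

```python
import copy
from typing import Any, Callable, List, Optional, Sequence, Tuple


class Compose:
    """Minimal stand-in for a transformation pipeline (e.g. an Albumentations Compose)."""

    def __init__(self, transforms: Sequence[Callable[[Any], Any]]):
        self.transforms = list(transforms)

    def __call__(self, image: Any) -> Any:
        for t in self.transforms:
            image = t(image)
        return image


class ToyDataset:
    """Hypothetical dataset whose __getitem__ applies the `transforms` field."""

    def __init__(self, images: List[Any], labels: List[int], transforms: Optional[Compose] = None):
        self.images = images
        self.labels = labels
        self.transforms = transforms  # the field name the check has to know about

    def __len__(self) -> int:
        return len(self.images)

    def __getitem__(self, idx: int) -> Tuple[Any, int]:
        image = self.images[idx]
        if self.transforms is not None:
            image = self.transforms(image)
        return image, self.labels[idx]


def prepend_augmentation(dataset: ToyDataset, augmentation: Callable[[Any], Any],
                         field_name: str = "transforms") -> ToyDataset:
    """Return a shallow copy of `dataset` whose pipeline runs `augmentation` first."""
    augmented = copy.copy(dataset)
    existing = getattr(dataset, field_name)
    steps = [augmentation] + (existing.transforms if existing is not None else [])
    setattr(augmented, field_name, Compose(steps))
    return augmented


# Usage: the dataset normalizes pixel values; the "augmentation" doubles brightness first.
ds = ToyDataset(images=[[10, 20]], labels=[3], transforms=Compose([lambda img: [p / 100 for p in img]]))
aug_ds = prepend_augmentation(ds, lambda img: [p * 2 for p in img])
print(ds[0])      # ([0.1, 0.2], 3)
print(aug_ds[0])  # ([0.2, 0.4], 3)
```

Putting the augmentation first matters because later steps, such as normalization or tensor conversion, typically expect their input in the original image format.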

Default Augmentations

Image Type | Augmentation Name
---------- | ------------------------
Grayscale  | RandomBrightnessContrast
Grayscale  | ShiftScaleRotate
RGB        | HueSaturationValue
RGB        | RGBShift
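One way to represent this table in code is a mapping from image type to augmentation names, plus a rule for deciding the image type. The sketch below is an assumption, not deepchecks logic. It treats a single-channel image as grayscale and a three-channel image as RGB. For RGB images it adds the two color augmentations on top of the grayscale ones, which is one plausible reading of the table, since brightness, contrast, shift, scale, and rotation are equally meaningful for color images.

```python
from typing import Dict, List

# Augmentation names taken from the table above (Albumentations class names).
DEFAULT_AUGMENTATIONS: Dict[str, List[str]] = {
    "Grayscale": ["RandomBrightnessContrast", "ShiftScaleRotate"],
    "RGB": ["HueSaturationValue", "RGBShift"],
}


def image_type(num_channels: int) -> str:
    """Assumption: 1 channel means grayscale, 3 channels means RGB."""
    if num_channels == 1:
        return "Grayscale"
    if num_channels == 3:
        return "RGB"
    raise ValueError(f"unsupported number of channels: {num_channels}")


def augmentations_for(num_channels: int) -> List[str]:
    """Grayscale augmentations for every image, color ones added for RGB (assumed rule)."""
    names = list(DEFAULT_AUGMENTATIONS["Grayscale"])
    if image_type(num_channels) == "RGB":
        names += DEFAULT_AUGMENTATIONS["RGB"]
    return names


print(augmentations_for(1))  # ['RandomBrightnessContrast', 'ShiftScaleRotate']
print(augmentations_for(3))
```

For single-channel images such as MNIST, this yields the same two augmentations that appear in the report output on this page.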

Generate Data and Model

from deepchecks.vision.datasets.classification.mnist import (load_dataset,
                                                             load_model)

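# Load the MNIST test set as a VisionData object, in batches of 1000 images
mnist_dataloader_test = load_dataset(train=False, batch_size=1000, object_type='VisionData')
# Load the example MNIST classification model
model = load_model()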

Run the Check

from deepchecks.vision.checks.performance.robustness_report import \
    RobustnessReport

# Run the check on the test data with the model loaded above
result = RobustnessReport().run(mnist_dataloader_test, model)
result

Out:


Robustness Report

Compare performance of model on original dataset and augmented dataset.

Additional Outputs
Percentage shown are difference between the metric before augmentation and after.
Augmentations used (separately): Random Brightness Contrast, Shift Scale Rotate

Augmentation "Shift Scale Rotate"

[Figure: per-class comparison with base and augmented example images for classes 4, 7, 8, 9, 5, 2, 6]

Augmentation "Random Brightness Contrast"

[Figure: per-class comparison with base and augmented example images for classes 7, 8, 2, 3, 1, 6, 0, 9]


Observe the Check’s Output

As the results show, the check applied several augmentations to the input data and compared the model’s performance on the original images with its performance on the augmented images. It compares both the overall metrics and the per-class metrics, and displays the classes whose performance degraded the most.

As its result value, the check returns, for each augmentation, the overall metrics together with their relative difference from the metrics on the original data.

result.value

Out:

{'Random Brightness Contrast': {'Precision': {'score': 0.9834997046433983, 'diff': -0.0006017693854453775}, 'Recall': {'score': 0.9834116820130395, 'diff': -0.0005606444929025739}}, 'Shift Scale Rotate': {'Precision': {'score': 0.7884346861861495, 'diff': -0.19882006409419148}, 'Recall': {'score': 0.7875944240898525, 'diff': -0.1995693380395409}}}
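Because `result.value` is a plain nested dictionary, it is easy to post-process. The sketch below copies the values printed above into a literal (in a live session you would use `result.value` directly) and finds the most degraded metric. It also recovers the pre-augmentation score as `score / (1 + diff)`, which assumes `diff` is the relative difference `(augmented - original) / original`. Under that reading, both augmentations yield the same original precision (about 0.984), which is what you would expect. The helper names are hypothetical.

```python
from typing import Dict, Tuple

# Copied from the `result.value` output above.
value: Dict[str, Dict[str, Dict[str, float]]] = {
    'Random Brightness Contrast': {
        'Precision': {'score': 0.9834997046433983, 'diff': -0.0006017693854453775},
        'Recall': {'score': 0.9834116820130395, 'diff': -0.0005606444929025739}},
    'Shift Scale Rotate': {
        'Precision': {'score': 0.7884346861861495, 'diff': -0.19882006409419148},
        'Recall': {'score': 0.7875944240898525, 'diff': -0.1995693380395409}},
}


def original_score(entry: Dict[str, float]) -> float:
    """Assumes diff = (augmented - original) / original, so original = score / (1 + diff)."""
    return entry['score'] / (1 + entry['diff'])


def worst_degradation(value) -> Tuple[str, str, float]:
    """Return (augmentation, metric, diff) for the most negative relative difference."""
    return min(
        ((aug, metric, m['diff']) for aug, metrics in value.items() for metric, m in metrics.items()),
        key=lambda t: t[2],
    )


print(worst_degradation(value))  # ('Shift Scale Rotate', 'Recall', -0.1995693380395409)
for aug, metrics in value.items():
    print(aug, {metric: round(original_score(m), 4) for metric, m in metrics.items()})
```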

Define a Condition

We can define a condition that verifies the model’s performance does not degrade by more than a given percentage when the data is augmented.

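# The condition fails if performance degrades by more than 5% under any augmentation
check = RobustnessReport().add_condition_degradation_not_greater_than(0.05)
result = check.run(mnist_dataloader_test, model)
# Display the condition results without the additional plots
result.show(show_additional_outputs=False)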

Out:

Robustness Report
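The pass/fail logic of such a condition can be approximated as follows. This is a hypothetical re-implementation for illustration only, assuming that the condition compares each relative `diff` in `result.value` against the threshold; it is not the deepchecks source. Under that assumption, with the 0.05 threshold and the values from the earlier `result.value` output, Random Brightness Contrast stays within bounds while Shift Scale Rotate, at roughly a 20% drop, does not.

```python
from typing import Dict, List


def degradation_not_greater_than(value: Dict[str, Dict[str, Dict[str, float]]],
                                 ratio: float = 0.05) -> List[str]:
    """Return a message for every (augmentation, metric) whose relative drop exceeds `ratio`."""
    failures = []
    for aug, metrics in value.items():
        for metric, m in metrics.items():
            if m['diff'] < -ratio:
                failures.append(f"{aug}: {metric} dropped by {-m['diff']:.1%}")
    return failures


# Relative differences copied from the `result.value` output shown earlier.
value = {
    'Random Brightness Contrast': {'Precision': {'diff': -0.0006017693854453775},
                                   'Recall': {'diff': -0.0005606444929025739}},
    'Shift Scale Rotate': {'Precision': {'diff': -0.19882006409419148},
                           'Recall': {'diff': -0.1995693380395409}},
}
failures = degradation_not_greater_than(value, 0.05)
print("passed" if not failures else failures)
# ['Shift Scale Rotate: Precision dropped by 19.9%', 'Shift Scale Rotate: Recall dropped by 20.0%']
```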


Total running time of the script: ( 0 minutes 7.653 seconds)

Gallery generated by Sphinx-Gallery