Image Property Outliers#

This notebook provides an overview of how to use and understand the image property outliers check, which detects outliers in simple image properties of a dataset.

Why Check for Outliers?#

Examining outliers may give you insights that you couldn’t have reached by looking at aggregates or inspecting random samples. For example, it may reveal corrupt samples (e.g. an image that is completely black) or samples you didn’t expect to have (e.g. an extreme aspect ratio). In some cases, these outliers may help debug performance discrepancies (the model can be excused for failing on a totally dark image). In more extreme cases, the outliers may indicate samples that interfere with training by teaching the model to fit “irrelevant” samples.

How Does the Check Work?#

Ideally, we would detect outlier images directly, but that is computationally expensive and does not yield clear, explainable results. Instead, we look for outliers in image properties (such as brightness and aspect ratio), which are much cheaper to compute and make each outlier easy to explain.

We use the interquartile range (IQR) to define the upper and lower limits for each property’s values. Images whose property value falls outside these limits are reported as outliers.
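To make the IQR rule concrete, here is a minimal pure-Python sketch of how such limits can be derived from a list of property values. It is an illustration, not the check's actual implementation: the helper names (`percentile`, `iqr_limits`, `find_outliers`) are hypothetical, and the 25th/75th percentiles and the 1.5 multiplier are the conventional choices, assumed here.

```python
def percentile(sorted_values, q):
    """Linearly interpolated percentile (q in [0, 100]) of a sorted, non-empty list."""
    if not sorted_values:
        raise ValueError("percentile() needs at least one value")
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def iqr_limits(values, percentiles=(25, 75), scale=1.5):
    """Return (lower, upper) limits; values outside them count as outliers.

    The percentile pair and the scale are assumptions (conventional defaults).
    """
    ordered = sorted(values)
    q_low = percentile(ordered, percentiles[0])
    q_high = percentile(ordered, percentiles[1])
    iqr = q_high - q_low
    return q_low - scale * iqr, q_high + scale * iqr


def find_outliers(values, percentiles=(25, 75), scale=1.5):
    """Return the indices of the values that fall outside the IQR limits."""
    lower, upper = iqr_limits(values, percentiles, scale)
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Made-up aspect ratios for eight images; the last one is unusually wide.
aspect_ratios = [1.0, 1.1, 0.9, 1.0, 1.2, 0.8, 1.0, 5.0]
lower, upper = iqr_limits(aspect_ratios)
outlier_indices = find_outliers(aspect_ratios)
```

With these made-up values the limits come out at roughly 0.75 and 1.35, so only the last image (aspect ratio 5.0) is flagged. A larger `scale` widens the non-outlier range and flags fewer images.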

Which Image Properties Are Used?#

By default, the check uses the built-in image properties, but you can replace them with custom ones. For the list of built-in image properties and an explanation of custom properties, refer to vision properties.
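As a rough sketch of what a custom property could look like, the snippet below defines a function that maps a batch of images to one value per image, and wraps it in a property description. Everything here is an assumption for illustration: the `longest_side` function is hypothetical, and the dictionary keys, the `output_type` label, and the `image_properties` argument shown in the comment should be checked against the vision properties guide for your deepchecks version.

```python
def longest_side(batch):
    """Return the longest side, in pixels, of each image in a batch.

    Assumes every image exposes a NumPy-style `.shape` of
    (height, width, channels).
    """
    return [max(img.shape[0], img.shape[1]) for img in batch]


# Assumed property description format: a display name, the callable that
# computes the value, and the kind of value it returns.
custom_properties = [
    {"name": "Longest Side", "method": longest_side, "output_type": "numerical"},
]

# Assumed wiring into the check (not executed here):
# check = ImagePropertyOutliers(image_properties=custom_properties)
```

Keeping the property function free of any deepchecks dependency makes it easy to test on a handful of images before running the full check.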

Run the Check#

For this example, we load the COCO object detection data and run the check with the default properties.

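from deepchecks.vision.checks import ImagePropertyOutliers
from deepchecks.vision.datasets.detection.coco import load_dataset

# Load the COCO training split as a deepchecks VisionData object
train_data = load_dataset(train=True, object_type='VisionData')
# Run the check with its default (built-in) image properties
check = ImagePropertyOutliers()
result = check.run(train_data)
# In a notebook, a bare expression on the last line displays the result
result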

Out:

Image Property Outliers

Find outliers images with respect to the given properties.

Additional Outputs

Property "Aspect Ratio"

Total number of outliers: 11
Non-outliers range: 0.34 to 1.3
Aspect Ratio   1.33   1.5   1.5   1.5   1.54
Image          (image thumbnails not shown)

Property "Area"

Total number of outliers: 13
Non-outliers range: 220,800 to 359,040
Area    139,520   166,000   166,500   187,000   187,500   360,000   361,600   366,720   374,544   378,240
Image   (image thumbnails not shown)