Image Property Outliers#
This notebook provides an overview of how to use and understand the image property outliers check, which detects outliers in simple image properties of a dataset.
Why Check for Outliers?#
Examining outliers may give you insights that you couldn’t have reached by taking an aggregate look or by inspecting random samples. For example, it may reveal corrupt samples (e.g. an image that is completely black) or samples you didn’t expect to have (e.g. an image with an extreme aspect ratio). In some cases, these outliers may help debug performance discrepancies (the model can be excused for failing on a totally dark image). In more extreme cases, the outliers may indicate samples that interfere with the model’s training by teaching the model to fit “irrelevant” samples.
How Does the Check Work?#
Ideally we would like to directly find images which are outliers, but this is computationally expensive and does not produce clear, explainable results. Therefore, we find outliers using image properties (such as brightness, aspect ratio etc.), which are much more efficient to compute and make each outlier easy to explain.
We use the interquartile range (IQR) to define the upper and lower limits for each property’s values; samples whose values fall outside these limits are considered outliers.
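To make the IQR rule concrete, here is a minimal NumPy sketch of how limits of this kind can be computed for a single property. It is an illustration, not the check's actual implementation: the function names, the 25th/75th percentiles, the scale factor of 1.5 and the sample brightness values are all assumptions chosen for the example.

```python
import numpy as np


def iqr_limits(values, percentiles=(25, 75), scale=1.5):
    """Return (lower, upper) limits from the interquartile range.

    The percentiles and scale are conventional defaults assumed for this
    sketch; they are not taken from the check's source code.
    """
    q_low, q_high = np.percentile(values, percentiles)
    iqr = q_high - q_low
    return q_low - scale * iqr, q_high + scale * iqr


def find_outliers(values, percentiles=(25, 75), scale=1.5):
    """Return the indices of values outside the IQR limits, plus the limits."""
    values = np.asarray(values, dtype=float)
    lower, upper = iqr_limits(values, percentiles, scale)
    outlier_idx = np.flatnonzero((values < lower) | (values > upper))
    return outlier_idx, lower, upper


# Hypothetical per-image brightness values; the last image is almost black.
brightness = [0.42, 0.45, 0.47, 0.50, 0.52, 0.55, 0.58, 0.01]
outlier_idx, lower, upper = find_outliers(brightness)
print(outlier_idx, lower, upper)
```

With these made-up values, only the near-black image (index 7) falls below the lower limit. A larger scale widens the limits and flags fewer samples.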
Which Image Properties Are Used?#
By default the check uses the built-in image properties, and it’s also possible to replace the default properties with custom ones. For the list of built-in image properties and an explanation of custom properties, refer to vision properties.
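As a rough sketch of what a custom property might look like: the function below computes one value per image in a batch, and the dictionary describes it. The property itself (`dark_pixel_fraction`) is hypothetical, and the dictionary keys (`name`, `method`, `output_type`), the `output_type` value and the `image_properties` argument are assumptions about the expected format; the accepted values have varied between deepchecks versions, so check them against the vision properties documentation for your version.

```python
from typing import List

import numpy as np


def dark_pixel_fraction(batch: List[np.ndarray]) -> List[float]:
    """Hypothetical property: fraction of near-black pixels in each image.

    Assumes images are H x W x C (or H x W) arrays on a 0-255 scale.
    """
    fractions = []
    for image in batch:
        # Average over channels so each pixel has one intensity value.
        gray = image.mean(axis=-1) if image.ndim == 3 else image
        fractions.append(float((gray < 10).mean()))
    return fractions


# Assumed property description format; verify it for your deepchecks version.
custom_properties = [
    {
        "name": "Dark Pixel Fraction",
        "method": dark_pixel_fraction,
        "output_type": "numerical",
    },
]

# Quick sanity check on two synthetic images.
black = np.zeros((4, 4, 3), dtype=np.uint8)
half_dark = np.zeros((4, 4, 3), dtype=np.uint8)
half_dark[:2] = 200  # top half bright, bottom half black
print(dark_pixel_fraction([black, half_dark]))

# Assumed usage, not executed here:
# check = ImagePropertyOutliers(image_properties=custom_properties)
```

Keeping each property to a single, easily described number preserves the main benefit of the check: every flagged outlier can be explained by one property value.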
Run the Check#
For this example, we load the COCO object detection data and run the check with the default properties.
from deepchecks.vision.checks import ImagePropertyOutliers
from deepchecks.vision.datasets.detection.coco import load_dataset
train_data = load_dataset(train=True, object_type='VisionData')
check = ImagePropertyOutliers()
result = check.run(train_data)
result
Out:
0%| | 0/6984509 [00:00<?, ?it/s]
78%|#######8 | 5455872/6984509 [00:00<00:00, 54556430.72it/s]
6984704it [00:00, 64466661.81it/s]
Validating Input: 0%| | 0/1 [00:00<?, ? /s]
Ingesting Batches: 0%| | 0/2 [00:00<?, ? Batch/s]
Ingesting Batches: 50%|# | 1/2 [00:01<00:01, 1.53s/ Batch]
Ingesting Batches: 100%|##| 2/2 [00:02<00:00, 1.48s/ Batch]
Computing Check: 0%| | 0/1 [00:00<?, ? Check/s]
Computing Check: 100%|#| 1/1 [00:00<00:00, 1.31 Check/s]