.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "checks_gallery/vision/distribution/plot_label_property_outliers.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here `
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_checks_gallery_vision_distribution_plot_label_property_outliers.py:


Label Property Outliers
=======================

This notebook provides an overview of how to use and interpret the label
property outliers check, which detects outliers in simple label properties
of a dataset.

**Structure:**

* `Why Check for Label Outliers? <#why-check-for-label-outliers>`__
* `How Does the Check Work? <#how-does-the-check-work>`__
* `Which Label Properties Are Used? <#which-label-properties-are-used>`__
* `Run the Check <#run-the-check>`__

Why Check for Label Outliers?
-----------------------------
Examining outliers can reveal insights that an aggregate view or a random
sample of the data would miss. For example, it may reveal corrupt samples
(e.g. a bounding box with area 0) or samples you did not expect to have
(e.g. an extreme aspect ratio).

In some cases, these outliers help explain performance discrepancies (the
model can be excused for failing on a zero-size bounding box). In more extreme
cases, the outlier samples may be interfering with the model's training by
teaching the model to fit "irrelevant" samples.

How Does the Check Work?
------------------------
To find outlier labels we use label properties (such as the number of bounding
boxes, the bounding box area, etc.).

We use the `Interquartile Range `_ to define the upper and lower limits for
each property's values. Values outside these limits are reported as outliers.

Which Label Properties Are Used?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
For object detection there are default built-in label properties.
For other tasks you have to define your own custom label properties.
For the list of the built-in object detection label properties and an
explanation of custom properties, refer to :doc:`vision properties `.

.. GENERATED FROM PYTHON SOURCE LINES 45-48

Run the Check
-------------
For this example we load the COCO object detection data and run the check
with the default properties.

.. GENERATED FROM PYTHON SOURCE LINES 48-57

.. code-block:: default


    from deepchecks.vision.checks import LabelPropertyOutliers
    from deepchecks.vision.datasets.detection.coco import load_dataset

    # Load the COCO training data as a VisionData object
    train_data = load_dataset(train=True, object_type='VisionData')
    # Run the check with its default label properties
    check = LabelPropertyOutliers()
    result = check.run(train_data)
    # Display the result (rendered when run in a notebook)
    result



.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    Validating Input: 0%| | 0/1 [00:00

.. code-block:: none

    Label Property Outliers
    Find outliers labels with respect to the given properties.

    Additional Outputs

    Property "Bounding Box Area (in pixels)"
        Total number of outliers: 80
        Non-outliers range: 18.33 to 39,339.22
        Bounding Box Area (in pixels):  266,496.53 | 276,358.32 | 278,175.12 | 280,358.25 | 326,770.74
        Image:                          (images omitted)

    Property "Number of Bounding Boxes Per Image"
        Total number of outliers: 6
        Non-outliers range: 0 to 20.12
        Number of Bounding Boxes Per Image:  24 | 33 | 38 | 40 | 42
        Image:                               (images omitted)

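The non-outlier ranges reported by the check come from interquartile-range
limits. The snippet below is a minimal sketch of that idea, not the library's
implementation: it assumes the common 25th/75th percentiles and a scale factor
of 1.5, and both the helper name ``iqr_limits`` and the sample values are
hypothetical. The check itself may use different parameters or apply further
adjustments to the limits.

.. code-block:: python

    import statistics


    def iqr_limits(values, scale=1.5):
        """Return (lower, upper) outlier limits based on the interquartile range.

        Assumption: quartiles use linear interpolation over the observed data
        (the 'inclusive' method of ``statistics.quantiles``).
        """
        q1, _median, q3 = statistics.quantiles(values, n=4, method='inclusive')
        iqr = q3 - q1
        return q1 - scale * iqr, q3 + scale * iqr


    # Hypothetical property values, e.g. number of bounding boxes per image
    values = [1, 2, 2, 3, 3, 3, 4, 4, 5, 40]

    lower, upper = iqr_limits(values)
    # Indices of the samples whose value falls outside the limits
    outlier_indices = [i for i, v in enumerate(values) if v < lower or v > upper]

    print(lower, upper)       # -0.375 6.625
    print(outlier_indices)    # [9]

Only the last sample (value 40) lies outside the limits here. A larger
``scale`` widens the non-outlier range and flags fewer samples.
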
.. GENERATED FROM PYTHON SOURCE LINES 58-73

Observe Graphic Result
^^^^^^^^^^^^^^^^^^^^^^
The check displays a section for each property. Each section shows the number
of outliers, the non-outlier range of the property, and the images with the
lowest and highest values for that property. If the property returns a value
per bounding box, only the bounding box that produced the outlier value is
shown.

For example, for the property "Bounding Box Area (in pixels)" we can see that
80 outliers were found. We can now inspect these samples and decide whether to
ignore them or whether the model should support them, in which case we may
want to take a closer look at the model's predictions on these samples.

Observe Result Value
^^^^^^^^^^^^^^^^^^^^
The check returns a ``CheckResult`` object whose ``value`` property contains
the information calculated during the check's run: for each property, the
indices of the outlier samples and the lower and upper limits of the
non-outlier range.

.. GENERATED FROM PYTHON SOURCE LINES 73-77

.. code-block:: default


    result.value



.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none


    {'Bounding Box Area (in pixels)': {'indices': [27, 52, 26, 49, 25, 51, 16, 40, 12, 14, 18, 27, 49, 38, 38, 54, 53, 9, 53, 63, 1, 55, 21, 22, 49, 49, 21, 40, 0, 11, 22, 23, 0, 57, 56, 56, 46, 53, 2, 5, 16, 34, 17, 27, 22, 29, 56, 27, 18, 32, 55, 40, 47, 29, 10, 59, 10, 4, 29, 0, 34, 3, 31, 0, 23, 46, 34, 15, 4, 45, 21, 56, 45, 39, 24, 41, 36, 57, 37, 11], 'lower_limit': 18.331313767712555, 'upper_limit': 39339.21905530155}, 'Number of Bounding Boxes Per Image': {'indices': [30, 21, 43, 52, 33, 37], 'lower_limit': 0.0, 'upper_limit': 20.125}}



.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes 0.942 seconds)


.. _sphx_glr_download_checks_gallery_vision_distribution_plot_label_property_outliers.py:


.. only:: html

    .. container:: sphx-glr-footer
        :class: sphx-glr-footer-example

        .. container:: sphx-glr-download sphx-glr-download-python

            :download:`Download Python source code: plot_label_property_outliers.py `

        .. container:: sphx-glr-download sphx-glr-download-jupyter

            :download:`Download Jupyter notebook: plot_label_property_outliers.ipynb `


.. only:: html

    .. rst-class:: sphx-glr-signature

        `Gallery generated by Sphinx-Gallery `_