.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "checks_gallery/vision/data_integrity/plot_label_property_outliers.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here ` to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_checks_gallery_vision_data_integrity_plot_label_property_outliers.py:

.. _plot_vision_label_property_outliers:

Label Property Outliers
=======================

This notebook provides an overview for using and understanding the label property outliers check, which detects outliers in simple label properties in a dataset.

**Structure:**

* `Why Check for Label Outliers? <#why-check-for-label-outliers>`__
* `How Does the Check Work? <#how-does-the-check-work>`__
* `Which Label Properties Are Used? <#which-label-properties-are-used>`__
* `Run the Check <#run-the-check>`__

Why Check for Label Outliers?
-----------------------------

Examining outliers may help you gain insights that you couldn't have reached by taking an aggregate look or by inspecting random samples. For example, it may reveal corrupt samples (e.g. a bounding box with area 0) or samples you didn't expect to have (e.g. an extreme aspect ratio). In some cases, these outliers may help debug performance discrepancies (the model can be excused for failing on a zero-size bounding box). In more extreme cases, the outlier samples may indicate the presence of samples that interfere with the model's training by teaching the model to fit "irrelevant" samples.

How Does the Check Work?
------------------------

To find outlier labels we use label properties (such as the number of bounding boxes, the bounding box area, etc.). We use the `Interquartile Range `_ to define the upper and lower limits for the properties' values; a value outside these limits is reported as an outlier.

Which Label Properties Are Used?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

For object detection, default built-in label properties are provided. For other tasks, you have to define your own custom label properties. For the list of the built-in object detection label properties and an explanation of custom properties, refer to :doc:`vision properties `.

.. GENERATED FROM PYTHON SOURCE LINES 47-50

Run the Check
-------------

For this example we load the COCO object detection data and run the check with the default properties.

.. GENERATED FROM PYTHON SOURCE LINES 50-59

.. code-block:: default


    from deepchecks.vision.checks import LabelPropertyOutliers
    from deepchecks.vision.datasets.detection.coco import load_dataset

    # Load the COCO training split as a VisionData object
    train_data = load_dataset(train=True, object_type='VisionData')
    # Run the check with its default label properties
    check = LabelPropertyOutliers()
    result = check.run(train_data)
    # Display the result (rendered inline in a notebook)
    result

.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    Validating Input: | | 0/1 [Time: 00:00]
    /home/runner/work/deepchecks/deepchecks/deepchecks/vision/checks/data_integrity/abstract_property_outliers.py:89: UserWarning: Properties that have class_id as output_type will be skipped.
    Validating Input: |#####| 1/1 [Time: 00:00]
    Ingesting Batches: | | 0/2 [Time: 00:00]
    Ingesting Batches: |##5 | 1/2 [Time: 00:00]
    Ingesting Batches: |#####| 2/2 [Time: 00:00]
    Ingesting Batches: |#####| 2/2 [Time: 00:00]
    Computing Check: | | 0/1 [Time: 00:00]
    Computing Check: |#####| 1/1 [Time: 00:00]
    Computing Check: |#####| 1/1 [Time: 00:00]

.. Check output (rendered HTML): Label Property Outliers

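
The interquartile-range limits described in *How Does the Check Work?* can be sketched as follows. This is a minimal illustration using NumPy, not the library's implementation: the helper names ``iqr_limits`` and ``find_outliers``, the 25th/75th percentiles, the 1.5 multiplier, and the sample areas are all assumptions made for this example.

.. code-block:: python

    import numpy as np


    def iqr_limits(values, percentiles=(25, 75), scale=1.5):
        """Return (lower, upper) limits from the interquartile range.

        The percentiles and scale are assumed, conventional choices.
        """
        values = np.asarray(values, dtype=float)
        q_low, q_high = np.percentile(values, percentiles)
        iqr = q_high - q_low
        return q_low - scale * iqr, q_high + scale * iqr


    def find_outliers(values, **kwargs):
        """Return the indices of values outside the IQR limits, plus the limits."""
        values = np.asarray(values, dtype=float)
        lower, upper = iqr_limits(values, **kwargs)
        indices = np.flatnonzero((values < lower) | (values > upper))
        return indices.tolist(), lower, upper


    # Hypothetical bounding box areas (in pixels), one value per box
    areas = [100, 120, 130, 110, 125, 115, 0, 5000]
    indices, lower, upper = find_outliers(areas)
    print(indices, lower, upper)

In this toy data the zero-area box and the very large box fall outside the limits, which mirrors the kind of samples the check is meant to surface.
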


.. GENERATED FROM PYTHON SOURCE LINES 60-61

To display the results in an IDE like PyCharm, you can use the following code:

.. GENERATED FROM PYTHON SOURCE LINES 61-63

.. code-block:: default


    #  result.show_in_window()

.. GENERATED FROM PYTHON SOURCE LINES 64-65

The result will be displayed in a new window.

.. GENERATED FROM PYTHON SOURCE LINES 67-82

Observe Graphic Result
^^^^^^^^^^^^^^^^^^^^^^

The check displays a section for each property. Each section shows the number of outliers, the non-outlier property range, and the images with the lowest and highest values for the property. In addition, if the property returns a value per bounding box, only the bounding box that produced the outlier value is shown.

For example, for the property "Bounding Box Area (in pixels)" we can see that 80 outliers were found. We can now inspect these samples and decide whether to ignore them or whether the model should support them, in which case we may take a closer look at the model's predictions on these samples.

Observe Result Value
^^^^^^^^^^^^^^^^^^^^

The check returns a ``CheckResult`` object whose ``value`` property contains the information calculated during the check's run: for each property, the indices of the outliers and the lower and upper limits.

.. GENERATED FROM PYTHON SOURCE LINES 82-86

.. code-block:: default


    result.value

.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    {'Bounding Box Area (in pixels)': {'indices': [27, 52, 26, 49, 25, 51, 16, 40, 12, 14, 18, 27, 49, 38, 38, 54, 53, 9, 53, 63, 1, 55, 21, 22, 49, 49, 21, 40, 0, 11, 22, 23, 0, 57, 56, 56, 46, 53, 2, 5, 16, 34, 17, 27, 22, 29, 56, 27, 18, 32, 55, 40, 47, 29, 10, 59, 10, 4, 29, 0, 34, 3, 31, 0, 23, 46, 34, 15, 4, 45, 21, 56, 45, 39, 24, 41, 36, 57, 37, 11], 'lower_limit': 18.331313767712555, 'upper_limit': 39339.21905530155}, 'Number of Bounding Boxes Per Image': {'indices': [30, 21, 43, 52, 33, 37], 'lower_limit': 0.0, 'upper_limit': 20.125}}

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes 0.836 seconds)

.. _sphx_glr_download_checks_gallery_vision_data_integrity_plot_label_property_outliers.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_label_property_outliers.py `

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_label_property_outliers.ipynb `

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery `_