
.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "checks_gallery/vision/data_integrity/plot_label_property_outliers.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end `
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_checks_gallery_vision_data_integrity_plot_label_property_outliers.py:

.. _plot_vision_label_property_outliers:

Label Property Outliers
=======================

This notebook provides an overview of how to use and interpret the label property outliers check,
which detects outliers in simple label properties of a dataset.

**Structure:**

* `Why Check for Label Outliers? <#why-check-for-label-outliers>`__
* `How Does the Check Work? <#how-does-the-check-work>`__
* `Which Label Properties Are Used? <#which-label-properties-are-used>`__
* `Run the Check <#run-the-check>`__

Why Check for Label Outliers?
-----------------------------

Examining outliers may give you insights that you could not reach by looking at aggregate
statistics or by inspecting random samples. For example, it may reveal corrupt samples
(e.g. a bounding box with area 0) or samples you did not expect to have (e.g. an extreme aspect ratio).
In some cases, these outliers help explain performance discrepancies (the model can be excused
for failing on a zero-size bounding box). In more extreme cases, the outlier samples may indicate
the presence of samples that interfere with training by teaching the model to fit "irrelevant" samples.

How Does the Check Work?
------------------------

To find outlier labels, we use label properties (such as the number of bounding boxes,
bounding box area, etc.). We use the `Interquartile Range `_ to define the upper and lower
limits for each property's values; values outside these limits are reported as outliers.

Which Label Properties Are Used?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

For object detection, built-in label properties are used by default. For other tasks, you have to
define your own custom label properties.
For the list of the built-in object detection label properties and an explanation of custom
properties, refer to :doc:`vision properties `.

.. GENERATED FROM PYTHON SOURCE LINES 47-56

Run the Check
-------------

For this example we load the COCO object detection data and run the check with the default properties.

.. note::
    In this example, we use the PyTorch version of the COCO dataset and model. To run this example
    using TensorFlow, change the import statement to::

        from deepchecks.vision.datasets.detection.coco_tensorflow import load_dataset

.. GENERATED FROM PYTHON SOURCE LINES 56-65

.. code-block:: default


    from deepchecks.vision.checks import LabelPropertyOutliers
    from deepchecks.vision.datasets.detection.coco_torch import load_dataset

    # Load the COCO training data as a VisionData object
    train_data = load_dataset(train=True, object_type='VisionData')
    # Run the check with its default label properties
    check = LabelPropertyOutliers()
    result = check.run(train_data)
    result


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    /home/runner/work/deepchecks/deepchecks/deepchecks/vision/checks/data_integrity/abstract_property_outliers.py:100: UserWarning: Properties that have class_id as output_type will be skipped.
    Processing Batches: |     | 0/1 [Time: 00:00]
    Processing Batches: |#####| 1/1 [Time: 00:00]
    Processing Batches: |#####| 1/1 [Time: 00:00]
    Computing Check: |     | 0/1 [Time: 00:00]
    Computing Check: |#####| 1/1 [Time: 00:00]
    Computing Check: |#####| 1/1 [Time: 00:00]


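The Interquartile Range rule mentioned in "How Does the Check Work?" can be sketched in a few
lines of plain Python. This is an illustrative sketch only, not the check's implementation: the
helper names ``iqr_limits`` and ``find_outliers``, the sample areas, and the conventional 1.5
multiplier are assumptions made for the example, and the check's actual percentiles and scale may
differ.

```python
from statistics import quantiles


def iqr_limits(values, scale=1.5):
    """Return the (lower, upper) limits of the non-outlier range.

    `scale` is the conventional IQR multiplier (an assumption for this sketch).
    """
    # First and third quartiles of the property values
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - scale * iqr, q3 + scale * iqr


def find_outliers(values, scale=1.5):
    """Return the indices of values that fall outside the IQR limits."""
    lower, upper = iqr_limits(values, scale)
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical bounding box areas (in pixels), including a zero-area box
# and one very large box
areas = [0.0, 900.0, 1000.0, 1100.0, 1200.0, 1300.0, 1400.0, 50000.0]

lower, upper = iqr_limits(areas)
outlier_indices = find_outliers(areas)
print(f"non-outlier range: [{lower}, {upper}]")
print(f"outlier indices: {outlier_indices}")
```

With these made-up values, both the zero-area box and the very large box fall outside the
computed limits, which is the kind of sample the check is meant to surface.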
.. GENERATED FROM PYTHON SOURCE LINES 66-67

To display the results in an IDE like PyCharm, you can use the following code:

.. GENERATED FROM PYTHON SOURCE LINES 67-69

.. code-block:: default


    #  result.show_in_window()

.. GENERATED FROM PYTHON SOURCE LINES 70-71

The result will be displayed in a new window.

.. GENERATED FROM PYTHON SOURCE LINES 73-88

Observe Graphic Result
^^^^^^^^^^^^^^^^^^^^^^

The check displays a section for each property. Each section shows the number of outliers,
the non-outlier property range, and the images with the lowest and highest values for the property.
In addition, if the property returns a value per bounding box, only the bounding box that
produced the outlier value is shown.

For example, for the property "Bounding Box Area (in pixels)" we can see that 80 outliers were found.
We can now inspect these samples and decide whether to ignore them or whether the model should
support them, in which case we may take a closer look at the model's predictions on these samples.

Observe Result Value
^^^^^^^^^^^^^^^^^^^^

The check returns a ``CheckResult`` object whose ``value`` property contains the information
that was calculated during the check's run.

.. GENERATED FROM PYTHON SOURCE LINES 88-92

.. code-block:: default


    result.value


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    {'Bounding Box Area (in pixels)': {'outliers_identifiers': array(['0', '1', '2', '3', '4', '5', '9', '10', '11', '12', '14', '15',
           '16', '17', '18', '21', '22', '23', '24', '25', '26', '27', '29',
           '31', '0', '2', '4', '5', '6', '7', '8', '9', '13', '14', '15',
           '17', '19', '20', '21', '22', '23', '24', '25', '27', '31'],
          dtype='

.. container:: sphx-glr-download sphx-glr-download-jupyter

  :download:`Download Jupyter notebook: plot_label_property_outliers.ipynb `

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery `_