.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "checks_gallery/tabular/data_integrity/plot_outlier_sample_detection.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here ` to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_checks_gallery_tabular_data_integrity_plot_outlier_sample_detection.py:

Outlier Sample Detection
************************

This notebook provides an overview of using and understanding the Outlier Sample Detection check.

**Structure:**

* `How deepchecks detects outliers <#how-deepchecks-detects-outliers>`__
* `Prepare data <#prepare-data>`__
* `Run the check <#run-the-check>`__
* `Define a condition <#define-a-condition>`__

How deepchecks detects outliers
===============================
Outlier Sample Detection searches for outlier samples (jointly across all features) using the
LoOP (Local Outlier Probabilities) algorithm. LoOP is a robust method for detecting outliers
across multiple variables: it compares the density in the area of a sample with the densities
in the areas of its nearest neighbors (see `link `_ for further details).

LoOP relies on a distance matrix. Our implementation uses the Gower distance, which averages
the per-feature distances between two samples. For a numeric feature, the distance is the
absolute difference divided by the range of the feature; for a categorical feature, it is an
indicator that is 0 if the two values are the same and 1 otherwise
(see `link `_ for further details).

.. GENERATED FROM PYTHON SOURCE LINES 29-31

Imports
=======

.. GENERATED FROM PYTHON SOURCE LINES 31-38

.. code-block:: default

    import pandas as pd
    from sklearn.datasets import load_iris

    from deepchecks.tabular import Dataset
    from deepchecks.tabular.checks import OutlierSampleDetection

.. GENERATED FROM PYTHON SOURCE LINES 39-41

Prepare data
============

.. GENERATED FROM PYTHON SOURCE LINES 41-45

.. code-block:: default

    iris = pd.DataFrame(load_iris().data)
    iris.describe()

.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

                    0           1           2           3
    count  150.000000  150.000000  150.000000  150.000000
    mean     5.843333    3.057333    3.758000    1.199333
    std      0.828066    0.435866    1.765298    0.762238
    min      4.300000    2.000000    1.000000    0.100000
    25%      5.100000    2.800000    1.600000    0.300000
    50%      5.800000    3.000000    4.350000    1.300000
    75%      6.400000    3.300000    5.100000    1.800000
    max      7.900000    4.400000    6.900000    2.500000

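The Gower distance described earlier can be illustrated with the feature ranges from the
summary above. The following is a minimal pure-Python sketch, not the deepchecks
implementation: the function name ``gower_distance`` and its arguments are hypothetical, the
two samples are illustrative iris-like rows, and details such as missing-value handling are
left out.

```python
def gower_distance(a, b, ranges, is_categorical):
    """Average per-feature distance between two samples (hypothetical helper).

    Numeric feature: |a - b| / range of the feature.
    Categorical feature: 0 if the values are equal, 1 otherwise.
    """
    total = 0.0
    for x, y, feature_range, categorical in zip(a, b, ranges, is_categorical):
        if categorical:
            total += 0.0 if x == y else 1.0
        elif feature_range > 0:
            total += abs(x - y) / feature_range
        # A constant numeric feature (range 0) contributes no distance.
    return total / len(a)


# Feature ranges (max - min) read from the describe() summary of the iris data.
ranges = [7.9 - 4.3, 4.4 - 2.0, 6.9 - 1.0, 2.5 - 0.1]
all_numeric = [False] * 4

# Two illustrative samples with four numeric features each.
sample_a = [5.1, 3.5, 1.4, 0.2]
sample_b = [5.9, 3.0, 5.1, 1.8]

d = gower_distance(sample_a, sample_b, ranges, all_numeric)
print(round(d, 3))
```

Because every per-feature term lies between 0 and 1 for samples inside the observed range,
features measured on different scales contribute comparably to the final distance.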
.. GENERATED FROM PYTHON SOURCE LINES 46-47

Add an outlier:

.. GENERATED FROM PYTHON SOURCE LINES 47-53

.. code-block:: default

    # Append one sample that lies far outside the range of every feature
    outlier_sample = [1, 10, 50, 100]
    iris.loc[len(iris.index)] = outlier_sample
    print(iris.tail())
    # All four iris features are numeric, so no categorical features are declared
    modified_iris = Dataset(iris, cat_features=[])

.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

           0     1     2      3
    146  6.3   2.5   5.0    1.9
    147  6.5   3.0   5.2    2.0
    148  6.2   3.4   5.4    2.3
    149  5.9   3.0   5.1    1.8
    150  1.0  10.0  50.0  100.0

.. GENERATED FROM PYTHON SOURCE LINES 54-57

Run the check
=============
We set the ``nearest_neighbors_percent`` and ``extent_parameter`` arguments, which control
the neighborhood size and the extent used by the LoOP algorithm.

.. GENERATED FROM PYTHON SOURCE LINES 57-61

.. code-block:: default

    check = OutlierSampleDetection(nearest_neighbors_percent=0.01, extent_parameter=3)
    check.run(modified_iris)

.. raw:: html

    Outlier Sample Detection


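To make the role of the neighborhood size and the extent more concrete, here is a compact
sketch of the LoOP scoring steps in pure Python. It is an illustration under stated
assumptions rather than the code deepchecks runs: it uses Euclidean distance instead of
Gower distance, takes the number of neighbors ``k`` directly instead of a percentage of the
dataset, and assumes there are no duplicate points (which would cause a division by zero).
The function name ``loop_scores`` is hypothetical.

```python
import math


def loop_scores(points, k=3, extent=3):
    """Local Outlier Probabilities for a list of points (hypothetical helper)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Indices of the k nearest neighbors of each point, excluding the point itself.
    neighbors = [
        sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
        for i in range(n)
    ]

    # Probabilistic set distance: extent times the root-mean-square
    # distance from a point to its neighbors.
    pdist = [
        extent * math.sqrt(sum(dist[i][j] ** 2 for j in neighbors[i]) / k)
        for i in range(n)
    ]

    # PLOF: how much sparser a point's neighborhood is than its neighbors' own.
    plof = [
        pdist[i] / (sum(pdist[j] for j in neighbors[i]) / k) - 1
        for i in range(n)
    ]

    # Normalization factor aggregated over the whole dataset.
    nplof = extent * math.sqrt(sum(v ** 2 for v in plof) / n)

    # Map each PLOF value to a probability in [0, 1] with the error function.
    return [max(0.0, math.erf(v / (nplof * math.sqrt(2)))) for v in plof]


# A tight cluster of five points plus one distant point (illustrative data).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05), (5.0, 5.0)]
scores = loop_scores(points, k=2, extent=3)
print([round(s, 3) for s in scores])
```

In this toy dataset the distant point receives by far the highest score, while the clustered
points score close to zero. A larger extent makes the normalization stricter, so scores grow
more slowly with a sample's deviation from its neighborhood.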
.. GENERATED FROM PYTHON SOURCE LINES 62-65

Define a condition
==================
Now we define a condition that enforces that the ratio of outlier samples in our dataset
does not exceed 0.001, using an outlier score threshold of 0.9.

.. GENERATED FROM PYTHON SOURCE LINES 65-68

.. code-block:: default

    check = OutlierSampleDetection()
    check.add_condition_outlier_ratio_not_greater_than(max_outliers_ratio=0.001, outlier_score_threshold=0.9)
    check.run(modified_iris)

.. raw:: html

    Outlier Sample Detection


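The logic behind such a condition can be sketched as a simple ratio test. The snippet below
is an assumption about how the two parameters interact, not the deepchecks source: it
presumes that a sample counts as an outlier when its score is strictly above
``outlier_score_threshold`` and that the condition passes when the resulting ratio is at
most ``max_outliers_ratio``. The function name and the scores are hypothetical.

```python
def outlier_ratio_passes(scores, max_outliers_ratio=0.001, outlier_score_threshold=0.9):
    """Return (passed, ratio) for a list of outlier scores (hypothetical helper)."""
    n_outliers = sum(1 for s in scores if s > outlier_score_threshold)
    ratio = n_outliers / len(scores)
    return ratio <= max_outliers_ratio, ratio


# Hypothetical scores: 150 ordinary samples and one clear outlier.
scores = [0.05] * 150 + [0.99]
passed, ratio = outlier_ratio_passes(scores)
print(passed, round(ratio, 4))
```

With 151 samples, a single sample above the threshold already gives a ratio of 1/151, which
is greater than 0.001, so under these assumptions the condition would fail.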
.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes 0.200 seconds)


.. _sphx_glr_download_checks_gallery_tabular_data_integrity_plot_outlier_sample_detection.py:

.. only:: html

 .. container:: sphx-glr-footer
    :class: sphx-glr-footer-example

  .. container:: sphx-glr-download sphx-glr-download-python

     :download:`Download Python source code: plot_outlier_sample_detection.py `

  .. container:: sphx-glr-download sphx-glr-download-jupyter

     :download:`Download Jupyter notebook: plot_outlier_sample_detection.ipynb `

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery `_