.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "checks_gallery/tabular/integrity/plot_dominant_frequency_change.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here ` to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_checks_gallery_tabular_integrity_plot_dominant_frequency_change.py:


Dominant Frequency Change
*************************

This example provides an overview of how to use and understand the
`Dominant Frequency Change` check.

**Structure:**

* `What is a Dominant Frequency Change? <#what-is-a-dominant-frequency-change>`__
* `Generate Data <#generate-data>`__
* `Run The Check <#run-the-check>`__
* `Define a Condition <#define-a-condition>`__

What is a Dominant Frequency Change?
====================================

Dominant Frequency Change is a data integrity check that tests whether the
frequency of a dominant value has increased significantly between the train
data and the test data.

A sharp change in a dominant value can indicate a problem with the data
collection or data processing pipeline (for example, a sharp increase in a
common null or constant value), and can cause the model to generalize poorly.
The goal of this check is to catch such issues early in the pipeline.

The check compares the dominant values of each feature in the test data with
the dominant values of the same feature in the train data. If the ratio of the
test to train dominant values is greater than a threshold, the check fails.
The threshold is configured with the ``ratio_change_thres`` parameter of the
check.

The Definition of a Dominant Value
----------------------------------

A dominant value is the most frequent value in a feature, provided that it
occurs at least ``dominance_ratio`` times as often as the next most frequent
value. The ``dominance_ratio`` is a configurable parameter of the check.

.. GENERATED FROM PYTHON SOURCE LINES 31-35

.. code-block:: default

    from deepchecks.tabular.checks.integrity import DominantFrequencyChange
    from deepchecks.tabular.datasets.classification import iris

.. GENERATED FROM PYTHON SOURCE LINES 36-38

Generate Data
=============

.. GENERATED FROM PYTHON SOURCE LINES 38-40

.. code-block:: default

    train_ds, test_ds = iris.load_data(data_format='Dataset', as_train_test=True)

.. GENERATED FROM PYTHON SOURCE LINES 41-43

Introducing Duplicates in the Test Data
---------------------------------------

.. GENERATED FROM PYTHON SOURCE LINES 43-48

.. code-block:: default

    # Overwrite a subset of rows in the test data with a constant value,
    # so that a single value becomes dominant in each of the two columns
    test_ds.data.loc[test_ds.data.index % 2 == 0, 'petal length (cm)'] = 5.1
    test_ds.data.loc[test_ds.data.index / 3 > 8, 'sepal width (cm)'] = 2.7

.. GENERATED FROM PYTHON SOURCE LINES 49-51

Run The Check
=============

.. GENERATED FROM PYTHON SOURCE LINES 51-55

.. code-block:: default

    check = DominantFrequencyChange()
    # The modified test set is passed as the first argument, so the output
    # below reports it under the "Train data" columns
    check.run(test_ds, train_ds)

**Dominant Frequency Change**

Check if dominant values have increased significantly between test and reference data.

**Additional Outputs**

\* showing only the top 10 columns, you can change it using n_top_columns param

================= ===== ============ =========== ============ =========== =======
Column            Value Train data % Test data % Train data # Test data # P value
================= ===== ============ =========== ============ =========== =======
sepal width (cm)  2.70  0.37         0.06        14           7           0.00
petal length (cm) 5.10  0.50         0.05        19           6           0.00
================= ===== ============ =========== ============ =========== =======

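To make the figures in an output like this more concrete, here is a minimal sketch of the underlying idea in plain Python. It is not the deepchecks implementation: the helpers ``dominant_value`` and ``value_share``, the default ``dominance_ratio`` of 2.0, and the toy columns are all hypothetical. The sketch assumes a value is dominant when it occurs at least ``dominance_ratio`` times as often as the runner-up. It finds the dominant value in a test column and compares its share with the share of the same value in the train column.

```python
from collections import Counter


def dominant_value(values, dominance_ratio=2.0):
    """Return (value, share) for the dominant value, or None if there is none.

    Assumption: a value is dominant when its count is at least
    `dominance_ratio` times the count of the next most frequent value.
    """
    top_two = Counter(values).most_common(2)
    if not top_two:
        return None
    top_value, top_count = top_two[0]
    runner_up_count = top_two[1][1] if len(top_two) > 1 else 0
    if top_count >= dominance_ratio * runner_up_count:
        return top_value, top_count / len(values)
    return None


def value_share(values, value):
    """Fraction of entries in `values` equal to `value`."""
    return values.count(value) / len(values)


# Toy columns (hypothetical data): 5.1 has been injected into the test column.
train_col = [1.4, 1.5, 1.5, 4.7, 5.1, 5.1, 5.6, 6.0]
test_col = [5.1, 5.1, 5.1, 5.1, 1.5, 5.1, 4.7, 5.1]

value, test_share = dominant_value(test_col)
train_share = value_share(train_col, value)

# Ratio of the test share to the train share of the dominant value.
ratio = test_share / train_share
print(value, test_share, train_share, ratio)  # prints: 5.1 0.75 0.25 3.0
```

A large ratio like the one in this toy case is the kind of change the check is meant to surface. The real check also reports a P value, which this sketch does not compute.
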
.. GENERATED FROM PYTHON SOURCE LINES 56-58

Define a Condition
==================

.. GENERATED FROM PYTHON SOURCE LINES 58-62

.. code-block:: default

    check = DominantFrequencyChange()
    # Add a condition that the ratio of change is not greater than 0.1
    check.add_condition_ratio_of_change_not_greater_than(0.1)
    res = check.run(test_ds, train_ds)
    # Display the result without the additional outputs
    res.show(show_additional_outputs=False)

**Dominant Frequency Change**

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes 1.861 seconds)


.. _sphx_glr_download_checks_gallery_tabular_integrity_plot_dominant_frequency_change.py:


.. only :: html

 .. container:: sphx-glr-footer
    :class: sphx-glr-footer-example


  .. container:: sphx-glr-download sphx-glr-download-python

     :download:`Download Python source code: plot_dominant_frequency_change.py `


  .. container:: sphx-glr-download sphx-glr-download-jupyter

     :download:`Download Jupyter notebook: plot_dominant_frequency_change.ipynb `


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery `_