.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "tabular/auto_checks/data_integrity/plot_class_imbalance.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_tabular_auto_checks_data_integrity_plot_class_imbalance.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_tabular_auto_checks_data_integrity_plot_class_imbalance.py:

.. _tabular__class_imbalance:

Class Imbalance
***************

This notebook provides an overview of how to use and interpret the Class Imbalance check.

**Structure:**

* `What is the Class Imbalance check <#what-is-the-class-imbalance-check>`__
* `Generate data <#generate-data>`__
* `Run the check <#run-the-check>`__
* `Define a condition <#define-a-condition>`__

What is the Class Imbalance check
====================================
The ``ClassImbalance`` check produces the distribution of the target variable.
An uneven distribution across the label classes indicates an imbalanced dataset.

An imbalanced dataset poses its own challenges: learning the characteristics of
the minority label, having only scarce minority instances to train on (or test with),
and choosing the right evaluation metric.

However, many techniques address these challenges, including artificially
increasing the minority sample size (by over-sampling or using SMOTE), dropping
instances from the majority class (under-sampling), using regularization, and
adjusting the label class weights.

.. GENERATED FROM PYTHON SOURCE LINES 34-36

Imports
=========

.. GENERATED FROM PYTHON SOURCE LINES 36-40

.. code-block:: default

    from deepchecks.tabular import Dataset
    from deepchecks.tabular.checks import ClassImbalance
    from deepchecks.tabular.datasets.classification import lending_club

.. GENERATED FROM PYTHON SOURCE LINES 41-43

Generate data
===============

.. GENERATED FROM PYTHON SOURCE LINES 43-47

.. code-block:: default

    # Load the full lending_club data as a single DataFrame (no train/test split)
    df = lending_club.load_data(data_format='Dataframe', as_train_test=False)
    # Wrap it in a Dataset, using 'loan_status' as the label
    dataset = Dataset(df, label='loan_status', features=['id', 'loan_amnt'], cat_features=[])

.. GENERATED FROM PYTHON SOURCE LINES 48-50

Run the check
=================

.. GENERATED FROM PYTHON SOURCE LINES 50-53

.. code-block:: default

    ClassImbalance().run(dataset)

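To make the idea of a target distribution concrete, here is a minimal, library-independent sketch that computes each class's share of the samples. It is only an illustration and not the deepchecks implementation; the helper name ``label_distribution`` and the label values are hypothetical.

```python
from collections import Counter


def label_distribution(labels):
    """Return each class's share of the samples, most frequent class first."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.most_common()}


# Hypothetical labels for illustration; not taken from the lending_club data
labels = [0] * 8 + [1] * 2
distribution = label_distribution(labels)
print(distribution)
```

A distribution whose shares are far from equal, as in this toy example, is the kind of imbalance the check is meant to surface.
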
.. GENERATED FROM PYTHON SOURCE LINES 54-56

Skew the target variable and run the check
--------------------------------------------

.. GENERATED FROM PYTHON SOURCE LINES 56-62

.. code-block:: default

    # Set the label of a random 70% of the rows to class 1 to skew the target
    df.loc[df.sample(frac=0.7, random_state=0).index, 'loan_status'] = 1
    dataset = Dataset(df, label='loan_status', features=['id', 'loan_amnt'], cat_features=[])
    ClassImbalance().run(dataset)

.. GENERATED FROM PYTHON SOURCE LINES 63-66

Define a condition
====================
You can also add a condition with a manually chosen threshold for the ratio
between the label classes:

.. GENERATED FROM PYTHON SOURCE LINES 66-68

.. code-block:: default

    ClassImbalance().add_condition_class_ratio_less_than(0.15).run(dataset)

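As a rough illustration of what a class-ratio threshold measures, the sketch below computes the ratio between the least and most frequent class with the standard library. It assumes the condition is based on that ratio and makes no claim about how deepchecks evaluates or reports it; the helper name and label values are hypothetical.

```python
from collections import Counter


def least_to_most_frequent_ratio(labels):
    """Ratio of the rarest class count to the most common class count."""
    counts = Counter(labels).values()
    return min(counts) / max(counts)


# Hypothetical label sets for illustration only
balanced = [0] * 50 + [1] * 50
skewed = [0] * 9 + [1] * 91

threshold = 0.15  # same value as in the condition example above
for name, labels in [("balanced", balanced), ("skewed", skewed)]:
    ratio = least_to_most_frequent_ratio(labels)
    print(f"{name}: ratio={ratio:.3f}, below threshold: {ratio < threshold}")
```

A ratio close to 1 means the classes are roughly equal in size, while a ratio close to 0 means the rarest class is heavily outnumbered.
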
.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 1.662 seconds)


.. _sphx_glr_download_tabular_auto_checks_data_integrity_plot_class_imbalance.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_class_imbalance.py <plot_class_imbalance.py>`

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_class_imbalance.ipynb <plot_class_imbalance.ipynb>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_