Class Imbalance#

This notebook provides an overview for using and understanding the Class Imbalance check.

Structure:

What is the Class Imbalance check#

The ClassImbalance check produces a distribution of the target variable. An indication for an imbalanced dataset is an uneven distribution in label classes.

An imbalanced dataset poses its own challenges, namely learning the characteristics of the minority label, scarce minority instances to train on (or test for) and defining the right evaluation metric.

Albeit, there are many techniques to address these challenges, including artificially increasing the minority sample size (by over-sampling or using SMOTE), drop instances from the majority class (under-sampling), using regularization, and adjusting the label classes weights.

Imports#

from deepchecks.tabular import Dataset
from deepchecks.tabular.checks import ClassImbalance
from deepchecks.tabular.datasets.classification import lending_club

Generate data#

df = lending_club.load_data(data_format='Dataframe', as_train_test=False)
dataset = Dataset(df, label='loan_status', features=['id', 'loan_amnt'], cat_features=[])

Run the check#

Class Imbalance


Skew the target variable and run the check#

df.loc[df.sample(frac=0.7, random_state=0).index, 'loan_status'] = 1
dataset = Dataset(df, label='loan_status', features=['id', 'loan_amnt'], cat_features=[])
ClassImbalance().run(dataset)
Class Imbalance


Define a condition#

A manually defined ratio between the labels can also be set:

ClassImbalance().add_condition_class_ratio_less_than(0.15).run(dataset)
Class Imbalance


Total running time of the script: (0 minutes 1.783 seconds)

Gallery generated by Sphinx-Gallery