Class Imbalance#

This notebook gives an overview of how to use and interpret the Class Imbalance check.

Structure:

What is the Class Imbalance check
Generate data
Run the check
Define a condition

What is the Class Imbalance check#

The ClassImbalance check produces the distribution of the target variable. An uneven distribution across the label classes indicates an imbalanced dataset.
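To make the idea of a label distribution concrete, here is a minimal standard-library sketch. It is not the check's implementation; the helper `class_distribution` and the toy labels are assumptions made for illustration only.

```python
from collections import Counter


def class_distribution(labels):
    """Return each class's share of the samples, most frequent class first."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.most_common()}


# Hypothetical labels: 90 samples of class 0 and 10 samples of class 1.
labels = [0] * 90 + [1] * 10
distribution = class_distribution(labels)
print(distribution)  # {0: 0.9, 1: 0.1}
```

A 90/10 split like this one is the kind of uneven distribution the check is meant to surface.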

An imbalanced dataset poses its own challenges: the model must learn the characteristics of the minority label, there are few minority instances to train on (or test with), and the evaluation metric has to be chosen carefully.
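The evaluation-metric problem can be illustrated with a small sketch. The labels and the trivial "always predict the majority class" model below are made up for the example, and `accuracy` and `recall` are hand-written helpers rather than library functions.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of truly positive samples that were predicted as positive."""
    preds_for_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    if not preds_for_positives:
        raise ValueError("y_true contains no positive samples")
    return sum(p == positive for p in preds_for_positives) / len(preds_for_positives)


# Hypothetical imbalanced labels: 95 negatives, 5 positives.
y_true = [0] * 95 + [1] * 5
# A trivial model that always predicts the majority class.
y_pred = [0] * 100

baseline_accuracy = accuracy(y_true, y_pred)
minority_recall = recall(y_true, y_pred)
print(baseline_accuracy, minority_recall)  # 0.95 0.0
```

The model scores 95% accuracy while never detecting a single minority instance, which is why accuracy alone can mislead on imbalanced data.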

There are, however, many techniques for addressing these challenges, including artificially increasing the minority sample size (by over-sampling or using SMOTE), dropping instances from the majority class (under-sampling), using regularization, and adjusting the class weights.
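Two of these techniques, random over-sampling and class weighting, can be sketched with the standard library alone. This is an illustrative sketch, not part of deepchecks: the helper names are hypothetical, and the weighting formula is the common "balanced" heuristic, n_samples / (n_classes * class_count).

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes
    have as many samples as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [row for row, label in zip(rows, labels) if label == cls]
        # Sample with replacement; k is 0 for the largest class.
        out_rows.extend(rng.choices(pool, k=target - n))
        out_labels.extend([cls] * (target - n))
    return out_rows, out_labels


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * c) for cls, c in counts.items()}


# Hypothetical data: 8 majority rows (class 0) and 2 minority rows (class 1).
rows = [[i] for i in range(10)]
labels = [0] * 8 + [1] * 2

new_rows, new_labels = random_oversample(rows, labels)
weights = balanced_class_weights(labels)
print(Counter(new_labels))  # both classes now have 8 samples
print(weights)  # {0: 0.625, 1: 2.5}
```

Over-sampling changes the data itself, while class weights leave the data untouched and instead make errors on the minority class count more during training.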

Imports#

from deepchecks.tabular import Dataset
from deepchecks.tabular.checks import ClassImbalance
from deepchecks.tabular.datasets.classification import lending_club

Generate data#

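# Load the Lending Club data as a single DataFrame, without a train/test split
df = lending_club.load_data(data_format='Dataframe', as_train_test=False)
# Wrap it in a deepchecks Dataset with 'loan_status' as the label column
dataset = Dataset(df, label='loan_status', features=['id', 'loan_amnt'], cat_features=[])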

Run the check#

ClassImbalance().run(dataset)

Skew the target variable and run the check#

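# Overwrite the label of a random 70% of the rows with class 1 to skew the distribution
df.loc[df.sample(frac=0.7, random_state=0).index, 'loan_status'] = 1
# Rebuild the Dataset from the modified DataFrame and rerun the check
dataset = Dataset(df, label='loan_status', features=['id', 'loan_amnt'], cat_features=[])
ClassImbalance().run(dataset)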

Define a condition#

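A condition can also be added to compare the ratio between the least frequent and the most frequent label against a manually chosen threshold (0.15 in this example):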

ClassImbalance().add_condition_class_ratio_less_than(0.15).run(dataset)

Class Imbalance

Conditions Summary

Status	Condition	More Info
✓	The ratio between least frequent label to most frequent label is less than or equal 0.15	The ratio between least to most frequent label is 0.09
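The quantity reported in the summary above, the ratio between the least and the most frequent label, can be reproduced by hand. The sketch below follows the wording of the condition, not the deepchecks source code; the helper `least_to_most_ratio` and the label counts are hypothetical, chosen to give the same 0.09 ratio.

```python
from collections import Counter


def least_to_most_ratio(labels):
    """Ratio between the count of the rarest label and the most common label."""
    counts = Counter(labels)
    return min(counts.values()) / max(counts.values())


# Hypothetical skewed labels: 100 samples of class 1, 9 samples of class 0.
labels = [1] * 100 + [0] * 9
ratio = least_to_most_ratio(labels)

# Compare against the same threshold used in the condition above.
threshold = 0.15
is_within_threshold = ratio <= threshold
print(ratio, is_within_threshold)  # 0.09 True
```

A ratio close to 1 means the classes are nearly balanced; the closer it gets to 0, the more severe the imbalance.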
