Label Drift#

This notebook provides an overview of using and understanding the NLP label drift check.

Structure:

What Is Label Drift?
Load Data
Run Check

What Is Label Drift?#

Drift is a change in the distribution of data over time, and it is one of the main reasons a machine learning model’s performance degrades over time.

Label drift occurs when the distribution of the label itself changes.

For more information on drift, please visit our drift guide.

How Deepchecks Detects Label Drift#

This check detects label drift by using univariate measures on the label.
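To make "univariate measure on the label" concrete, here is a minimal sketch of one such measure for categorical labels: a plain (uncorrected) Cramér's V between dataset membership (train or test) and the label. This is an illustration under assumptions, not the check's implementation. The function name `cramers_v` and the toy labels are hypothetical, and the text above does not state which measure the check uses or what corrections it applies.

```python
from collections import Counter
from math import sqrt


def cramers_v(train_labels, test_labels):
    """Uncorrected Cramér's V between dataset membership and label.

    Hypothetical helper for illustration only; returns a score in [0, 1],
    where 0 means identical label proportions in both datasets.
    """
    train_counts = Counter(train_labels)
    test_counts = Counter(test_labels)
    classes = sorted(set(train_counts) | set(test_counts))
    n_train, n_test = len(train_labels), len(test_labels)
    n = n_train + n_test
    if n_train == 0 or n_test == 0 or len(classes) < 2:
        return 0.0

    # Chi-square statistic of the 2 x K contingency table (dataset x class).
    chi2 = 0.0
    for cls in classes:
        col_total = train_counts[cls] + test_counts[cls]
        for observed, row_total in (
            (train_counts[cls], n_train),
            (test_counts[cls], n_test),
        ):
            expected = row_total * col_total / n
            chi2 += (observed - expected) ** 2 / expected

    # With two rows (train, test), min(rows - 1, cols - 1) is 1.
    return sqrt(chi2 / n)


# Toy labels, made up for this example (not taken from the tweet dataset).
toy_train = ["anger"] * 50 + ["optimism"] * 50
toy_test = ["anger"] * 80 + ["optimism"] * 20
print(f"drift score: {cramers_v(toy_train, toy_test):.3f}")
```

A score near 0 indicates that the label proportions in the two datasets are nearly the same, while a score near 1 indicates that they barely overlap.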

from deepchecks.nlp.datasets.classification import tweet_emotion
from deepchecks.nlp.checks import LabelDrift

Load Data#

For this example, we’ll use the tweet emotion dataset, a collection of tweets each labeled with one of four emotions: happiness, anger, sadness, and optimism.

train_ds, test_ds = tweet_emotion.load_data()

Let’s see what our data looks like:

train_ds.head()

	text	label	user_age	gender	days_on_platform	user_region
0	No but that's so cute. Atsu was probably shy a...	happiness	24.97	Male	2729	Middle East/Africa
1	Rooneys fucking untouchable isn't he? Been fuc...	anger	21.66	Male	1376	Asia Pacific
2	Tiller and breezy should do a collab album. Ra...	happiness	37.29	Female	3853	Americas
3	@user broadband is shocking regretting signing...	anger	15.39	Female	1831	Europe
4	@user Look at those teef! #growl	anger	54.37	Female	4619	Europe

Run Check#

As there’s natural drift in this dataset, we can expect to see some drift in the “optimism” label:

check = LabelDrift()
result = check.run(train_dataset=train_ds, test_dataset=test_ds)
result
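To see which class contributes to a drift score, it can help to compare per-class label proportions directly. The sketch below is independent of deepchecks. The helper names `label_proportions` and `proportion_shift` and the toy label lists are hypothetical, chosen only to show the idea of a shift concentrated in a single class.

```python
from collections import Counter


def label_proportions(labels):
    """Return the share of each class among the given labels."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: count / total for cls, count in counts.items()}


def proportion_shift(train_labels, test_labels):
    """Return test share minus train share for every class seen in either set."""
    train_p = label_proportions(train_labels)
    test_p = label_proportions(test_labels)
    classes = sorted(set(train_p) | set(test_p))
    return {cls: test_p.get(cls, 0.0) - train_p.get(cls, 0.0) for cls in classes}


# Toy labels, made up for this example (not taken from the tweet dataset).
toy_train = ["happiness"] * 4 + ["anger"] * 3 + ["sadness"] * 2 + ["optimism"] * 1
toy_test = ["happiness"] * 3 + ["anger"] * 3 + ["sadness"] * 2 + ["optimism"] * 2

for cls, delta in proportion_shift(toy_train, toy_test).items():
    print(f"{cls:>10}: {delta:+.2f}")
```

A positive value means the class is more frequent in the test set than in the train set, and a negative value means it is less frequent.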
