Label Drift
This notebook provides an overview of how to use and interpret the NLP label drift check.
What Is Label Drift?
Drift is a change in the distribution of data over time, and it is one of the main reasons a machine learning model's performance degrades over time.
Label drift occurs when the distribution of the label itself changes.
For more information on drift, please visit our drift guide.
How Deepchecks Detects Label Drift
This check detects label drift by using univariate measures on the label.
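To make "univariate measure" concrete, here is a minimal sketch of one such measure for a categorical label: Cramér's V computed on the contingency table of dataset (train or test) against label value. It is an illustration written with the standard library only, not Deepchecks' implementation; the function name `cramers_v` and the label counts below are hypothetical.

```python
from collections import Counter
from math import sqrt


def cramers_v(train_labels, test_labels):
    """Cramér's V between dataset membership (train/test) and label value.

    Returns a score in [0, 1]: 0 means the two label distributions are
    identical, 1 means they share no label values at all.
    """
    categories = set(train_labels) | set(test_labels)
    n_train, n_test = len(train_labels), len(test_labels)
    if n_train == 0 or n_test == 0 or len(categories) < 2:
        return 0.0  # nothing to compare

    train_counts = Counter(train_labels)
    test_counts = Counter(test_labels)
    n = n_train + n_test

    # Pearson chi-squared statistic over the 2 x k contingency table.
    chi2 = 0.0
    for cat in categories:
        col_total = train_counts[cat] + test_counts[cat]
        for observed, row_total in (
            (train_counts[cat], n_train),
            (test_counts[cat], n_test),
        ):
            expected = row_total * col_total / n
            chi2 += (observed - expected) ** 2 / expected

    # With two rows, min(rows - 1, cols - 1) is 1, so V = sqrt(chi2 / n).
    return sqrt(chi2 / n)


# Hypothetical label counts, for illustration only.
train = ["happiness"] * 40 + ["anger"] * 30 + ["sadness"] * 20 + ["optimism"] * 10
test = ["happiness"] * 30 + ["anger"] * 30 + ["sadness"] * 20 + ["optimism"] * 20

print(f"Drift score: {cramers_v(train, test):.3f}")
```

A score near 0 indicates that the label proportions in the two datasets match closely, while larger scores indicate a larger shift. A measure like this looks only at the label column, which is what makes it univariate.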
from deepchecks.nlp.datasets.classification import tweet_emotion
from deepchecks.nlp.checks import LabelDrift
Load Data
For this example, we’ll use the tweet emotion dataset, a collection of tweets each labeled with one of four emotions: happiness, anger, sadness, and optimism.
Let’s see what our data looks like:
train_ds, test_ds = tweet_emotion.load_data()
train_ds.head()
Run Check
As there’s natural drift in this dataset, we can expect to see some drift in the “optimism” label:
check = LabelDrift()
result = check.run(train_dataset=train_ds, test_dataset=test_ds)
result