NLP Property Drift {#nlp__property_drift}
==================

This notebook provides an overview of using and understanding the NLP
property drift check.

**Structure:**

-   [Calculating Drift for Text Data](#calculating-drift-for-text-data)
-   [Prepare data](#prepare-data)
-   [Run the check](#run-the-check)
-   [Define a condition](#define-a-condition)
-   [Check Parameters](#check-parameters)

Calculating Drift for Text Data
-------------------------------

### What is Drift?

Drift is a change in the distribution of data over time, and it is one
of the main reasons why a machine learning model's performance degrades
over time.

For more information on drift, please visit our
`drift_user_guide`{.interpreted-text role="ref"}.

### How Deepchecks Detects Drift in NLP Data

This check detects drift in NLP data by calculating
`univariate drift measures <drift_detection_by_univariate_measure>`{.interpreted-text
role="ref"} for each of the
`text properties <nlp__properties_guide>`{.interpreted-text role="ref"}
(such as text length, language, etc.) that are present in both the train
and test datasets.
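To make the idea of a univariate drift measure concrete, here is a minimal sketch that compares the values of one numeric property in two datasets using the two-sample Kolmogorov-Smirnov statistic. This is only one common measure, written from scratch for illustration; the measure Deepchecks actually applies to each property is reported in the check's output. The function name and the sample values are hypothetical.

```python
from bisect import bisect_right


def ks_statistic(train_values, test_values):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    the empirical CDFs of the two samples (0 = identical, 1 = disjoint)."""
    train_sorted = sorted(train_values)
    test_sorted = sorted(test_values)
    if not train_sorted or not test_sorted:
        raise ValueError("both samples must be non-empty")

    max_gap = 0.0
    # The empirical CDFs only change at observed values, so it is enough
    # to compare them at every value seen in either sample.
    for x in set(train_sorted) | set(test_sorted):
        cdf_train = bisect_right(train_sorted, x) / len(train_sorted)
        cdf_test = bisect_right(test_sorted, x) / len(test_sorted)
        max_gap = max(max_gap, abs(cdf_train - cdf_test))
    return max_gap


# Hypothetical "text length" values, invented for illustration.
train_lengths = [12, 15, 20, 22, 30, 31, 40]
test_lengths = [35, 38, 41, 45, 50, 52, 60]

same_score = ks_statistic(train_lengths, train_lengths)
shifted_score = ks_statistic(train_lengths, test_lengths)
print(f"identical samples: {same_score:.3f}")
print(f"shifted samples:   {shifted_score:.3f}")
```

A score near 0 means the two distributions of the property look alike, while a score near 1 means they barely overlap.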

This check is easy to run (the properties only need to be calculated
once per dataset) and is useful for detecting easily explainable changes
in the data. For example, if you have started to use new data sources
that contain samples in a new language, this check will detect it and
show you a high drift score for the language property.

### Which NLP Properties Are Used?

By default, the check uses the properties that were calculated for the
train and test datasets, which by default are the built-in text
properties. It's also possible to replace the default properties with
custom ones. For the list of built-in text properties and an
explanation of custom properties, refer to `NLP properties
<nlp__properties_guide>`{.interpreted-text role="ref"}.

::: {.note}
::: {.title}
Note
:::

If a property was not calculated for a sample (for example, if it
applies only to English samples and the sample is in another language),
the property holds a NaN value for that sample, and that value is
ignored when calculating the drift.
:::
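The effect of ignoring missing values can be sketched as a simple filtering step applied before any drift measure is computed. This is an illustration only, not the Deepchecks implementation; the helper name and the property values are hypothetical.

```python
import math


def drop_missing(values):
    """Keep only usable values, discarding None and NaN entries."""
    return [
        v for v in values
        if v is not None and not (isinstance(v, float) and math.isnan(v))
    ]


# Hypothetical property values; NaN marks samples for which the
# property was not calculated.
formality = [0.8, float("nan"), 0.6, float("nan"), 0.7]

usable = drop_missing(formality)
print(f"{len(usable)} of {len(formality)} samples contribute to the drift score")
```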

Prepare data
------------


In [None]:
from deepchecks.nlp.datasets.classification.tweet_emotion import load_data

train_dataset, test_dataset = load_data()

# Calculate properties (commented out because it takes a short while to run):
# train_dataset.calculate_builtin_properties(include_long_calculation_properties=True)
# test_dataset.calculate_builtin_properties(include_long_calculation_properties=True)

Run the check
-------------


In [None]:
from deepchecks.nlp.checks import PropertyDrift
check_result = PropertyDrift().run(train_dataset, test_dataset)
check_result

We can see that there isn't any significant drift in the data, although
there is a slight increase in the formality of the text samples in the
test dataset.

To display the results in an IDE like PyCharm, you can use the following
code:


In [None]:
#  check_result.show_in_window()

The result will be displayed in a new window.


Observe the check's output
--------------------------

The result value is a dict that contains the drift score and the method
used for each text property.


In [None]:
check_result.value
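A dict of this kind is easy to post-process, for example to rank properties by drift score. The sketch below uses a hand-written stand-in rather than a real result: the key names (`"Drift score"`, `"Method"`), the property names and all numbers are assumptions made for illustration, so inspect the actual `check_result.value` before relying on them.

```python
# Hypothetical stand-in for check_result.value; key names and numbers
# are illustrative assumptions, not output copied from a real run.
example_value = {
    "Text Length": {"Drift score": 0.04, "Method": "Kolmogorov-Smirnov"},
    "Language": {"Drift score": 0.00, "Method": "Cramer's V"},
    "Formality": {"Drift score": 0.09, "Method": "Kolmogorov-Smirnov"},
}


def rank_by_drift(value):
    """Return (property, score) pairs, highest drift score first."""
    pairs = [(name, info["Drift score"]) for name, info in value.items()]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


ranked = rank_by_drift(example_value)
for name, score in ranked:
    print(f"{name}: {score:.2f}")
```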

Define a condition
------------------

We can define a condition that ensures the drift scores of the NLP
properties do not exceed an allowed threshold.


In [None]:
check_result = (
    PropertyDrift()
    .add_condition_drift_score_less_than(0.001)
    .run(train_dataset, test_dataset)
)
check_result.show(show_additional_outputs=False)
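Conceptually, a condition like this compares each property's drift score with the threshold and fails if any score is too high. The sketch below illustrates that logic in plain Python under stated assumptions: the helper name and the scores are hypothetical, and the real condition may treat numeric and categorical properties differently.

```python
def failing_properties(drift_scores, max_allowed_score):
    """Return the properties whose drift score is not below the threshold."""
    return {
        name: score
        for name, score in drift_scores.items()
        if score >= max_allowed_score
    }


# Hypothetical drift scores, invented for illustration.
drift_scores = {"Text Length": 0.04, "Language": 0.0, "Formality": 0.09}

strict = failing_properties(drift_scores, max_allowed_score=0.001)
lenient = failing_properties(drift_scores, max_allowed_score=0.2)

print("fails at 0.001:", sorted(strict))
print("fails at 0.2:  ", sorted(lenient))
```

With a threshold as strict as 0.001, even very small differences between the datasets are enough to fail the condition, which makes it convenient for demonstrating a failing result.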

Check Parameters
----------------

The Property Drift check can be limited to a given list of properties,
or told to skip a given list of properties, using the `properties` and
`ignore_properties` parameters respectively.
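The intended semantics of an include list versus an exclude list can be sketched as follows. This is a plain-Python illustration, not the Deepchecks implementation: the helper `select_properties` is hypothetical, and treating the two arguments as mutually exclusive is an assumption.

```python
def select_properties(available, properties=None, ignore_properties=None):
    """Pick which properties take part in the drift calculation."""
    if properties is not None and ignore_properties is not None:
        # Assumption: an include list and an exclude list are not combined.
        raise ValueError("pass either properties or ignore_properties, not both")
    if properties is not None:
        return [p for p in available if p in properties]
    if ignore_properties is not None:
        return [p for p in available if p not in ignore_properties]
    return list(available)


# Hypothetical property names, used only for illustration.
available = ["Text Length", "Language", "Formality"]

only_length = select_properties(available, properties=["Text Length"])
without_language = select_properties(available, ignore_properties=["Language"])
print(only_length)
print(without_language)
```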

In addition, the Property Drift check supports several parameters
pertaining to the way drift is calculated and displayed. Information
about the most relevant of them can be found in the
`drift_user_guide`{.interpreted-text role="ref"}.
