.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "nlp/auto_checks/train_test_validation/plot_property_drift.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end `
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_nlp_auto_checks_train_test_validation_plot_property_drift.py:


.. _nlp__property_drift:

NLP Property Drift
******************

This notebook provides an overview of using and understanding the NLP property drift check.

**Structure:**

* `Calculating Drift for Text Data <#calculating-drift-for-text-data>`__
* `Prepare data <#prepare-data>`__
* `Run the check <#run-the-check>`__
* `Define a condition <#define-a-condition>`__
* `Check Parameters <#check-parameters>`__

Calculating Drift for Text Data
=================================

What is Drift?
----------------

Drift is a change in the distribution of data over time, and it is one of the main reasons
a machine learning model's performance degrades over time.

For more information on drift, please visit our :ref:`drift_user_guide`.

How Deepchecks Detects Drift in NLP Data
-----------------------------------------

This check detects drift in NLP data by calculating
:ref:`univariate drift measures ` for each
:ref:`text property ` (such as text length or language) that is present in the
train and test datasets.

This check is easy to run (the properties need to be calculated only once per dataset) and is
useful for detecting easily explainable changes in the data. For example, if you have started to
use new data sources that contain samples in a new language, this check will detect it and show
you a high drift score for the language property.

Which NLP Properties Are Used?
-------------------------------

By default, the check uses the properties that were calculated for the train and test datasets,
which by default are the built-in text properties. It's also possible to replace the default
properties with custom ones. For the list of built-in text properties and an explanation of
custom properties, refer to :ref:`NLP properties `.

.. note::

    If a property was not calculated for a sample (for example, if it applies only to English
    samples and the sample is in another language), it will contain a NaN value and will be
    ignored when calculating the drift.

Prepare data
=============

.. GENERATED FROM PYTHON SOURCE LINES 57-66

.. code-block:: default


    from deepchecks.nlp.datasets.classification.tweet_emotion import load_data

    train_dataset, test_dataset = load_data()

    # # Calculate properties, commented out because it takes a short while to run
    # train_dataset.calculate_builtin_properties(include_long_calculation_properties=True)
    # test_dataset.calculate_builtin_properties(include_long_calculation_properties=True)


.. GENERATED FROM PYTHON SOURCE LINES 67-69

Run the check
=============

.. GENERATED FROM PYTHON SOURCE LINES 69-74

.. code-block:: default


    from deepchecks.nlp.checks import PropertyDrift

    check_result = PropertyDrift().run(train_dataset, test_dataset)
    check_result


[Rendered check output: Property Drift]

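For intuition about what a numerical drift score measures, the sketch below computes a
two-sample Kolmogorov-Smirnov statistic (the method the check reports for numerical properties
such as text length) in plain Python. It is an illustration under assumptions rather than the
Deepchecks implementation: the helper name ``ks_statistic`` and the sample values are made up,
and NaN values are dropped first to mirror the note above about properties that were not
calculated for a sample.

.. code-block:: python

    import math
    from bisect import bisect_right


    def ks_statistic(train_values, test_values):
        """Largest gap between the empirical CDFs of two samples.

        0 means identical distributions, 1 means fully separated ones.
        """
        # Drop NaN entries, i.e. samples for which the property was not calculated.
        train_sorted = sorted(v for v in train_values if not math.isnan(v))
        test_sorted = sorted(v for v in test_values if not math.isnan(v))
        if not train_sorted or not test_sorted:
            raise ValueError("both samples need at least one non-NaN value")

        max_gap = 0.0
        # The gap between two step functions can only peak at an observed value.
        for x in set(train_sorted) | set(test_sorted):
            cdf_train = bisect_right(train_sorted, x) / len(train_sorted)
            cdf_test = bisect_right(test_sorted, x) / len(test_sorted)
            max_gap = max(max_gap, abs(cdf_train - cdf_test))
        return max_gap


    # Hypothetical text lengths, not taken from the tweet emotion dataset.
    train_lengths = [12, 40, 55, 70, 90]
    test_lengths = [15, 42, 60, 95, 120, float("nan")]
    print(ks_statistic(train_lengths, test_lengths))

A score near 0 means the property is distributed similarly in both datasets, while a score
near 1 means the two distributions barely overlap.
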
.. GENERATED FROM PYTHON SOURCE LINES 75-79

The check output shows no significant drift in the data, only a slight increase in the
formality of the text samples in the test dataset.

To display the results in an IDE like PyCharm, you can use the following code:

.. GENERATED FROM PYTHON SOURCE LINES 79-81

.. code-block:: default


    # check_result.show_in_window()


.. GENERATED FROM PYTHON SOURCE LINES 82-83

The result will be displayed in a new window.

.. GENERATED FROM PYTHON SOURCE LINES 85-88

Observe the check’s output
--------------------------

The result value is a dict that contains the drift score and the method used for each text property.

.. GENERATED FROM PYTHON SOURCE LINES 88-91

.. code-block:: default


    check_result.value


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    {'Max Word Length': {'Drift score': 0.04743959252714447, 'Method': 'Kolmogorov-Smirnov', 'Importance': None}, 'Fluency': {'Drift score': 0.054627254944577264, 'Method': 'Kolmogorov-Smirnov', 'Importance': None}, 'Average Word Length': {'Drift score': 0.05351275242622111, 'Method': 'Kolmogorov-Smirnov', 'Importance': None}, 'Language': {'Drift score': 0.009166684961611582, 'Method': "Cramer's V", 'Importance': None}, 'Subjectivity': {'Drift score': 0.034508944180376644, 'Method': 'Kolmogorov-Smirnov', 'Importance': None}, 'Text Length': {'Drift score': 0.029349196299481184, 'Method': 'Kolmogorov-Smirnov', 'Importance': None}, 'Toxicity': {'Drift score': 0.023840752955406663, 'Method': 'Kolmogorov-Smirnov', 'Importance': None}, 'Sentiment': {'Drift score': 0.04037496574468685, 'Method': 'Kolmogorov-Smirnov', 'Importance': None}, 'Formality': {'Drift score': 0.08043676705442104, 'Method': 'Kolmogorov-Smirnov', 'Importance': None}, '% Special Characters': {'Drift score': 0.02332838796858905, 'Method': 'Kolmogorov-Smirnov', 'Importance': None}}


.. GENERATED FROM PYTHON SOURCE LINES 92-96

Define a condition
==================

We can define a condition that makes sure the drift scores of the NLP properties do not exceed
an allowed threshold.

.. GENERATED FROM PYTHON SOURCE LINES 96-104

.. code-block:: default


    check_result = (
        PropertyDrift()
        .add_condition_drift_score_less_than(0.001)
        .run(train_dataset, test_dataset)
    )
    check_result.show(show_additional_outputs=False)


[Rendered check output: Property Drift]

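As a rough sketch of what a threshold condition of this kind checks, the snippet below scans a
result dict shaped like ``check_result.value`` and collects the properties whose drift score is
not below the threshold. It is not the library's implementation: the helper name
``properties_exceeding_threshold`` is hypothetical, and applying one threshold to every property
is an assumption (the actual condition may treat categorical and numerical properties
separately). The three entries are copied from the ``check_result.value`` output shown earlier.

.. code-block:: python

    # A subset of the ``check_result.value`` output shown earlier in this example.
    sample_value = {
        "Language": {"Drift score": 0.009166684961611582, "Method": "Cramer's V"},
        "Text Length": {"Drift score": 0.029349196299481184, "Method": "Kolmogorov-Smirnov"},
        "Formality": {"Drift score": 0.08043676705442104, "Method": "Kolmogorov-Smirnov"},
    }


    def properties_exceeding_threshold(result_value, max_allowed_score):
        """Return {property name: drift score} for scores that are not below the threshold."""
        return {
            name: details["Drift score"]
            for name, details in result_value.items()
            if details["Drift score"] >= max_allowed_score
        }


    # Under the single-threshold assumption, a limit of 0.001 flags every property here.
    print(sorted(properties_exceeding_threshold(sample_value, 0.001)))
    # A looser limit of 0.05 flags only the property with the largest score.
    print(sorted(properties_exceeding_threshold(sample_value, 0.05)))
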
.. GENERATED FROM PYTHON SOURCE LINES 105-114

Check Parameters
==================

The Property Drift check can be restricted to a list of properties, or told to skip a list of
properties, using the ``properties`` and ``ignore_properties`` parameters respectively.

In addition, the Property Drift check supports several parameters that control how drift is
calculated and displayed. Information about the most relevant of them can be found in the
:ref:`drift_user_guide`.


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 1.297 seconds)


.. _sphx_glr_download_nlp_auto_checks_train_test_validation_plot_property_drift.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example


    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_property_drift.py `

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_property_drift.ipynb `


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery `_