.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "nlp/auto_checks/train_test_validation/plot_property_drift.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end ` to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_nlp_auto_checks_train_test_validation_plot_property_drift.py:


.. _nlp__property_drift:

NLP Property Drift
******************

This notebook provides an overview of using and understanding the NLP property drift check.

**Structure:**

* `Calculating Drift for Text Data <#calculating-drift-for-text-data>`__
* `Prepare data <#prepare-data>`__
* `Run the check <#run-the-check>`__
* `Define a condition <#define-a-condition>`__
* `Check Parameters <#check-parameters>`__

Calculating Drift for Text Data
=================================

What is Drift?
----------------

Drift is simply a change in the distribution of data over time, and it is also one of
the top reasons why a machine learning model's performance degrades over time.

For more information on drift, please visit our :ref:`drift_user_guide`.

How Deepchecks Detects Drift in NLP Data
-----------------------------------------

This check detects drift in NLP data by calculating :ref:`univariate drift measures `
for each :ref:`text property ` (such as text length, language, etc.) present in the
train and test datasets.

This check is easy to run (the properties only need to be calculated once per dataset)
and is useful for detecting easily explainable changes in the data. For example, if you
start using new data sources that contain samples in a new language, this check will
detect it and show a high drift score for the Language property.

Which NLP Properties Are Used?
-------------------------------

By default, the check uses the built-in text properties; it is also possible to replace
them with custom ones. For the list of built-in text properties and an explanation of
custom properties, refer to :ref:`NLP properties `.

Prepare data
=============

.. GENERATED FROM PYTHON SOURCE LINES 51-60

.. code-block:: default


    from deepchecks.nlp.datasets.classification.tweet_emotion import load_data

    train_dataset, test_dataset = load_data()

    # # Calculate properties, commented out because it takes a while to run
    # train_dataset.calculate_builtin_properties(include_long_calculation_properties=True)
    # test_dataset.calculate_builtin_properties(include_long_calculation_properties=True)

.. GENERATED FROM PYTHON SOURCE LINES 61-63

Run the check
=============

.. GENERATED FROM PYTHON SOURCE LINES 63-68

.. code-block:: default


    from deepchecks.nlp.checks import PropertyDrift
    check_result = PropertyDrift().run(train_dataset, test_dataset)
    check_result

*Figure: "Properties Drift" check output.*

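The check reports one univariate drift score per property: a Kolmogorov-Smirnov score for numeric properties and a Cramer's V score for the categorical Language property. As a rough illustration of what these two scores measure, here is a minimal pure-Python sketch of their textbook formulas. It is an assumption-labeled sketch, not deepchecks' implementation (the library may additionally filter outliers, group rare categories, or apply corrections), and the functions ``ks_statistic`` and ``cramers_v`` are hypothetical helpers.

```python
from bisect import bisect_right
from collections import Counter
from math import sqrt


def ks_statistic(train, test):
    """Two-sample Kolmogorov-Smirnov statistic: the largest vertical gap
    between the empirical CDFs of two numeric samples (0 = identical)."""
    train, test = sorted(train), sorted(test)
    # ECDF(x) is the fraction of a sample that is <= x; the largest gap
    # is always reached at one of the observed values
    return max(
        abs(bisect_right(train, x) / len(train) - bisect_right(test, x) / len(test))
        for x in set(train) | set(test)
    )


def cramers_v(train, test):
    """Cramer's V for the 2 x k table of (dataset, category) counts
    (0 = same category distribution, 1 = no category shared)."""
    train_counts, test_counts = Counter(train), Counter(test)
    categories = sorted(set(train_counts) | set(test_counts))
    if len(categories) < 2:
        return 0.0  # a single category cannot shift between datasets
    rows = [[train_counts[c] for c in categories],
            [test_counts[c] for c in categories]]
    n = len(train) + len(test)
    col_totals = [a + b for a, b in zip(*rows)]
    chi2 = 0.0
    for row in rows:
        row_total = sum(row)
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / n
            chi2 += (observed - expected) ** 2 / expected
    # V = sqrt(chi2 / (n * (min(rows, cols) - 1))); min(2, k) - 1 == 1 here
    return sqrt(chi2 / n)


# Toy property values (hypothetical data, not from the tweet_emotion dataset)
print(round(ks_statistic([12, 30, 45, 51, 80], [14, 33, 47, 60, 95]), 2))  # 0.2
print(round(cramers_v(["en"] * 9 + ["fr"], ["en"] * 5 + ["fr"] * 5), 2))  # 0.44
```

Both scores lie between 0 and 1, which is why numeric and categorical properties can be compared on a similar scale.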
.. GENERATED FROM PYTHON SOURCE LINES 69-73

There is no significant drift in the data. The highest drift score belongs to the
Formality property, reflecting a slight increase in the formality of the text samples
in the test dataset.

To display the results in an IDE like PyCharm, you can use the following code:

.. GENERATED FROM PYTHON SOURCE LINES 73-75

.. code-block:: default

    # check_result.show_in_window()

.. GENERATED FROM PYTHON SOURCE LINES 76-77

The result will be displayed in a new window.

.. GENERATED FROM PYTHON SOURCE LINES 79-82

Observe the check’s output
--------------------------
The result value is a dict that contains, for each text property, the drift score and
the method used to calculate it.

.. GENERATED FROM PYTHON SOURCE LINES 82-85

.. code-block:: default

    check_result.value


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    {'Fluency': {'Drift score': 0.054627254944577264, 'Method': 'Kolmogorov-Smirnov', 'Importance': None},
     'Max Word Length': {'Drift score': 0.04743959252714447, 'Method': 'Kolmogorov-Smirnov', 'Importance': None},
     'Subjectivity': {'Drift score': 0.034508944180376644, 'Method': 'Kolmogorov-Smirnov', 'Importance': None},
     '% Special Characters': {'Drift score': 0.02332838796858905, 'Method': 'Kolmogorov-Smirnov', 'Importance': None},
     'Average Word Length': {'Drift score': 0.05351275242622111, 'Method': 'Kolmogorov-Smirnov', 'Importance': None},
     'Language': {'Drift score': 0.009166684961611582, 'Method': "Cramer's V", 'Importance': None},
     'Sentiment': {'Drift score': 0.04037496574468685, 'Method': 'Kolmogorov-Smirnov', 'Importance': None},
     'Toxicity': {'Drift score': 0.023840752955406663, 'Method': 'Kolmogorov-Smirnov', 'Importance': None},
     'Formality': {'Drift score': 0.08043676705442104, 'Method': 'Kolmogorov-Smirnov', 'Importance': None},
     'Text Length': {'Drift score': 0.029349196299481184, 'Method': 'Kolmogorov-Smirnov', 'Importance': None}}

.. GENERATED FROM PYTHON SOURCE LINES 86-90

Define a condition
==================

We can define a condition that makes sure the NLP property drift scores do not exceed
an allowed threshold.

.. GENERATED FROM PYTHON SOURCE LINES 90-98

.. code-block:: default


    check_result = (
        PropertyDrift()
        .add_condition_drift_score_less_than(0.001)
        .run(train_dataset, test_dataset)
    )
    check_result.show(show_additional_outputs=False)

*Figure: "Properties Drift" check output.*

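To make the idea of a threshold condition concrete, here is a minimal sketch that applies such a rule to a plain dict shaped like ``check_result.value``. It assumes the condition simply fails when any property's drift score is not below the threshold; the helper ``properties_exceeding`` is hypothetical and not part of the deepchecks API, and the real condition may treat numeric and categorical properties differently.

```python
def properties_exceeding(drift_results, threshold):
    """Return (property, score) pairs whose drift score is not below
    `threshold`, sorted from most to least drifted (hypothetical helper)."""
    failing = [
        (name, info["Drift score"])
        for name, info in drift_results.items()
        if info["Drift score"] >= threshold
    ]
    return sorted(failing, key=lambda pair: pair[1], reverse=True)


# A subset of the check_result.value output shown above
drift_results = {
    "Formality": {"Drift score": 0.08043676705442104, "Method": "Kolmogorov-Smirnov"},
    "Fluency": {"Drift score": 0.054627254944577264, "Method": "Kolmogorov-Smirnov"},
    "Language": {"Drift score": 0.009166684961611582, "Method": "Cramer's V"},
}

print(properties_exceeding(drift_results, threshold=0.05))
# [('Formality', 0.08043676705442104), ('Fluency', 0.054627254944577264)]

# Under this assumed rule, the condition passes only if nothing exceeds the threshold
print(not properties_exceeding(drift_results, threshold=0.1))  # True
```

Sorting the failing properties by score puts the most drifted ones first, which is usually where an investigation should start.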
.. GENERATED FROM PYTHON SOURCE LINES 99-108

Check Parameters
==================

The Property Drift check can be limited to a given list of properties with the
``properties`` parameter, or told to skip a list of properties with the
``ignore_properties`` parameter.

In addition, the check supports several parameters that control how drift is calculated
and displayed. Information about the most relevant ones can be found in the
:ref:`drift_user_guide`.


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes 1.592 seconds)


.. _sphx_glr_download_nlp_auto_checks_train_test_validation_plot_property_drift.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_property_drift.py `

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_property_drift.ipynb `


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery `_