.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "nlp/auto_tutorials/quickstarts/plot_token_classification.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end ` to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_nlp_auto_tutorials_quickstarts_plot_token_classification.py:

.. _nlp__token_classification_quickstart:

Token Classification Quickstart
*******************************

Deepchecks NLP tests your models during model development/research and before deploying to production. Using our
testing package reduces model failures and saves test development time. In this quickstart guide, you will learn how
to use the deepchecks NLP package to analyze and evaluate token classification tasks. A token classification task is
one in which we wish to assign a specific label to each token (usually a word or a part of a word), rather than
assigning a class or classes to the text as a whole. For a more complete example showcasing the range of checks and
capabilities of the NLP package, refer to our :ref:`Multiclass Quickstart `.

We will cover the following steps:

1. `Creating a TextData object and auto calculating properties <#setting-up>`__
2. `Running checks <#running-checks>`__

To run deepchecks for token classification, you need the following for both your train and test data:

1. Your tokenized text dataset - a list containing lists of strings, where each string is a single token within the
   sample, and a sample can be a sentence, a paragraph, a document, and so on.
2. Your labels - a :ref:`Token Classification ` label. These are not needed for checks that don't require labels
   (such as the Embeddings Drift check or most data integrity checks), but are needed for many other checks.
3. Your model's predictions (see :ref:`nlp__supported_tasks` for info on supported formats). These are needed only
   for the model-related checks, demonstrated in the `Running Checks <#running-checks>`__ section of this guide.

If you don't have deepchecks installed yet:

.. code:: python

    import sys
    !{sys.executable} -m pip install 'deepchecks[nlp]' -U --quiet #--user

Some properties calculated by ``deepchecks.nlp`` require additional packages to be installed. You can
install them by running:

.. code:: python

    import sys
    !{sys.executable} -m pip install 'deepchecks[nlp-properties]' -U --quiet #--user

Setting Up
==========

Load Data
---------

For the purpose of this guide, we'll use a small subset of the `SCIERC `__ dataset:

.. GENERATED FROM PYTHON SOURCE LINES 53-61

.. code-block:: default

    from pprint import pprint

    from deepchecks.nlp import TextData
    from deepchecks.nlp.datasets.token_classification import scierc_ner

    train, test = scierc_ner.load_data(data_format='Dict')
    pprint(train['text'][0][:10])
    pprint(train['label'][0][:10])

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    include_properties and include_embeddings are incompatible with data_format="Dict". loading only original text data
    ['English', 'is', 'shown', 'to', 'be', 'trans-context-free', 'on', 'the', 'basis', 'of']
    ['B-Material', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O']

.. GENERATED FROM PYTHON SOURCE LINES 62-78

The SCIERC dataset is a dataset of scientific articles with annotations for named entities, relations and
coreferences. In this example we'll use only the named entity annotations, which are the labels for our token
classification task. We can see that we have the article text itself, and a label for each token in the text, in the
:ref:`IOB format `.
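
To make the expected input format concrete, here is a minimal, hand-built sketch of tokenized text with aligned IOB
labels. The sentence and entity spans below are invented purely for illustration:

.. code-block:: python

    # Each sample is a list of tokens; each token gets exactly one IOB tag.
    # 'B-' marks the beginning of an entity, 'I-' a continuation of it, and
    # 'O' marks tokens that belong to no entity.
    tokenized_text = [
        ['Our', 'parser', 'is', 'evaluated', 'on', 'newswire', 'text', '.'],
    ]
    labels = [
        ['O', 'B-Method', 'O', 'O', 'O', 'B-Material', 'I-Material', 'O'],
    ]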

Create a TextData Object
------------------------

We can now create a :ref:`TextData ` object for the train and test datasets. This object is used to pass your data to
the deepchecks checks. To create a TextData object, the only required argument is the tokenized text itself. In most
cases we'll want to pass labels as well, since they are needed by many checks. In this example we'll pass the labels
and define the task type.

.. GENERATED FROM PYTHON SOURCE LINES 79-84

.. code-block:: default

    train = TextData(tokenized_text=train['text'], label=train['label'], task_type='token_classification')
    test = TextData(tokenized_text=test['text'], label=test['label'], task_type='token_classification')

.. GENERATED FROM PYTHON SOURCE LINES 85-93

Calculating Properties
----------------------

Some of deepchecks' checks use properties of the text samples for various calculations. Deepchecks has a wide
variety of such properties, some simple and some that rely on external models and are heavier to run. In order for
deepchecks' checks to be able to use the properties, they must be added to the :ref:`TextData ` object, usually by
calculating them. You can read more about properties in the :ref:`Property Guide `.

.. GENERATED FROM PYTHON SOURCE LINES 93-105

.. code-block:: default

    # Properties can be either calculated directly by deepchecks
    # or imported from other sources in an appropriate format:

    # import torch
    # device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    # train.calculate_builtin_properties(
    #     include_long_calculation_properties=True, device=device
    # )
    # test.calculate_builtin_properties(
    #     include_long_calculation_properties=True, device=device
    # )

.. GENERATED FROM PYTHON SOURCE LINES 106-107

In this example, though, we'll use pre-calculated properties:

.. GENERATED FROM PYTHON SOURCE LINES 107-115

.. code-block:: default

    train_properties, test_properties = scierc_ner.load_properties()

    train.set_properties(train_properties, categorical_properties=['Language'])
    test.set_properties(test_properties, categorical_properties=['Language'])

    train.properties.head(2)

.. rst-class:: sphx-glr-script-out

.. code-block:: none

      Language  Count URLs  Count Email Address  Count Unique URLs  Count Unique Email Address  ...  Formality  Lexical Density  Unique Noun Count  Readability Score  Average Sentence Length
    0       en           0                    0                  0                           0  ...   0.997133            68.38               30.0             34.850                     34.0
    1       en           0                    0                  0                           0  ...   0.997115            60.47               32.0             54.669                     22.0

    [2 rows x 22 columns]
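
Once set, the properties are available as a regular pandas DataFrame on the ``TextData`` object, so you can explore
them directly. A minimal sketch, using the property columns shown in the table above:

.. code-block:: python

    # One row per text sample, one column per property.
    print(train.properties.shape)

    # Summary statistics for a single property column:
    print(train.properties['Readability Score'].describe())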



.. GENERATED FROM PYTHON SOURCE LINES 116-130

Running Checks
==============

Train Test Performance
----------------------

Once the :ref:`TextData ` object is ready, we can run the checks. We'll start by running the
:ref:`TrainTestPerformance ` check, which compares the performance of the model on the train and test sets.

For this check, we'll need to pass the model's predictions on the train and test sets, provided in the same format
as the labels - an IOB annotation per token in the tokenized text.

We'll also define a condition for the check with the default threshold value. You can learn more about customizing
checks and conditions, as well as defining suites of checks, in our :ref:`Customizations Guide `.

.. GENERATED FROM PYTHON SOURCE LINES 130-138

.. code-block:: default

    train_preds, test_preds = scierc_ner.load_precalculated_predictions()

    from deepchecks.nlp.checks import TrainTestPerformance

    check = TrainTestPerformance().add_condition_train_test_relative_degradation_less_than()
    result = check.run(train, test, train_predictions=train_preds, test_predictions=test_preds)
    result
*Check display: Train Test Performance (interactive output, rendered in the HTML documentation).*
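
Beyond the rendered display, the check result can also be inspected programmatically. A minimal sketch, assuming the
standard ``CheckResult`` API (``value`` holds the computed scores, ``passed_conditions`` reports the condition
status, and ``save_as_html`` writes the interactive report to a file):

.. code-block:: python

    # The raw scores computed by the check:
    print(result.value)

    # True only if all conditions added to the check passed:
    print(result.passed_conditions())

    # Save the full interactive report as a standalone HTML file:
    result.save_as_html('train_test_performance.html')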


.. GENERATED FROM PYTHON SOURCE LINES 139-150

We can see that the model performs better on the train set than on the test set, which is expected. We can also note
specifically that the recall for the class "OtherScientificTerm" has declined significantly on the test set, which is
something we might want to investigate further.

Embeddings Drift
----------------

The :ref:`EmbeddingsDrift ` check compares the embeddings of the train and test sets. In order to run this check you
must have text embeddings loaded into both datasets. You can read more about using embeddings in deepchecks NLP in
our :ref:`Embeddings Guide `.

In this example, we have the embeddings already pre-calculated:

.. GENERATED FROM PYTHON SOURCE LINES 150-157

.. code-block:: default

    train_embeddings, test_embeddings = scierc_ner.load_embeddings()

    train.set_embeddings(train_embeddings)
    test.set_embeddings(test_embeddings)

.. GENERATED FROM PYTHON SOURCE LINES 158-160

You can also calculate the embeddings using deepchecks, either with an open-source sentence-transformer or with
OpenAI's embedding API.

.. GENERATED FROM PYTHON SOURCE LINES 160-164

.. code-block:: default

    # train.calculate_builtin_embeddings()
    # test.calculate_builtin_embeddings()

.. GENERATED FROM PYTHON SOURCE LINES 165-172

.. code-block:: default

    from deepchecks.nlp.checks import TextEmbeddingsDrift

    check = TextEmbeddingsDrift()
    res = check.run(train, test)
    res.show()

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    n_jobs value -1 overridden to 1 by setting random_state. Use no seed for parallelism.
    n_jobs value -1 overridden to 1 by setting random_state. Use no seed for parallelism.
*Check display: Embeddings Drift (interactive output, rendered in the HTML documentation).*
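
The numeric drift measurement is also available on the result object. A minimal sketch - note that the exact
contents of ``res.value`` (e.g. a domain-classifier AUC) may vary between deepchecks versions:

.. code-block:: python

    # The check's computed value holds the drift metrics it measured,
    # which can be logged or thresholded in a CI pipeline:
    print(res.value)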


.. GENERATED FROM PYTHON SOURCE LINES 173-184

The check shows the samples from the train and test datasets as points in the 2-dimensional reduced embedding space.
We can see some distinct segments - in the upper left corner we can notice (by hovering over the samples and reading
the abstracts) that these are papers about computer vision, while the bottom right corner is mostly about Natural
Language Processing. We can also see that although there isn't significant drift between the train and test sets, the
training dataset has a few more samples from the NLP domain, while the test set has more samples from the computer
vision domain.

.. note::

    You can find the full list of available NLP checks in the :mod:`nlp.checks api documentation `.

.. rst-class:: sphx-glr-timing

**Total running time of the script:** (0 minutes 9.089 seconds)

.. _sphx_glr_download_nlp_auto_tutorials_quickstarts_plot_token_classification.py:

.. only:: html

    .. container:: sphx-glr-footer sphx-glr-footer-example

        .. container:: sphx-glr-download sphx-glr-download-python

            :download:`Download Python source code: plot_token_classification.py `

        .. container:: sphx-glr-download sphx-glr-download-jupyter

            :download:`Download Jupyter notebook: plot_token_classification.ipynb `

.. only:: html

    .. rst-class:: sphx-glr-signature

        `Gallery generated by Sphinx-Gallery `_