Supported Tasks and Formats#

Some checks, mainly the ones related to model evaluation, require labels and model predictions in order to run. In the deepchecks nlp package, the run method of a suite or check accepts only pre-computed predictions; passing a fitted model is currently not supported.

Supported Task Types#

Deepchecks currently supports two NLP task types:

  • Text Classification: any NLP task in which a whole body of text (ranging from a sentence to a document) is assigned a single class (in the binary/multiclass case) or a set of classes (in the multilabel case). In all three cases (binary, multiclass, and multilabel), the assigned class or classes describe the text sample as a whole. Examples of such tasks are:

    • Sentiment Analysis

    • Topic Extraction

    • Harmful content detection

  • Token Classification: any NLP task in which each word (or, more accurately, each token) in the text sample is assigned a class of its own. In many cases most tokens belong to a “background” class, allowing the model to focus on the interesting tokens. Examples of such tasks are:

    • Named Entity Recognition

    • Part-of-speech annotation (in which all tokens have a non-background class).

Supported Labels and Predictions Format#

While labels are passed when constructing the TextData object, predictions are passed separately to the run() method of the check / suite. Labels and predictions must be in the format detailed in this section, according to the task type.

Text Classification#

Label Format#

For text classification, the accepted label format differs between the single-label and multilabel cases. For single-label data, the label should be passed as a sequence with one entry per sample, where each entry is either a string or an integer. For multilabel data, the label should be passed as a sequence of sequences, where the inner sequence for each sample is a binary vector whose i-th entry indicates whether the i-th class is present in that sample.

>>> text_classification_label_multiclass = ['class_0', 'class_0', 'class_1', 'class_2']
>>> text_classification_label_multilabel = [[0, 0, 1], [0, 1, 1], [1, 0, 1], [0, 0, 0]]
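If multilabel annotations are stored as lists of class names per sample, they have to be converted into the binary-vector format shown above. The following is a minimal sketch of one way to do that; `raw_labels` and `to_binary_matrix` are hypothetical names used only for illustration, and the column order is assumed to follow the sorted class names.

```python
def to_binary_matrix(raw_labels):
    """Convert per-sample lists of class names into binary indicator vectors."""
    # Column order: sorted set of every class name that appears in any sample.
    classes = sorted({name for sample in raw_labels for name in sample})
    matrix = [[1 if name in sample else 0 for name in classes]
              for sample in raw_labels]
    return classes, matrix


# Hypothetical annotations: the last sample has no class at all.
raw_labels = [['sports'], ['politics', 'sports'], ['finance', 'sports'], []]
classes, label_matrix = to_binary_matrix(raw_labels)

print(classes)       # ['finance', 'politics', 'sports']
print(label_matrix)  # [[0, 0, 1], [0, 1, 1], [1, 0, 1], [0, 0, 0]]
```

The resulting matrix has one row per sample and one column per class, and the `classes` list has one name per column.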

Note

For multilabel tasks, in order for deepchecks to use string names for the different classes (rather than just the class indices in the label matrix), you may pass a list of class names to the classes argument of the TextData constructor. This list, whose length must equal the number of columns in the label matrix (one name per class), will be used to name the multilabel classes throughout deepchecks.

Prediction Format#

Note

Class probabilities (and, for multilabel tasks, also predictions) are always provided as a matrix of shape (n_samples, n_classes). So that deepchecks can tell which column corresponds to which class name, the i-th column must hold the values for the class in the i-th position of the sorted array of class names. That sorted array is the result of sorting the set of all class names present in the labels and predictions, namely sorted(list(set(y_true).union(set(y_pred)))).
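The column convention can be sketched in plain Python. In this illustration, `raw_scores` is a hypothetical model output (one mapping of class name to probability per sample) and is not part of the deepchecks API; the sketch only shows how to reorder such output into the sorted class order.

```python
y_true = ['class_0', 'class_0', 'class_1', 'class_2']
y_pred = ['class_0', 'class_1', 'class_1', 'class_2']

# Sorted array of all class names present in labels and predictions.
class_order = sorted(list(set(y_true).union(set(y_pred))))

# Hypothetical model output with classes in arbitrary order.
raw_scores = [
    {'class_2': 0.1, 'class_0': 0.7, 'class_1': 0.2},
    {'class_1': 0.6, 'class_2': 0.1, 'class_0': 0.3},
    {'class_0': 0.2, 'class_1': 0.5, 'class_2': 0.3},
    {'class_2': 0.8, 'class_1': 0.1, 'class_0': 0.1},
]

# Column i of every row holds the probability of class_order[i].
probabilities = [[scores[name] for name in class_order] for scores in raw_scores]

print(class_order)       # ['class_0', 'class_1', 'class_2']
print(probabilities[0])  # [0.7, 0.2, 0.1]
```

Keep in mind that Python sorts strings lexicographically, so a name such as 'class_10' sorts before 'class_2'.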

Single Class Predictions#
  • predictions - A sequence of class names or indices with one entry per sample, matching the set of classes present in the labels.

  • probabilities - A sequence of sequences with each element containing the vector of class probabilities for each sample. Each such vector should have one probability per class according to the class (sorted) order, and the probabilities should sum to 1 for each sample.

>>> predictions = ['class_1', 'class_1', 'class_2']
>>> # Note that even in the binary case the probability must be specified for each class, as is the case in this example
>>> probabilities = [[0.2, 0.8], [0.5, 0.5], [0.3, 0.7]]

Multilabel Predictions#
  • predictions - A sequence of sequences with each element containing a binary vector denoting the presence of the i-th class for the given sample. Each such vector should have one binary indicator per class according to the class (sorted) order. More than one class can be present for each sample.

  • probabilities - A sequence of sequences with each element containing the vector of class probabilities for each sample. Each such vector should have one probability per class according to the class (sorted) order, and the probabilities should range from 0 to 1 for each sample, but are not required to sum to 1.

>>> predictions = [[0, 0, 1], [0, 1, 1]]
>>> probabilities = [[0.2, 0.3, 0.8], [0.4, 0.9, 0.6]]
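Before passing probabilities to a check, it can help to verify them against the rules above. The following is a rough sketch of such a sanity check; `check_probabilities` is a hypothetical helper, not a deepchecks function, and the 0.5 threshold used to derive multilabel predictions is an assumption made for this example.

```python
import math


def check_probabilities(probabilities, multilabel=False, tol=1e-6):
    """Raise ValueError if the matrix breaks the format rules described above."""
    n_classes = len(probabilities[0])
    for i, row in enumerate(probabilities):
        if len(row) != n_classes:
            raise ValueError(f'sample {i}: expected {n_classes} values, got {len(row)}')
        if any(p < 0 or p > 1 for p in row):
            raise ValueError(f'sample {i}: probabilities must lie in [0, 1]')
        # Only single-label probabilities are required to sum to 1.
        if not multilabel and not math.isclose(sum(row), 1.0, abs_tol=tol):
            raise ValueError(f'sample {i}: probabilities must sum to 1')


single_label_probabilities = [[0.2, 0.8], [0.5, 0.5], [0.3, 0.7]]
multilabel_probabilities = [[0.2, 0.3, 0.8], [0.4, 0.9, 0.6]]

check_probabilities(single_label_probabilities)
check_probabilities(multilabel_probabilities, multilabel=True)

# Assumed threshold of 0.5 for turning multilabel probabilities into predictions.
multilabel_predictions = [[1 if p >= 0.5 else 0 for p in row]
                          for row in multilabel_probabilities]
print(multilabel_predictions)  # [[0, 0, 1], [0, 1, 1]]
```

Running the multilabel matrix through the single-label check raises a ValueError, because its rows do not sum to 1.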

Token Classification#

For token classification tasks, labels and predictions are given in any IOB format supported by the seqeval library. The label should be passed as a sequence of sequences, with each inner sequence containing the appropriate IOB annotation for every token in the sample.

To let deepchecks know which tokens make up each text sample, it is highly recommended that you pass a list of the tokens to the tokenized_text argument of the TextData constructor. Otherwise, deepchecks will attempt to tokenize the text samples (given to the text argument) by splitting them on spaces.
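To see why passing the tokens explicitly is recommended, consider what plain space-splitting produces. This is only an illustration using Python's `str.split`, not deepchecks' internal code:

```python
text = 'Mary lives in London, and Paris.'

# Splitting on spaces keeps punctuation attached to the neighbouring word.
tokens = text.split(' ')
print(tokens)  # ['Mary', 'lives', 'in', 'London,', 'and', 'Paris.']
```

If the labels were produced with a tokenizer that separates punctuation into its own tokens, the number of labels would no longer match the number of space-split tokens.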

Formats - Example#

The following label and prediction examples are given for the following text samples:

>>> tokenized_text = [['Mary', 'had', 'a', 'little', 'lamb'],
...                   ['Mary', 'lives', 'in', 'London', 'and', 'Paris']]

Label Format#

Here is an example of IOB annotation for the above text samples:

>>> token_classification_label = [['B-PER', 'O', 'O', 'O', 'O'], ['B-PER', 'O', 'O', 'B-GEO', 'O', 'B-GEO']]

Prediction Format#
  • predictions - Predictions for token classification should be given in the exact same format as the labels.

  • probabilities - No probabilities should be passed for Token Classification tasks. Passing probabilities will result in an error.

Example of predictions (confusing the lamb with a person):

>>> predictions = [['B-PER', 'O', 'O', 'O', 'B-PER'], ['B-PER', 'O', 'O', 'B-GEO', 'O', 'B-GEO']]
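Because labels and predictions must line up token by token, a quick alignment check can catch format problems early. The sketch below uses only plain Python on the example data above; the variable `mismatches` is introduced here purely for illustration.

```python
tokenized_text = [['Mary', 'had', 'a', 'little', 'lamb'],
                  ['Mary', 'lives', 'in', 'London', 'and', 'Paris']]
labels = [['B-PER', 'O', 'O', 'O', 'O'],
          ['B-PER', 'O', 'O', 'B-GEO', 'O', 'B-GEO']]
predictions = [['B-PER', 'O', 'O', 'O', 'B-PER'],
               ['B-PER', 'O', 'O', 'B-GEO', 'O', 'B-GEO']]

# Every sample needs exactly one label and one prediction per token.
for tokens, sample_labels, sample_preds in zip(tokenized_text, labels, predictions):
    assert len(tokens) == len(sample_labels) == len(sample_preds)

# Collect (sample index, token, label, prediction) wherever the two disagree.
mismatches = [
    (sample_idx, token, true, pred)
    for sample_idx, (tokens, sample_labels, sample_preds)
    in enumerate(zip(tokenized_text, labels, predictions))
    for token, true, pred in zip(tokens, sample_labels, sample_preds)
    if true != pred
]
print(mismatches)  # [(0, 'lamb', 'O', 'B-PER')]
```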