model_evaluation

model_evaluation(n_samples: Optional[int] = None, random_state: int = 42, **kwargs) → Suite

Suite for evaluating the model’s performance: metrics, performance across segments, error analysis, overfitting, comparison to a baseline, and more.

Parameters

n_samples: Optional[int], default: None: number of samples to use for checks that sample data. If None, each check uses its own default n_samples.
random_state: int, default: 42: random seed for all checks.
**kwargs: dict: additional arguments to pass to the checks.

Returns

Suite: A suite for evaluating the model’s performance.

Examples

>>> from deepchecks.nlp.suites import model_evaluation
>>> suite = model_evaluation(n_samples=1_000_000)
>>> result = suite.run()
>>> result.show()

run(self, train_dataset: Optional[TextData] = None, test_dataset: Optional[TextData] = None, with_display: bool = True, train_predictions: Optional[Union[Sequence[int], Sequence[str], Sequence[Sequence[int]], Sequence[Sequence[str]]]] = None, test_predictions: Optional[Union[Sequence[int], Sequence[str], Sequence[Sequence[int]], Sequence[Sequence[str]]]] = None, train_probabilities: Optional[Sequence[Sequence[float]]] = None, test_probabilities: Optional[Sequence[Sequence[float]]] = None, model_classes: Optional[List] = None, random_state: int = 42) → SuiteResult

Run all checks.

Parameters

train_dataset: Union[TextData, None], default: None: TextData object representing the data the estimator was fitted on.
test_dataset: Union[TextData, None], default: None: TextData object representing the data the estimator predicts on.
with_display: bool, default: True: flag that determines whether checks calculate their display (redundant in some checks).
train_predictions: Union[TTextPred, None], default: None: predictions on the train dataset.
test_predictions: Union[TTextPred, None], default: None: predictions on the test dataset.
train_probabilities: Union[TTextProba, None], default: None: probabilities on the train dataset.
test_probabilities: Union[TTextProba, None], default: None: probabilities on the test dataset.
model_classes: Optional[List], default: None: for classification, the list of classes known to the model.
random_state: int, default: 42: seed for pseudo-random functions, primarily sampling.

Returns

SuiteResult: the results of all initialized checks.

Notes

The accepted formats for providing model predictions and probabilities are detailed below.

Text Classification

Single Class Predictions

predictions - A sequence of class names or indices with one entry per sample, matching the set of classes present in the labels.
probabilities - A sequence of sequences with each element containing the vector of class probabilities for each sample. Each such vector should have one probability per class according to the class (sorted) order, and the probabilities should sum to 1 for each sample.

Multilabel Predictions

predictions - A sequence of sequences with each element containing a binary vector denoting the presence of the i-th class for the given sample. Each such vector should have one binary indicator per class according to the class (sorted) order. More than one class can be present for each sample.
probabilities - A sequence of sequences with each element containing the vector of class probabilities for each sample. Each such vector should have one probability per class according to the class (sorted) order, and the probabilities should range from 0 to 1 for each sample, but are not required to sum to 1.

Token Classification

predictions - A sequence of sequences, with the inner sequence containing tuples in the following format: (class_name, span_start, span_end, class_probability). span_start and span_end are the start and end character indices of the token within the text, as it was passed to the raw_text argument. Each upper level sequence contains a sequence of tokens for each sample.
probabilities - No probabilities should be passed for Token Classification tasks. Passing probabilities will result in an error.

Examples

Text Classification

Single Class Predictions

>>> predictions = ['class_1', 'class_1', 'class_2']
>>> probabilities = [[0.2, 0.8], [0.5, 0.5], [0.3, 0.7]]
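The single-class format can also be checked and derived in plain Python. The sketch below is illustrative only and is not part of deepchecks: `predictions_from_probabilities` is a hypothetical helper that assumes the probability columns follow the `sorted()` order of the class names, verifies that each vector sums to 1, and picks the most probable class (the first one on ties).

```python
import math
from typing import List, Sequence


def predictions_from_probabilities(
    probabilities: Sequence[Sequence[float]],
    classes: Sequence[str],
) -> List[str]:
    """Hypothetical helper: pick the most probable class for each sample."""
    # Assumption: probability columns follow the sorted order of class names.
    ordered = sorted(classes)
    predictions = []
    for row in probabilities:
        if len(row) != len(ordered):
            raise ValueError("expected one probability per class")
        if not math.isclose(sum(row), 1.0, abs_tol=1e-6):
            raise ValueError("single-class probabilities must sum to 1")
        # Index of the largest probability; ties resolve to the first index.
        best = max(range(len(row)), key=row.__getitem__)
        predictions.append(ordered[best])
    return predictions


example_probabilities = [[0.8, 0.2], [0.4, 0.6], [0.3, 0.7]]
example_predictions = predictions_from_probabilities(
    example_probabilities, classes=["class_2", "class_1"]
)
print(example_predictions)  # ['class_1', 'class_2', 'class_2']
```

The classes are deliberately passed out of order to show that the helper, under the stated assumption, relies on sorted order rather than on the order in which they are supplied.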

Multilabel Predictions

>>> predictions = [[0, 0, 1], [0, 1, 1]]
>>> probabilities = [[0.2, 0.3, 0.8], [0.4, 0.9, 0.6]]
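One possible way to obtain multilabel predictions in this shape is to threshold the per-class probabilities. The sketch below is not part of deepchecks: `binarize_multilabel` is a hypothetical helper and the 0.5 threshold is an assumption chosen for illustration. It also checks that every probability lies in [0, 1], without requiring each vector to sum to 1.

```python
from typing import List, Sequence


def binarize_multilabel(
    probabilities: Sequence[Sequence[float]],
    threshold: float = 0.5,  # assumed cut-off; pick one that suits your model
) -> List[List[int]]:
    """Hypothetical helper: turn per-class probabilities into 0/1 indicators."""
    predictions = []
    for row in probabilities:
        if any(p < 0.0 or p > 1.0 for p in row):
            raise ValueError("multilabel probabilities must lie in [0, 1]")
        # One binary indicator per class, in the same (sorted) class order.
        predictions.append([1 if p >= threshold else 0 for p in row])
    return predictions


multilabel_probabilities = [[0.2, 0.3, 0.8], [0.4, 0.9, 0.6]]
multilabel_predictions = binarize_multilabel(multilabel_probabilities)
print(multilabel_predictions)  # [[0, 0, 1], [0, 1, 1]]
```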

Token Classification

>>> predictions = [[('class_1', 0, 2, 0.8), ('class_2', 7, 10, 0.9)], [('class_2', 42, 54, 0.4)], []]
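Because token spans are character indices into the raw text, it can help to sanity-check them before running the suite. The sketch below is not part of deepchecks: `check_token_predictions` is a hypothetical helper, the sample texts and class names are made up, and it assumes `span_end` is exclusive (Python slice convention), which the description above does not specify.

```python
from typing import List, Sequence, Tuple

# (class_name, span_start, span_end, class_probability)
TokenPrediction = Tuple[str, int, int, float]


def check_token_predictions(
    raw_text: Sequence[str],
    predictions: Sequence[Sequence[TokenPrediction]],
) -> List[List[str]]:
    """Hypothetical helper: validate spans and return the text each one covers."""
    if len(raw_text) != len(predictions):
        raise ValueError("expected one prediction sequence per sample")
    covered = []
    for text, spans in zip(raw_text, predictions):
        sample_tokens = []
        for class_name, start, end, probability in spans:
            # Assumption: span_end is exclusive, as in Python slicing.
            if not 0 <= start < end <= len(text):
                raise ValueError(f"span ({start}, {end}) is outside the text")
            if not 0.0 <= probability <= 1.0:
                raise ValueError("class_probability must lie in [0, 1]")
            sample_tokens.append(text[start:end])
        covered.append(sample_tokens)
    return covered


sample_texts = ["Bo lives in Rome", "nothing to tag here"]
sample_predictions = [[("PER", 0, 2, 0.8), ("LOC", 12, 16, 0.9)], []]
covered_text = check_token_predictions(sample_texts, sample_predictions)
print(covered_text)  # [['Bo', 'Rome'], []]
```

A sample with no predicted tokens is represented by an empty inner sequence, as in the last entry of the example above.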
