Single Dataset Performance#

This notebook provides an overview for using and understanding the single dataset performance check.

Structure:

What is the purpose of the check?
Generate data & model
Run the check
Define a condition
Using a custom scorer

What is the purpose of the check?#

This check evaluates a model's performance on a labeled dataset using one or more scorers.

Scorers are an sklearn convention for evaluating a model: a scorer is a function that accepts (model, X, y_true) and returns a float, which is the score. By sklearn convention, higher scores are better than lower scores. For additional details, see the scorers documentation.
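
To make the calling convention concrete, here is a minimal sketch of a scorer written by hand. `MajorityClassModel` and `accuracy_scorer` are hypothetical names used only for illustration and are not part of sklearn or deepchecks; the sketch assumes only that the model exposes a `predict` method.

```python
from collections import Counter


class MajorityClassModel:
    """Toy model (hypothetical): always predicts the most common training label."""

    def fit(self, X, y):
        self.majority_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.majority_ for _ in X]


def accuracy_scorer(model, X, y_true):
    """Scorer with the (model, X, y_true) -> float signature; higher is better."""
    y_pred = model.predict(X)
    correct = sum(1 for pred, true in zip(y_pred, y_true) if pred == true)
    return correct / len(y_true)


X = [[0.1], [0.4], [0.5], [0.9]]
y = [0, 0, 0, 1]

model = MajorityClassModel().fit(X, y)
score = accuracy_scorer(model, X, y)
print(score)
```

Any callable with this shape follows the convention described above, which is why scorers can be swapped without changing the code that calls them.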

By default, the check uses F1, Precision, and Recall for classification, and Negative Root Mean Square Error, Negative Mean Absolute Error, and R2 for regression.
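
The regression error metrics are negated so that they follow the "higher is better" convention. The sketch below illustrates the idea with a hand-written negative RMSE; `neg_rmse` is a hypothetical helper for illustration, not the function deepchecks or sklearn uses internally, and the numbers are made up.

```python
import math


def neg_rmse(y_true, y_pred):
    """Negative root mean squared error: 0 is perfect, more negative is worse."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return -math.sqrt(sum(squared_errors) / len(squared_errors))


y_true = [3.0, 5.0, 7.0]
close_predictions = [3.0, 5.0, 8.0]   # small errors
far_predictions = [0.0, 5.0, 11.0]    # large errors

close_score = neg_rmse(y_true, close_predictions)
far_score = neg_rmse(y_true, far_predictions)

# The better predictions receive the higher (less negative) score.
print(close_score > far_score)
```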

Generate data & model#

from deepchecks.tabular.datasets.classification.iris import load_data, load_fitted_model

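# load_data() returns two datasets; only the second (test) one is used here.
_, test_dataset = load_data()
# Load a model that has already been fitted on the Iris data.
model = load_fitted_model()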

Run the check#

You can select which scorers to use by passing either a list or a dict of scorers to the check. See the Metrics Guide for additional details.

from deepchecks.tabular.checks import SingleDatasetPerformance

check = SingleDatasetPerformance(scorers=['recall_per_class', 'precision_per_class', 'f1_macro', 'f1_micro'])
result = check.run(test_dataset, model)
result.show()

Single Dataset Performance

	Class	Metric	Value	Number of samples
0	0	recall	1.00	13.00
1	1	recall	1.00	13.00
2	2	recall	0.83	12.00
3	0	precision	1.00	13.00
4	1	precision	0.87	13.00
5	2	precision	1.00	12.00
6		f1_macro	0.95	nan
7		f1_micro	0.95	nan
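
To see how the per-class rows relate to the two F1 aggregates, here is a minimal sketch that recomputes them from scratch. The `y_true` and `y_pred` lists are a hypothetical assignment chosen to be consistent with the rounded values in the table (two class-2 samples predicted as class 1); they are not the model's actual predictions, and the helper names are illustrative.

```python
def per_class_counts(y_true, y_pred, label):
    """Return (true positives, false positives, false negatives) for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    return tp, fp, fn


def f1_from_counts(tp, fp, fn):
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Assumed labels: 13, 13, and 12 samples per class, as in the table.
y_true = [0] * 13 + [1] * 13 + [2] * 12
y_pred = [0] * 13 + [1] * 13 + [1] * 2 + [2] * 10

labels = [0, 1, 2]
counts = {label: per_class_counts(y_true, y_pred, label) for label in labels}

recall = {label: tp / (tp + fn) for label, (tp, fp, fn) in counts.items()}
precision = {label: tp / (tp + fp) for label, (tp, fp, fn) in counts.items()}

# Macro: average the per-class F1 scores, weighting every class equally.
f1_macro = sum(f1_from_counts(*counts[label]) for label in labels) / len(labels)

# Micro: pool the counts over all classes first, then compute a single F1.
pooled = [sum(counts[label][i] for label in labels) for i in range(3)]
f1_micro = f1_from_counts(*pooled)

print(round(recall[2], 2), round(precision[1], 2))
print(round(f1_macro, 2), round(f1_micro, 2))
```

The aggregate metrics have no class and no per-class sample count, which is why those columns are empty or nan in the last two rows.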

Define a condition#

We can add a condition to the check to validate that the metric scores are above a certain threshold. The class_mode argument selects a subset of the classes to use for the condition.

Let’s add a condition to the check and see what happens when it fails:

check.add_condition_greater_than(threshold=0.85, class_mode='all')
result = check.run(test_dataset, model)
result.show(show_additional_outputs=False)

Single Dataset Performance

Conditions Summary

Status	Condition	More Info
✖	Selected metrics scores are greater than 0.85	Failed for metrics: ['recall']

The check detected that the Recall score is below the specified threshold in at least one of the classes.
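
The logic behind this result can be sketched as follows. This is not the deepchecks implementation: `failed_metrics` is a hypothetical helper, the scores are the rounded per-class values from the first results table, and treating any `class_mode` other than `'all'` as a single class label is an assumption made for illustration.

```python
def failed_metrics(scores, threshold, class_mode="all"):
    """Hypothetical helper: metrics whose selected scores are not all above threshold.

    scores maps metric name -> {class label: score}.
    class_mode='all' checks every class; any other value is treated
    as a single class label (an assumption of this sketch).
    """
    failed = []
    for metric, per_class in scores.items():
        if class_mode == "all":
            selected = list(per_class.values())
        else:
            selected = [per_class[class_mode]]
        if not all(value > threshold for value in selected):
            failed.append(metric)
    return failed


# Rounded per-class values taken from the results table.
scores = {
    "recall": {0: 1.00, 1: 1.00, 2: 0.83},
    "precision": {0: 1.00, 1: 0.87, 2: 1.00},
}

failed_all = failed_metrics(scores, threshold=0.85)
failed_class_0 = failed_metrics(scores, threshold=0.85, class_mode=0)
print(failed_all)
print(failed_class_0)
```

With every class included, only recall fails, because class 2 scores 0.83. Restricting the condition to class 0 would pass in this sketch.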

Using a custom scorer#

In addition to the built-in scorers, we can define our own scorer based on the sklearn API and run it in the check alongside other scorers:

from sklearn.metrics import fbeta_score, make_scorer

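# average=None returns one F-beta score per class; beta=0.2 weights precision more heavily than recall.
fbeta_scorer = make_scorer(fbeta_score, labels=[0, 1, 2], average=None, beta=0.2)

# The dict keys become the metric names shown in the output.
check = SingleDatasetPerformance(scorers={'my scorer': fbeta_scorer, 'recall': 'recall_per_class'})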
result = check.run(test_dataset, model)
result.show()

Single Dataset Performance

	Class	Metric	Value	Number of samples
0	0	my scorer	1.00	13
1	1	my scorer	0.87	13
2	2	my scorer	0.99	12
3	0	recall	1.00	13
4	1	recall	1.00	13
5	2	recall	0.83	12
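
The "my scorer" values can be checked by hand with the standard F-beta formula, (1 + beta^2) * P * R / (beta^2 * P + R). In the sketch below, the precision and recall fractions are assumptions chosen to match the rounded per-class values reported earlier (for example 13/15 for a precision of 0.87), and `fbeta` is an illustrative helper rather than sklearn's `fbeta_score`.

```python
def fbeta(precision, recall, beta):
    """F-beta from precision and recall; beta < 1 favors precision."""
    beta_sq = beta ** 2
    denominator = beta_sq * precision + recall
    if denominator == 0:
        return 0.0
    return (1 + beta_sq) * precision * recall / denominator


# Assumed (precision, recall) per class, consistent with the rounded table values.
per_class = {
    0: (13 / 13, 13 / 13),
    1: (13 / 15, 13 / 13),
    2: (10 / 10, 10 / 12),
}

fbeta_scores = {
    label: round(fbeta(precision, recall, beta=0.2), 2)
    for label, (precision, recall) in per_class.items()
}
print(fbeta_scores)
```

Because beta=0.2 weights precision much more than recall, class 2 (perfect precision, recall 0.83) scores close to 1, while class 1 (precision 0.87, perfect recall) stays near its precision.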
