.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "nlp/auto_checks/model_evaluation/plot_single_dataset_performance.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_nlp_auto_checks_model_evaluation_plot_single_dataset_performance.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_nlp_auto_checks_model_evaluation_plot_single_dataset_performance.py:

.. _nlp__single_dataset_performance:

Single Dataset Performance
**************************

This notebook provides an overview of using and understanding the single dataset performance
check for NLP tasks.

**Structure:**

* `What is the purpose of the check? <#what-is-the-purpose-of-the-check>`__
* `Generate data & model <#generate-data-model>`__
* `Run the check <#run-the-check>`__
* `Define a condition <#define-a-condition>`__

What is the purpose of the check?
==================================

This check evaluates a model's performance on a labeled dataset based on one or more scorers.
For Text Classification tasks the supported metrics are sklearn scorers; you may use any of the
existing sklearn scorers or create your own. For more information about the supported sklearn
scorers, defining your own metrics, and how to use metrics for the other supported task types,
see the :ref:`metrics_user_guide`.

The default scorers are F1, Precision, and Recall for Classification, and F1 Macro, Recall Macro,
and Precision Macro for Token Classification. See more about the supported task types at the
:ref:`nlp__supported_tasks` guide.

.. GENERATED FROM PYTHON SOURCE LINES 31-33

Generate data & model
======================

.. GENERATED FROM PYTHON SOURCE LINES 33-39

.. code-block:: default


    from deepchecks.nlp.datasets.classification.tweet_emotion import load_data, load_precalculated_predictions

    # Load the test split of the tweet emotion dataset, together with
    # precalculated model prediction probabilities for it.
    _, test_dataset = load_data(data_format='TextData')
    _, test_probas = load_precalculated_predictions(pred_format='probabilities')

.. GENERATED FROM PYTHON SOURCE LINES 40-45

Run the check
==============

You can select which scorers to use by passing either a list or a dict of scorers to the check;
see the :ref:`metrics_user_guide` for additional details.

.. GENERATED FROM PYTHON SOURCE LINES 45-52

.. code-block:: default


    from deepchecks.nlp.checks import SingleDatasetPerformance

    # Compute per-class recall and precision, plus macro- and micro-averaged F1.
    check = SingleDatasetPerformance(scorers=['recall_per_class', 'precision_per_class', 'f1_macro', 'f1_micro'])
    result = check.run(dataset=test_dataset, probabilities=test_probas)
    result.show()
.. Output of ``result.show()``: the interactive "Single Dataset Performance"
   results table, rendered in the HTML build.
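As a brief aside, scorers can also be supplied as a dict mapping a display name of your choice to
a scorer. Below is a minimal sketch of this, mixing a built-in sklearn scorer string with a custom
scorer built via sklearn's ``make_scorer``; the ``f2_macro`` name is our own label rather than a
built-in scorer, and we assume the check accepts sklearn scorer objects as dict values (see the
:ref:`metrics_user_guide`):

.. code-block:: default


    from sklearn.metrics import fbeta_score, make_scorer

    # A custom metric: macro-averaged F-beta with beta=2, which weights recall
    # more heavily than precision. 'f2_macro' is just the display name we chose.
    custom_check = SingleDatasetPerformance(scorers={
        'f2_macro': make_scorer(fbeta_score, beta=2, average='macro'),
        'accuracy': 'accuracy',  # built-in sklearn scorer string
    })
    custom_result = custom_check.run(dataset=test_dataset, probabilities=test_probas)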
.. GENERATED FROM PYTHON SOURCE LINES 53-59

Define a condition
===================

We can add a condition to the check to validate that the different metric scores are above a
certain threshold. Using the ``class_mode`` argument we can select a subset of the classes on
which the condition is applied. Let's add a condition to the check and see what happens when it
fails:

.. GENERATED FROM PYTHON SOURCE LINES 59-64

.. code-block:: default


    # Require every score to be above 0.85; class_mode='all' applies the
    # condition to every class.
    check.add_condition_greater_than(threshold=0.85, class_mode='all')
    result = check.run(dataset=test_dataset, probabilities=test_probas)
    result.show(show_additional_outputs=False)
.. Output of ``result.show(show_additional_outputs=False)``: the "Single Dataset
   Performance" check display, including the failed condition, rendered in the
   HTML build.
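Besides the rendered display, the result can be inspected programmatically. A short sketch,
assuming (per the deepchecks API) that ``result.value`` holds the computed scores as a pandas
DataFrame and that ``passed_conditions()`` reports the overall condition status:

.. code-block:: default


    # The computed scores, one row per metric (and per class for per-class scorers)
    print(result.value)

    # True only if every condition added to the check passed
    print('All conditions passed:', result.passed_conditions())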
.. GENERATED FROM PYTHON SOURCE LINES 65-66

We detected that the Recall score is below the specified threshold for at least one of the
classes.

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 1.223 seconds)

.. _sphx_glr_download_nlp_auto_checks_model_evaluation_plot_single_dataset_performance.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_single_dataset_performance.py <plot_single_dataset_performance.py>`

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_single_dataset_performance.ipynb <plot_single_dataset_performance.ipynb>`

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_