.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "tabular/auto_checks/model_evaluation/plot_weak_segments_performance.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end `
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_tabular_auto_checks_model_evaluation_plot_weak_segments_performance.py:

.. _tabular__weak_segments_performance:

Weak Segments Performance
*************************

This notebook provides an overview of how to use and understand the weak
segments performance check.

**Structure:**

* `What is the purpose of the check? <#what-is-the-purpose-of-the-check>`__
* `Automatically detecting weak segments <#automatically-detecting-weak-segments>`__
* `Generate data & model <#generate-data-model>`__
* `Run the check <#run-the-check>`__
* `Define a condition <#define-a-condition>`__

What is the purpose of the check?
==================================

The check is designed to help you easily identify the model's weakest segments
in the data provided. In addition, it lets you provide a sublist of the
Dataset's features, limiting the search to the subspaces you are interested in.

Automatically detecting weak segments
=====================================

The check contains several steps:

#. We calculate the loss for each sample in the dataset using the provided
   model, via either log-loss or MSE according to the task type.

#. We select a subset of features for the weak segment search. These are the
   features with the highest feature importance to the model provided (within
   the features selected for the check, if limited).

#. We train multiple simple tree-based models. Each one is trained using
   exactly two features (out of the ones selected above) to predict the
   per-sample error calculated before.

#. We extract the corresponding data samples for each of the leaves in each of
   the trees (data segments) and calculate the model performance on them. For
   the weakest data segments detected, we also calculate the model's
   performance on the data segments surrounding them.

.. GENERATED FROM PYTHON SOURCE LINES 45-47

Generate data & model
=====================

.. GENERATED FROM PYTHON SOURCE LINES 47-54

.. code-block:: default


    from deepchecks.tabular.datasets.classification.lending_club import (
        load_data, load_fitted_model)

    _, test_ds = load_data()
    model = load_fitted_model()


.. GENERATED FROM PYTHON SOURCE LINES 55-82

Run the check
=============

The check has several key parameters (all optional) that affect its behavior
and especially its output.

``columns / ignore_columns``: Controls which columns should be searched for
weak segments. By default, a heuristic is used to determine which columns to
use based solely on their feature importance.

``alternative_scorer``: Determines the metric to be used as the performance
measurement of the model on different segments. It is important to select a
metric that is relevant to the data domain and task you are performing. By
default, the check uses Neg RMSE for regression tasks and Accuracy for
classification tasks. For additional information on scorers and how to use
them see :ref:`Metrics Guide `.

``segment_minimum_size_ratio``: Determines the minimum size of segments that
are of interest. The check will return data segments that contain at least
this fraction of the total data samples. It is recommended to try different
configurations of this parameter, as larger segments can be of interest even
if the model performance on them is superior.

``categorical_aggregation_threshold``: By default the check will combine rare
categories into a single category called "Other". This parameter determines
the frequency threshold below which categories are mapped to the "Other"
category.

``multiple_segments_per_column``: If True, the same feature is allowed to be a
segmenting feature in multiple segments; otherwise each feature can appear in
one segment at most. True by default.

See :class:`API reference ` for more details.

.. GENERATED FROM PYTHON SOURCE LINES 82-91

.. code-block:: default


    from deepchecks.tabular.checks import WeakSegmentsPerformance
    from sklearn.metrics import make_scorer, f1_score

    scorer = {'f1': make_scorer(f1_score, average='micro')}
    check = WeakSegmentsPerformance()
    result = check.run(test_ds, model)
    result.show()


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    /home/runner/work/deepchecks/deepchecks/venv/lib/python3.9/site-packages/joblib/externals/loky/process_executor.py:752: UserWarning:

    A worker stopped while some jobs were given to the executor. This can be caused by a too short worker timeout or by a memory leak.


[Check output: Weak Segments Performance]

.. GENERATED FROM PYTHON SOURCE LINES 92-100

Observe the check's output
--------------------------

We see in the results that the check indeed found several segments on which
the model performance is below average. In the heatmap display we can see
model performance on the weakest segments and their environment with respect
to the two features that are relevant to the segment. We can switch between
several such segments using the tabs at the top.

In order to get the full list of weak segments found we will inspect the
``result.value`` attribute. Shown below are the 3 segments with the worst
performance.

.. GENERATED FROM PYTHON SOURCE LINES 100-104

.. code-block:: default


    result.value['weak_segments_list'].head(3)


.. list-table::
    :header-rows: 1

    * -
      - Accuracy Score
      - Feature1
      - Feature1 Range
      - Feature2
      - Feature2 Range
      - % of Data
      - Samples in Segment
    * - 0
      - 0.629041
      - int_rate
      - (16.299999237060547, inf)
      -
      - None
      - 32.73
      - [84, 4607, 2575, 497, 315, 877, 2929, 1912, 29...
    * - 9
      - 0.656739
      - mort_acc
      - (-inf, 0.5)
      - loan_amnt
      - (9625.0, inf)
      - 24.93
      - [3153, 4607, 3408, 33, 2929, 1912, 5084, 4457,...
    * - 10
      - 0.661074
      - mort_acc
      - (-inf, 1.5)
      - installment
      - (325.3249969482422, inf)
      - 33.19
      - [3153, 4607, 3408, 497, 33, 2929, 1912, 1586, ...
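
Each segment listed here is a range over one or two features. To make the
detection steps described under "Automatically detecting weak segments" more
concrete, here is a minimal sketch in plain Python. It is not the deepchecks
implementation: it replaces the two-feature trees with a single threshold
split on one feature, the data is synthetic, and the function names
``per_sample_log_loss`` and ``weakest_single_split`` are hypothetical.

.. code-block:: python

    import math
    import random


    def per_sample_log_loss(labels, probs, eps=1e-15):
        """Step 1: binary log-loss per sample (probs = predicted P(label == 1))."""
        losses = []
        for y, p in zip(labels, probs):
            p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
            losses.append(-(y * math.log(p) + (1 - y) * math.log(1.0 - p)))
        return losses


    def weakest_single_split(values, losses, min_size_ratio=0.05):
        """Simplified stand-in for the tree step: try every threshold on one
        feature and return (mean_loss, size, (low, high)) for the side with
        the highest mean loss that holds at least min_size_ratio of the data."""
        n = len(values)
        min_size = max(1, math.ceil(min_size_ratio * n))
        pairs = sorted(zip(values, losses))
        candidates = []
        for k in range(1, n):
            left_value, right_value = pairs[k - 1][0], pairs[k][0]
            if left_value == right_value:
                continue  # no threshold separates equal values
            threshold = (left_value + right_value) / 2
            sides = ((pairs[:k], (-math.inf, threshold)),
                     (pairs[k:], (threshold, math.inf)))
            for side, bounds in sides:
                if len(side) >= min_size:
                    mean_loss = sum(loss for _, loss in side) / len(side)
                    # Rounding lets near-ties resolve in favour of the larger segment.
                    candidates.append((round(mean_loss, 9), len(side), bounds))
        return max(candidates)


    # Synthetic data: the "model" is confident and correct for int_rate <= 16
    # and uninformative (probability 0.5) above it.
    random.seed(0)
    int_rate = [random.uniform(5.0, 25.0) for _ in range(500)]
    labels = [random.randint(0, 1) for _ in range(500)]
    probs = [(0.9 if y == 1 else 0.1) if x <= 16.0 else 0.5
             for x, y in zip(int_rate, labels)]

    losses = per_sample_log_loss(labels, probs)
    mean_loss, size, (low, high) = weakest_single_split(int_rate, losses)
    print(f"weakest segment: int_rate in ({low:.2f}, {high}), "
          f"{100 * size / len(int_rate):.1f}% of data, mean loss {mean_loss:.3f}")

The sketch recovers a segment that starts close to the threshold of 16 planted
in the synthetic data. The check itself works with two features at a time and
reports the chosen scorer on each segment rather than the raw loss.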


.. GENERATED FROM PYTHON SOURCE LINES 105-111

Define a condition
==================

We can add a condition that validates that the model's performance on the
weakest segment detected is above a certain threshold. This is useful when we
want to make sure that the model is not underperforming on a subset of the
data that is of interest to us, for example specific age or gender groups.

.. GENERATED FROM PYTHON SOURCE LINES 111-118

.. code-block:: default


    # Let's add a condition and re-run the check:

    check = WeakSegmentsPerformance(alternative_scorer=scorer, segment_minimum_size_ratio=0.03)
    check.add_condition_segments_relative_performance_greater_than(0.1)
    result = check.run(test_ds, model)
    result.show(show_additional_outputs=False)


[Check output: Weak Segments Performance]
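
As a rough illustration of what a relative threshold can mean, the sketch
below assumes the condition compares the weakest segment's score with the
model's average score reduced by the allowed relative change (``0.1`` in the
example above). This reading is an assumption, the helper name
``passes_relative_condition`` is hypothetical, and the scores are made-up
illustrative values, not results from this notebook.

.. code-block:: python

    def passes_relative_condition(weakest_score, average_score, max_ratio_change):
        """Assumed rule: the weakest segment's score must stay above the
        average score lowered by max_ratio_change (relative to its magnitude)."""
        threshold = average_score - max_ratio_change * abs(average_score)
        return weakest_score > threshold


    # Illustrative scores only (higher is better).
    print(passes_relative_condition(0.62, 0.70, 0.1))     # False: 0.62 is below 0.63
    print(passes_relative_condition(0.62, 0.70, 0.2))     # True: 0.62 is above 0.56
    print(passes_relative_condition(-10.5, -10.0, 0.1))   # True: -10.5 is above -11.0

Using the magnitude of the average score keeps the rule meaningful for
negated error metrics such as Neg RMSE, where scores are negative.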


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 58.495 seconds)


.. _sphx_glr_download_tabular_auto_checks_model_evaluation_plot_weak_segments_performance.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_weak_segments_performance.py `

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_weak_segments_performance.ipynb `


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery `_