.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "tabular/auto_tutorials/quickstarts/plot_quick_model_evaluation.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_tabular_auto_tutorials_quickstarts_plot_quick_model_evaluation.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_tabular_auto_tutorials_quickstarts_plot_quick_model_evaluation.py:


.. _quick_model_evaluation:

Model Evaluation Suite Quickstart
***********************************

The deepchecks model evaluation suite is relevant any time you wish to
evaluate your model. For example:

- Thorough analysis of the model's performance before deploying it.
- Evaluation of a proposed model during the model selection and optimization stage.
- Checking the model's performance on a new batch of data (with or without comparison to previous data
  batches).

Here we'll build a regression model using the wine quality dataset
(:mod:`deepchecks.tabular.datasets.regression.wine_quality`)
to demonstrate how you can run the suite with only a few lines of code
and see what kinds of insights it can find.

.. code-block:: bash

    # Before we start, if you don't have deepchecks installed yet, run:
    import sys
    !{sys.executable} -m pip install deepchecks -U --quiet

    # or install using pip from your python environment

.. GENERATED FROM PYTHON SOURCE LINES 30-35

Prepare Data and Model
======================

Load Data
-----------

.. GENERATED FROM PYTHON SOURCE LINES 35-41

.. code-block:: default


    from deepchecks.tabular.datasets.regression import wine_quality

    data = wine_quality.load_data(data_format='Dataframe', as_train_test=False)
    data.head(2)

+---+---------------+------------------+-------------+----------------+-----------+---------------------+----------------------+---------+------+-----------+---------+---------+
|   | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH   | sulphates | alcohol | quality |
+===+===============+==================+=============+================+===========+=====================+======================+=========+======+===========+=========+=========+
| 0 | 7.4           | 0.70             | 0.0         | 1.9            | 0.076     | 11.0                | 34.0                 | 0.9978  | 3.51 | 0.56      | 9.4     | 5       |
+---+---------------+------------------+-------------+----------------+-----------+---------------------+----------------------+---------+------+-----------+---------+---------+
| 1 | 7.8           | 0.88             | 0.0         | 2.6            | 0.098     | 25.0                | 67.0                 | 0.9968  | 3.20 | 0.68      | 9.8     | 5       |
+---+---------------+------------------+-------------+----------------+-----------+---------------------+----------------------+---------+------+-----------+---------+---------+


.. GENERATED FROM PYTHON SOURCE LINES 42-45

Split Data and Train a Simple Model
-----------------------------------

.. GENERATED FROM PYTHON SOURCE LINES 45-52

.. code-block:: default


    from sklearn.model_selection import train_test_split
    from sklearn.ensemble import GradientBoostingRegressor

    # The last column ('quality') is the label; all other columns are features.
    X_train, X_test, y_train, y_test = train_test_split(data.iloc[:, :-1], data['quality'], test_size=0.2, random_state=42)
    gbr = GradientBoostingRegressor()
    gbr.fit(X_train, y_train)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    GradientBoostingRegressor()

.. GENERATED FROM PYTHON SOURCE LINES 53-62

Run Deepchecks for Model Evaluation
===========================================

Create a Dataset Object
-------------------------

Create a deepchecks Dataset, including the relevant metadata (label, date, index, etc.).
Check out :class:`deepchecks.tabular.Dataset` to see all the column types and attributes
that can be declared.

.. GENERATED FROM PYTHON SOURCE LINES 62-73

.. code-block:: default


    from deepchecks.tabular import Dataset

    # Categorical features can be heuristically inferred; however, we
    # recommend stating them explicitly to avoid misclassification.

    # Metadata attributes are optional. Some checks will run only if specific attributes are declared.

    train_ds = Dataset(X_train, label=y_train, cat_features=[])
    test_ds = Dataset(X_test, label=y_test, cat_features=[])

.. GENERATED FROM PYTHON SOURCE LINES 74-84

Run the Deepchecks Suite
--------------------------

Evaluate your model with the :class:`deepchecks.tabular.suites.model_evaluation` suite.
It runs on two datasets and a model, so you can use it to compare the performance of the model between
any two batches of data (e.g. train data, test data, or a new batch of data
that recently arrived).

Check out the :ref:`when you should use ` for more info about the existing suites and when to use them.

.. GENERATED FROM PYTHON SOURCE LINES 84-93

.. code-block:: default


    from deepchecks.tabular.suites import model_evaluation

    evaluation_suite = model_evaluation()
    suite_result = evaluation_suite.run(train_ds, test_ds, gbr)
    # Note: the result can be saved as html using suite_result.save_as_html()
    # or exported to json using suite_result.to_json()
    suite_result.show()


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Model Evaluation Suite: | | 0/11 [Time: 00:00]
    Model Evaluation Suite: |# | 1/11 [Time: 00:00, Check=Train Test Performance]
    Model Evaluation Suite: |##### | 5/11 [Time: 00:00, Check=Simple Model Comparison]
    Model Evaluation Suite: |####### | 7/11 [Time: 00:06, Check=Calibration Score]
    Model Evaluation Suite: |######## | 8/11 [Time: 00:06, Check=Regression Error Distribution]
    Model Evaluation Suite: |########## | 10/11 [Time: 00:07, Check=Boosting Overfit]

.. raw:: html

    Model Evaluation Suite
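
The progress output above lists a Simple Model Comparison check, which weighs the model against a naive baseline. As a rough illustration of that idea only, the sketch below compares a model's RMSE to that of a constant predictor that always returns the mean training label. The helper names, the gain formula, and the numbers are hypothetical assumptions for this sketch, not the deepchecks implementation.

.. code-block:: python

    import math


    def rmse(y_true, y_pred):
        """Root mean squared error of two equal-length sequences."""
        squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
        return math.sqrt(sum(squared_errors) / len(squared_errors))


    def gain_over_baseline(y_train, y_test, y_pred):
        """Relative RMSE reduction versus a mean-of-train constant predictor.

        Assumed, illustrative definition: (baseline - model) / baseline.
        """
        mean_label = sum(y_train) / len(y_train)
        baseline_rmse = rmse(y_test, [mean_label] * len(y_test))
        model_rmse = rmse(y_test, y_pred)
        return (baseline_rmse - model_rmse) / baseline_rmse


    # Hypothetical quality labels and model predictions
    y_train = [5, 5, 6, 7, 5, 6]
    y_test = [5, 6, 7, 5]
    y_pred = [5.2, 5.9, 6.5, 5.1]

    gain = gain_over_baseline(y_train, y_test, y_pred)
    print(f"Gain over naive baseline: {gain:.1%}")

A gain close to zero or below it would mean the model is no better than always predicting the average label, which usually points to a problem with the features or the training procedure.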


.. GENERATED FROM PYTHON SOURCE LINES 94-121

Analyzing the results
--------------------------

The result showcases a number of interesting insights. First, let's inspect the "Didn't Pass" section.

* The :ref:`tabular__train_test_performance` check result implies that the model overfitted the training data.
* The :ref:`tabular__regression_systematic_error` (test set) check result demonstrates that the model has a small
  positive bias.
* The :ref:`tabular__weak_segments_performance` (test set) check result visualizes some specific sub-spaces on which
  the model performs poorly. Examples of those sub-spaces are wines with low total sulfur dioxide and wines with
  high alcohol percentage.

Next, let's examine the "Passed" section.

* The :ref:`tabular__simple_model_comparison` check result states that the model performs better than naive
  baseline models. An opposite result could indicate a problem with the model or the data it was trained on.
* The :ref:`tabular__boosting_overfit` check and the :ref:`tabular__unused_features` check results imply that the
  model has a well-calibrated boosting stopping rule and that it makes good use of the different data features.

Let's try to fix the overfitting issue found in the model.

Fix the Model and Re-run a Single Check
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. GENERATED FROM PYTHON SOURCE LINES 121-131

.. code-block:: default


    from deepchecks.tabular.checks import TrainTestPerformance

    # Fewer boosting stages (n_estimators=20) to reduce overfitting
    gbr = GradientBoostingRegressor(n_estimators=20)
    gbr.fit(X_train, y_train)
    # Initialize the check and add an optional condition
    check = TrainTestPerformance().add_condition_train_test_relative_degradation_less_than(0.3)
    result = check.run(train_ds, test_ds, gbr)
    result.show()

.. raw:: html

    Train Test Performance
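
The condition added above bounds how much worse the model scores on the test set than on the train set. A minimal sketch of that comparison follows, assuming relative degradation is defined as ``(train - test) / |train|`` for a score where higher is better. The function name and the scores are hypothetical, and the exact formula deepchecks applies may differ.

.. code-block:: python

    def relative_degradation(train_score, test_score):
        """Relative drop from train to test for a higher-is-better score.

        Assumed definition for illustration: (train - test) / |train|.
        """
        if train_score == 0:
            raise ValueError("train_score must be non-zero")
        return (train_score - test_score) / abs(train_score)


    # Hypothetical scores, not taken from the run above
    train_score, test_score = 0.60, 0.45
    degradation = relative_degradation(train_score, test_score)

    # Compare against the suite default (0.1) and the relaxed threshold (0.3)
    for threshold in (0.1, 0.3):
        status = "passes" if degradation < threshold else "fails"
        print(f"degradation={degradation:.2f} {status} threshold {threshold}")

Under these made-up numbers the same model fails the stricter threshold and passes the relaxed one, which is why the choice of threshold should reflect how much train-test gap is acceptable for the use case.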


.. GENERATED FROM PYTHON SOURCE LINES 132-143

We mitigated the overfitting to some extent. Additional model tuning is required to overcome
the other issues discussed above. For now, we will update and remove the relevant conditions from the suite.

Updating an Existing Suite
--------------------------

To create our own suite, we can start with an empty suite and add checks and conditions to it
(see :ref:`create_custom_suite`), or we can start with one of the default suites and update it as demonstrated
in this section.

Let's inspect our model evaluation suite's structure:

.. GENERATED FROM PYTHON SOURCE LINES 144-146

.. code-block:: default

    evaluation_suite


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Model Evaluation Suite: [
        0: TrainTestPerformance
            Conditions:
                0: Train-Test scores relative degradation is less than 0.1
        1: RocReport
            Conditions:
                0: AUC score for all the classes is greater than 0.7
        2: ConfusionMatrixReport
        3: PredictionDrift
            Conditions:
                0: Prediction drift score < 0.15
        4: SimpleModelComparison
            Conditions:
                0: Model performance gain over simple model is greater than 10%
        5: WeakSegmentsPerformance(n_to_show=5)
            Conditions:
                0: The relative performance of weakest segment is greater than 80% of average model performance.
        6: CalibrationScore
        7: RegressionErrorDistribution
            Conditions:
                0: Kurtosis value higher than -0.1
                1: Systematic error ratio lower than 0.01
        8: UnusedFeatures
            Conditions:
                0: Number of high variance unused features is less or equal to 5
        9: BoostingOverfit
            Conditions:
                0: Test score over iterations is less than 5% from the best score
        10: ModelInferenceTime
            Conditions:
                0: Average model inference time for one sample is less than 0.001
    ]

.. GENERATED FROM PYTHON SOURCE LINES 147-148

Next, we will update the Train Test Performance condition and remove the Regression Error Distribution check
(index 7 in the listing above), which holds the systematic error condition:

.. GENERATED FROM PYTHON SOURCE LINES 149-154

.. code-block:: default


    # Replace the default 0.1 degradation threshold with 0.3
    evaluation_suite[0].clean_conditions()
    evaluation_suite[0].add_condition_train_test_relative_degradation_less_than(0.3)
    # Drop the check at index 7 (RegressionErrorDistribution)
    evaluation_suite = evaluation_suite.remove(7)

.. GENERATED FROM PYTHON SOURCE LINES 155-156

Re-run the suite using:

.. GENERATED FROM PYTHON SOURCE LINES 157-161

.. code-block:: default


    result = evaluation_suite.run(train_ds, test_ds, gbr)
    result.passed(fail_if_warning=False)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Model Evaluation Suite: | | 0/10 [Time: 00:00]
    Model Evaluation Suite: |# | 1/10 [Time: 00:00, Check=Train Test Performance]
    Model Evaluation Suite: |##### | 5/10 [Time: 00:00, Check=Simple Model Comparison]
    Model Evaluation Suite: |####### | 7/10 [Time: 00:06, Check=Calibration Score]
    Model Evaluation Suite: |######### | 9/10 [Time: 00:07, Check=Boosting Overfit]

    True

.. GENERATED FROM PYTHON SOURCE LINES 162-164

For more info about working with conditions, see the detailed
:ref:`configure_check_conditions` guide.

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes 17.761 seconds)


.. _sphx_glr_download_tabular_auto_tutorials_quickstarts_plot_quick_model_evaluation.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_quick_model_evaluation.py <plot_quick_model_evaluation.py>`

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_quick_model_evaluation.ipynb <plot_quick_model_evaluation.ipynb>`

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_