
.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "tabular/auto_checks/model_evaluation/plot_regression_error_distribution.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_tabular_auto_checks_model_evaluation_plot_regression_error_distribution.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_tabular_auto_checks_model_evaluation_plot_regression_error_distribution.py:


.. _tabular__regression_error_distribution:

Regression Error Distribution
*****************************

This notebook provides an overview of how to use and interpret the Regression
Error Distribution check.

**Structure:**

* `What is the Regression Error Distribution check? <#what-is-the-regression-error-distribution-check>`__
* `Run the check <#run-the-check>`__
* `Define a condition <#define-a-condition>`__

What is the Regression Error Distribution check?
================================================
The ``RegressionErrorDistribution`` check shows the distribution of the
regression error and lets you set conditions on two of the distribution's
parameters: the systematic error and the kurtosis.

Kurtosis is a measure of the shape of the distribution. It helps us understand
whether the error distribution deviates significantly from the normal
distribution (for example, whether it is noticeably "wider"), which may
indicate a specific source of error distorting the normal shape.

Systematic error, also known as the error bias, is the mean prediction error
of the model.

.. GENERATED FROM PYTHON SOURCE LINES 26-28

Run the check
=============

.. GENERATED FROM PYTHON SOURCE LINES 30-32

Generate data & model
---------------------

.. GENERATED FROM PYTHON SOURCE LINES 32-42

.. code-block:: default


    from sklearn.datasets import load_diabetes
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.model_selection import train_test_split

    # Load the diabetes dataset as a single DataFrame that includes the 'target' column
    diabetes_df = load_diabetes(return_X_y=False, as_frame=True).frame
    train_df, test_df = train_test_split(diabetes_df, test_size=0.33, random_state=42)

    # Fit a regressor on the training features
    clf = GradientBoostingRegressor(random_state=0)
    clf.fit(train_df.drop('target', axis=1), train_df['target'])


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    GradientBoostingRegressor(random_state=0)


.. GENERATED FROM PYTHON SOURCE LINES 43-47

Run the check (normal distribution)
-----------------------------------
Since the following error distribution resembles the normal distribution, the
kurtosis value and the systematic error should both be close to 0.

.. GENERATED FROM PYTHON SOURCE LINES 47-55

.. code-block:: default


    from deepchecks.tabular import Dataset
    from deepchecks.tabular.checks import RegressionErrorDistribution

    test = Dataset(test_df, label='target', cat_features=['sex'])
    check = RegressionErrorDistribution()
    check.run(test, clf)


.. Output: rendered "Regression Error Distribution" check display (HTML omitted).

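
The two statistics described earlier can also be computed by hand, which may
help when interpreting the check's output. The following is a minimal sketch
in plain Python, not the check's actual implementation. It assumes the error
is defined as ``y_true - y_pred`` and that kurtosis means the population
excess kurtosis (0 for a normal distribution). The helper names
``systematic_error`` and ``excess_kurtosis`` and the sample values are
hypothetical.

.. code-block:: python

    def systematic_error(y_true, y_pred):
        """Mean signed prediction error (assumed sign: y_true - y_pred)."""
        errors = [t - p for t, p in zip(y_true, y_pred)]
        return sum(errors) / len(errors)


    def excess_kurtosis(errors):
        """Population excess kurtosis: m4 / m2**2 - 3 (0 for a normal distribution)."""
        n = len(errors)
        mean = sum(errors) / n
        m2 = sum((e - mean) ** 2 for e in errors) / n  # second central moment
        m4 = sum((e - mean) ** 4 for e in errors) / n  # fourth central moment
        return m4 / m2 ** 2 - 3.0


    # Hypothetical labels and predictions, used only for illustration.
    y_true = [10.0, 12.0, 14.0, 16.0, 18.0]
    y_pred = [12.0, 13.0, 14.0, 15.0, 16.0]
    errors = [t - p for t, p in zip(y_true, y_pred)]  # [-2, -1, 0, 1, 2]

    bias = systematic_error(y_true, y_pred)  # symmetric errors, so the mean is 0
    kurt = excess_kurtosis(errors)           # negative: flatter than a normal distribution

With real residuals, a bias near 0 and an excess kurtosis near 0 correspond to
the "normal-looking" case shown above.
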
.. GENERATED FROM PYTHON SOURCE LINES 56-58

Skew the data & rerun the check
-------------------------------

.. GENERATED FROM PYTHON SOURCE LINES 58-62

.. code-block:: default


    # Overwrite every label with a constant so the errors are no longer centered around 0
    test.data[test.label_name] = 150
    check.run(test, clf)


.. Output: rendered "Regression Error Distribution" check display (HTML omitted).

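
To see why overwriting the labels with a constant introduces a systematic
error, consider the following small sketch. It uses hypothetical labels and
predictions rather than the diabetes data, and it assumes the error is defined
as ``label - prediction``. Once every label equals the same constant, the mean
error becomes that constant minus the mean prediction.

.. code-block:: python

    from statistics import mean

    # Hypothetical predictions and labels, used only for illustration.
    y_pred = [100.0, 140.0, 180.0, 220.0]
    y_true = [110.0, 130.0, 190.0, 210.0]


    def mean_error(labels, preds):
        """Mean signed error, assuming error = label - prediction."""
        return mean(t - p for t, p in zip(labels, preds))


    # Original labels: the errors are 10, -10, 10, -10 and cancel out.
    bias_before = mean_error(y_true, y_pred)

    # Constant labels: the errors are 50, 10, -30, -70.
    constant_labels = [150.0] * len(y_pred)
    bias_after = mean_error(constant_labels, y_pred)

    # With constant labels the bias is the constant minus the mean prediction.
    assert bias_after == 150.0 - mean(y_pred)
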
.. GENERATED FROM PYTHON SOURCE LINES 63-68

Define a condition
==================
After artificially skewing the target variable, both the kurtosis value and
the systematic error become significantly larger. In the conditions below we
check that the systematic error (that is, the mean prediction error) is less
than 0.01 times the model's RMSE score and that the kurtosis is greater than
-0.1.

.. GENERATED FROM PYTHON SOURCE LINES 68-73

.. code-block:: default


    check = RegressionErrorDistribution()
    check.add_condition_kurtosis_greater_than(threshold=-0.1)
    check.add_condition_systematic_error_ratio_to_rmse_less_than(max_ratio=0.01)
    check.run(test, clf)


.. Output: rendered "Regression Error Distribution" check display (HTML omitted).

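
The logic behind the two conditions can be sketched in a few lines of plain
Python. This is an illustration under stated assumptions, not the library's
implementation: it assumes the ratio is the absolute mean error divided by the
RMSE of the same errors, and that both comparisons are strict. The function
names ``rmse``, ``systematic_error_ratio_ok``, and ``kurtosis_ok`` are
hypothetical.

.. code-block:: python

    import math


    def rmse(errors):
        """Root mean squared error of a list of prediction errors."""
        return math.sqrt(sum(e * e for e in errors) / len(errors))


    def systematic_error_ratio_ok(errors, max_ratio=0.01):
        """True if |mean error| / RMSE is below max_ratio (assumed definition)."""
        bias = sum(errors) / len(errors)
        return abs(bias) / rmse(errors) < max_ratio


    def kurtosis_ok(kurtosis_value, threshold=-0.1):
        """True if the kurtosis value is above the threshold."""
        return kurtosis_value > threshold


    # Hypothetical error samples, used only for illustration.
    centered = [-2.0, -1.0, 0.0, 1.0, 2.0]  # mean error is 0
    shifted = [9.0, 10.0, 11.0]             # mean error is 10, close to the RMSE

    centered_passes = systematic_error_ratio_ok(centered)
    shifted_passes = systematic_error_ratio_ok(shifted)

When the errors are dominated by a constant offset, as in ``shifted``, the
ratio approaches 1, so a threshold of 0.01 is a strict requirement.
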
.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes 0.540 seconds)


.. _sphx_glr_download_tabular_auto_checks_model_evaluation_plot_regression_error_distribution.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_regression_error_distribution.py <plot_regression_error_distribution.py>`

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_regression_error_distribution.ipynb <plot_regression_error_distribution.ipynb>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_