Regression Error Distribution

Imports

from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

from deepchecks.tabular import Dataset
from deepchecks.tabular.checks.performance import RegressionErrorDistribution

Generating data

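# Load the diabetes dataset as one DataFrame (features plus the 'target' column).
diabetes_df = load_diabetes(return_X_y=False, as_frame=True).frame
train_df, test_df = train_test_split(diabetes_df, test_size=0.33, random_state=42)

# Wrap both splits as deepchecks Datasets; 'sex' is declared as a categorical feature.
train = Dataset(train_df, label='target', cat_features=['sex'])
test = Dataset(test_df, label='target', cat_features=['sex'])

# Fit a gradient boosting regressor on the training features and label.
clf = GradientBoostingRegressor(random_state=0)
_ = clf.fit(train.data[train.features], train.data[train.label_name])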

Running RegressionErrorDistribution check (normal distribution)

check = RegressionErrorDistribution()
check.run(test, clf)

Regression Error Distribution

Check regression error distribution.

Additional Outputs
Largest over estimation errors:
       age    sex    bmi     bp     s1     s2     s3     s4     s5     s6  target  predicted target  target Prediction Difference
364   0.00   0.05  -0.01  -0.02  -0.01   0.00  -0.04   0.03   0.01   0.10  262.00            120.59                        141.41
9    -0.07  -0.04   0.04  -0.03  -0.01  -0.03  -0.02  -0.00   0.07  -0.01  310.00            183.63                        126.37
77   -0.10  -0.04  -0.04  -0.07  -0.04  -0.03   0.02  -0.04  -0.07  -0.00  200.00             85.48                        114.52
Largest under estimation errors:
       age    sex    bmi     bp     s1     s2     s3     s4     s5     s6  target  predicted target  target Prediction Difference
380   0.02  -0.04   0.03   0.06  -0.06  -0.04  -0.01  -0.03  -0.05  -0.03   52.00            223.72                       -171.72
56   -0.04  -0.04   0.04  -0.03  -0.03  -0.03  -0.04   0.00   0.03  -0.02   52.00            199.97                       -147.97
7     0.06   0.05  -0.00   0.07   0.09   0.11   0.02   0.02  -0.04   0.00   63.00            183.45                       -120.45
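
The last column of both tables appears to be the signed difference between the true target and the prediction. The sketch below checks that reading against the six rows printed above, in plain Python. It is inferred from the printed values only; the helper name `prediction_difference` is made up for illustration and is not part of deepchecks.

```python
# (row index, target, predicted target, printed difference), copied from the tables above.
printed_rows = [
    (364, 262.00, 120.59, 141.41),
    (9, 310.00, 183.63, 126.37),
    (77, 200.00, 85.48, 114.52),
    (380, 52.00, 223.72, -171.72),
    (56, 52.00, 199.97, -147.97),
    (7, 63.00, 183.45, -120.45),
]


def prediction_difference(target, predicted):
    """Hypothetical helper: signed error as target minus prediction, rounded to 2 places."""
    return round(target - predicted, 2)


# Collect the index of every row whose recomputed difference disagrees with the printed one.
mismatches = [
    idx
    for idx, target, predicted, printed in printed_rows
    if abs(prediction_difference(target, predicted) - printed) > 1e-9
]
print("rows that disagree with target - predicted:", mismatches)
```

Under this reading, every row listed under "Largest over estimation errors" has a positive difference (target above prediction), and every row under "Largest under estimation errors" has a negative one.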


Skewing the data

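# Overwrite every test label with the constant 150, so the errors no longer reflect the real targets.
test.data[test.label_name] = 150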

Running RegressionErrorDistribution check (abnormal distribution)

check = RegressionErrorDistribution()
check.run(test, clf)

Regression Error Distribution

Check regression error distribution.

Additional Outputs
Largest over estimation errors:
       age    sex    bmi     bp     s1     s2     s3     s4     s5     s6  target  predicted target  target Prediction Difference
237   0.06  -0.04  -0.07  -0.07  -0.00  -0.00   0.04  -0.04  -0.05  -0.00     150             59.07                         90.93
436  -0.06  -0.04  -0.07  -0.05  -0.02  -0.05   0.09  -0.08  -0.06  -0.05     150             61.05                         88.95
55   -0.04  -0.04  -0.05  -0.04  -0.01  -0.02   0.09  -0.04  -0.07   0.01     150             61.54                         88.46
Largest under estimation errors:
       age    sex    bmi     bp     s1     s2     s3     s4     s5     s6  target  predicted target  target Prediction Difference
114   0.02  -0.04   0.11   0.06   0.01  -0.03  -0.02   0.02   0.10   0.02     150            302.13                       -152.13
332   0.03  -0.04   0.10   0.08  -0.01  -0.01  -0.06   0.03   0.06   0.04     150            295.71                       -145.71
321   0.10  -0.04   0.05   0.08   0.05   0.04  -0.08   0.14   0.10   0.06     150            269.18                       -119.18
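
With every label set to 150, each error is simply 150 minus the prediction, so the error distribution takes the shape of the model's predictions rather than of its real mistakes. One common way to put a number on such a change in shape is excess kurtosis. The sketch below is a plain-Python illustration on made-up error lists. It makes no claim about how RegressionErrorDistribution computes its result, and `excess_kurtosis`, `flat_errors`, and `peaked_errors` are hypothetical names.

```python
def excess_kurtosis(values):
    """Population excess kurtosis: fourth central moment / variance**2, minus 3."""
    n = len(values)
    mean = sum(values) / n
    second_moment = sum((v - mean) ** 2 for v in values) / n
    fourth_moment = sum((v - mean) ** 4 for v in values) / n
    if second_moment == 0:
        raise ValueError("kurtosis is undefined for constant data")
    return fourth_moment / second_moment ** 2 - 3.0


# Made-up error samples, for illustration only (not taken from the check's output).
flat_errors = [-2.0, -1.0, 0.0, 1.0, 2.0]
peaked_errors = [-4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0]

print("flat:", excess_kurtosis(flat_errors))
print("peaked:", excess_kurtosis(peaked_errors))
```

A normal distribution has an excess kurtosis of 0. The evenly spread sample comes out negative, while the sample with most values at zero and a few large outliers comes out positive.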

