Segment Performance

Load Data

This example uses the Adult dataset, which can be downloaded from the UCI Machine Learning Repository.

Dua, D. and Graff, C. (2019). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.

from deepchecks.tabular.datasets.classification import adult

Create Dataset

train_ds, validation_ds = adult.load_data()

Classification Model

model = adult.load_fitted_model()
model

Out:

Pipeline(steps=[('preprocessing',
                 ColumnTransformer(transformers=[('num', SimpleImputer(),
                                                  ['education-num',
                                                   'capital-gain',
                                                   'capital-loss',
                                                   'hours-per-week', 'age',
                                                   'fnlwgt']),
                                                 ('cat',
                                                  Pipeline(steps=[('imputer',
                                                                   SimpleImputer(strategy='most_frequent')),
                                                                  ('encoder',
                                                                   OrdinalEncoder())]),
                                                  ['workclass', 'education',
                                                   'marital-status',
                                                   'occupation', 'relationship',
                                                   'race', 'sex',
                                                   'native-country'])])),
                ('model',
                 RandomForestClassifier(max_depth=5, n_jobs=-1,
                                        random_state=0))])

from deepchecks.tabular.checks.performance import SegmentPerformance

SegmentPerformance(feature_1='workclass', feature_2='hours-per-week').run(validation_ds, model)

Out:

/home/runner/work/deepchecks/deepchecks/deepchecks/utils/features.py:180: UserWarning:

Cannot use model's built-in feature importance on a Scikit-learn Pipeline, using permutation feature importance calculation instead

Calculating permutation feature importance. Expected to finish in 44 seconds
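
The warning above refers to permutation feature importance. As a rough illustration of the general idea only (not the deepchecks implementation), the sketch below shuffles one feature at a time and measures how much accuracy drops. The names `accuracy`, `permutation_importance`, `toy_predict`, and the toy rows are all hypothetical and exist only for this example.

```python
import random


def accuracy(predict, rows, labels):
    """Fraction of rows whose prediction matches the label."""
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(labels)


def permutation_importance(predict, rows, labels, feature, n_repeats=5, seed=0):
    """Mean drop in accuracy after shuffling the values of one feature."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    drops = []
    for _ in range(n_repeats):
        # Shuffle the chosen column while leaving every other column intact.
        shuffled = [r[feature] for r in rows]
        rng.shuffle(shuffled)
        permuted = [{**r, feature: v} for r, v in zip(rows, shuffled)]
        drops.append(baseline - accuracy(predict, permuted, labels))
    return sum(drops) / n_repeats


# Hypothetical toy model: it only looks at "hours" and ignores "noise".
def toy_predict(row):
    return int(row["hours"] > 40)


toy_rows = [{"hours": h, "noise": n} for h, n in
            [(20, 3), (35, 1), (45, 7), (60, 2), (50, 9), (30, 4)]]
toy_labels = [toy_predict(r) for r in toy_rows]

importance_hours = permutation_importance(toy_predict, toy_rows, toy_labels, "hours")
importance_noise = permutation_importance(toy_predict, toy_rows, toy_labels, "noise")
print("hours:", importance_hours, "noise:", importance_noise)
```

A feature the model never reads, such as `noise` here, gets an importance of zero because shuffling it cannot change any prediction.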

Segment Performance

Display performance score segmented by 2 top (or given) features in a heatmap.
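
To make the segmentation idea concrete, here is a minimal sketch of how a score could be computed per cell of a two-feature grid. It assumes accuracy as the score and hand-picked cut points for the numeric feature; the check itself may choose the score and the bins differently. The function `segment_accuracy` and the toy rows are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def segment_accuracy(rows, y_true, y_pred, cat_feature, num_feature, cut_points):
    """Return {(category, bin_index): (accuracy, row_count)} for non-empty cells."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for row, truth, pred in zip(rows, y_true, y_pred):
        # bisect_right maps a numeric value to its bin index: 0 for values
        # below the first cut point, len(cut_points) for values at or above
        # the last one.
        cell = (row[cat_feature], bisect_right(cut_points, row[num_feature]))
        totals[cell] += 1
        hits[cell] += int(truth == pred)
    return {cell: (hits[cell] / totals[cell], totals[cell]) for cell in totals}


# Toy data, made up for illustration only.
rows = [
    {"workclass": "Private", "hours-per-week": 20},
    {"workclass": "Private", "hours-per-week": 40},
    {"workclass": "Private", "hours-per-week": 40},
    {"workclass": "State-gov", "hours-per-week": 50},
    {"workclass": "State-gov", "hours-per-week": 38},
]
y_true = [0, 1, 0, 1, 0]
y_pred = [0, 1, 1, 1, 1]

cells = segment_accuracy(rows, y_true, y_pred,
                         "workclass", "hours-per-week", cut_points=[30, 45])
for cell, (score, count) in sorted(cells.items()):
    print(cell, f"accuracy={score:.2f}", f"rows={count}")
```

Keeping the row count next to each score matters when reading such a grid, since a cell with very few rows can show an extreme score by chance.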

Additional Outputs

[Figure: heatmap of the performance score for each segment of workclass and hours-per-week]

