Segment Performance
Load data
This example uses the Adult dataset, which can be downloaded from the UCI Machine Learning Repository.
Dua, D. and Graff, C. (2019). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.
from deepchecks.tabular.datasets.classification import adult
Create Dataset
train_ds, validation_ds = adult.load_data()
Classification Model
model = adult.load_fitted_model()
model
Out:
Pipeline(steps=[('preprocessing',
                 ColumnTransformer(transformers=[('num', SimpleImputer(),
                                                  ['education-num',
                                                   'capital-gain',
                                                   'capital-loss',
                                                   'hours-per-week', 'age',
                                                   'fnlwgt']),
                                                 ('cat',
                                                  Pipeline(steps=[('imputer',
                                                                   SimpleImputer(strategy='most_frequent')),
                                                                  ('encoder',
                                                                   OrdinalEncoder())]),
                                                  ['workclass', 'education',
                                                   'marital-status',
                                                   'occupation', 'relationship',
                                                   'race', 'sex',
                                                   'native-country'])])),
                ('model',
                 RandomForestClassifier(max_depth=5, n_jobs=-1,
                                        random_state=0))])
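Before running the check, it may help to see what a per-segment score looks like. The following is a minimal plain-Python sketch, not deepchecks' implementation: it splits rows by one categorical feature and by fixed-width bins of one numeric feature, then computes accuracy inside each cell. The function name `segment_accuracy`, the `bin_width` parameter, and the toy rows are all assumptions made for illustration; the library's own binning strategy and choice of metric may differ.

```python
from collections import defaultdict


def segment_accuracy(rows, y_true, y_pred, feature_1, feature_2, bin_width=20):
    """Return accuracy for each (feature_1 category, feature_2 bin) segment."""
    hits = defaultdict(int)
    counts = defaultdict(int)
    for row, truth, pred in zip(rows, y_true, y_pred):
        # Bin the numeric feature into fixed-width intervals: [0, 20), [20, 40), ...
        low = int(row[feature_2] // bin_width) * bin_width
        key = (row[feature_1], (low, low + bin_width))
        counts[key] += 1
        hits[key] += int(truth == pred)
    return {key: hits[key] / counts[key] for key in counts}


# Toy, hand-made rows (hypothetical values, not taken from the Adult dataset).
rows = [
    {"workclass": "Private", "hours-per-week": 40},
    {"workclass": "Private", "hours-per-week": 45},
    {"workclass": "Private", "hours-per-week": 10},
    {"workclass": "State-gov", "hours-per-week": 38},
]
y_true = [1, 0, 0, 1]
y_pred = [1, 1, 0, 0]

scores = segment_accuracy(rows, y_true, y_pred, "workclass", "hours-per-week")
for segment, accuracy in sorted(scores.items()):
    print(segment, accuracy)
```

A segment whose score is far below the overall score points at a region of the feature space where the model is weak, which is the kind of gap a segment-level view is meant to surface.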
from deepchecks.tabular.checks.performance import SegmentPerformance
SegmentPerformance(feature_1='workclass', feature_2='hours-per-week').run(validation_ds, model)
Out:
/home/runner/work/deepchecks/deepchecks/deepchecks/utils/features.py:180: UserWarning:
Cannot use model's built-in feature importance on a Scikit-learn Pipeline, using permutation feature importance calculation instead
Calculating permutation feature importance. Expected to finish in 44 seconds
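The warning above mentions permutation feature importance. As a rough illustration of the idea only (not the code deepchecks runs), the sketch below shuffles one feature's column and measures how much accuracy drops on average. The `permutation_importance` helper, the `predict` stand-in model, and the toy rows are hypothetical; scikit-learn ships a full implementation as `sklearn.inspection.permutation_importance`.

```python
import random


def permutation_importance(predict, rows, y_true, feature, n_repeats=5, seed=0):
    """Mean drop in accuracy when the values of one feature are shuffled."""

    def accuracy(data):
        return sum(predict(r) == t for r, t in zip(data, y_true)) / len(y_true)

    rng = random.Random(seed)
    baseline = accuracy(rows)
    drops = []
    for _ in range(n_repeats):
        # Shuffle only the chosen column; every other feature stays in place.
        values = [r[feature] for r in rows]
        rng.shuffle(values)
        shuffled = [{**r, feature: v} for r, v in zip(rows, values)]
        drops.append(baseline - accuracy(shuffled))
    return sum(drops) / n_repeats


# Hypothetical stand-in model: predicts 1 when hours-per-week exceeds 40
# and ignores age entirely.
def predict(row):
    return int(row["hours-per-week"] > 40)


# Toy rows (made-up values); labels are generated by the stand-in model itself.
rows = [
    {"hours-per-week": h, "age": a}
    for h, a in [(20, 30), (50, 41), (35, 52), (60, 23), (45, 37), (10, 60)]
]
y_true = [predict(r) for r in rows]

importance_hours = permutation_importance(predict, rows, y_true, "hours-per-week")
importance_age = permutation_importance(predict, rows, y_true, "age")
print(importance_hours, importance_age)
```

Because the stand-in model never reads `age`, shuffling that column cannot change any prediction, so its importance is exactly zero. Because the method only needs predictions, it works on any fitted estimator, including a Pipeline that exposes no built-in importances.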
Total running time of the script: ( 0 minutes 37.398 seconds)