ModelErrorAnalysis#

class ModelErrorAnalysis[source]#

Find features that best split the data into segments of high and low model error.

Deprecated since version 0.8.1: The ModelErrorAnalysis check is deprecated and will be removed in the 0.11 version. Please use the WeakSegmentsPerformance check instead.

The check trains a regression model to predict the error of the user's model. Then, the features with the highest importance in the error regression model are selected, and the distribution of the error versus the feature values is plotted. The check results are shown only if the error regression model manages to predict the error well enough.
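A minimal usage sketch, assuming deepchecks Dataset objects named train_dataset and test_dataset and a fitted model (these names are placeholders):

from deepchecks.tabular.checks import ModelErrorAnalysis

# Create the check with default parameters and run it on the train/test datasets and the model.
check = ModelErrorAnalysis()
result = check.run(train_dataset, test_dataset, model)
result.show()  # display the error distribution plots for the top contributing features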

Parameters
max_features_to_show: int, default: 3

Maximal number of features for which to show the error distribution.

min_feature_contribution: float, default: 0.15

Minimum feature importance a feature must have in the error regression model in order to be shown.

min_error_model_score: float, default: 0.5

Minimum R² score of the error regression model required for displaying the check results.

min_segment_size: float, default: 0.05

Minimal fraction of data that can comprise a weak segment.

alternative_scorer: Tuple[str, Callable], default: None

An optional tuple of (scorer name, scorer function). Only a single scorer is allowed in this check. If none is given, the default scorer is used.

n_samples: int, default: 50_000

Number of samples to use for this check.

n_display_samples: int, default: 5_000

Number of samples to display in the scatter plot.

random_state: int, default: 42

Random seed for all check internals.
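For example, a sketch of instantiating the check with non-default parameters (the values below are illustrative):

from deepchecks.tabular.checks import ModelErrorAnalysis

check = ModelErrorAnalysis(
    max_features_to_show=5,        # show the error distribution for up to 5 features
    min_feature_contribution=0.2,  # require a higher feature importance in the error model
    min_segment_size=0.1,          # a weak segment must cover at least 10% of the data
    random_state=0,
)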

Notes

Scorers are a convention of sklearn for evaluating a model. See the scorers documentation. A scorer is a function that accepts (model, X, y_true) and returns a float score. For every scorer, higher scores are better than lower scores.

You can create a scorer out of existing sklearn metrics:

from sklearn.metrics import roc_auc_score, make_scorer

# The labels parameter is required for multi-class classification in metrics such as roc_auc_score or
# log_loss that use the model's predict_proba output, in case not all labels are present in the test set.
training_labels = [1, 2, 3]
auc_scorer = make_scorer(roc_auc_score, labels=training_labels, multi_class='ovr',
                         needs_proba=True)  # needs_proba=True so class probabilities are passed to roc_auc_score

Or you can implement your own:

from sklearn.metrics import make_scorer

import numpy as np

def my_mse(y_true, y_pred):
    # Return a single float (the mean squared error); a scorer must produce a scalar score.
    return np.mean((y_true - y_pred) ** 2)

# Pass greater_is_better=False, since scorers are always supposed to return
# a value to maximize, and a lower error is better.
my_mse_scorer = make_scorer(my_mse, greater_is_better=False)
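The resulting scorer can then be passed to the check through the alternative_scorer parameter as a (name, scorer) tuple; the name 'my mse' below is arbitrary:

check = ModelErrorAnalysis(alternative_scorer=('my mse', my_mse_scorer))
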
__init__(max_features_to_show: int = 3, min_feature_contribution: float = 0.15, min_error_model_score: float = 0.5, min_segment_size: float = 0.05, alternative_scorer: Optional[Tuple[str, Union[str, Callable]]] = None, n_samples: int = 50000, n_display_samples: int = 5000, random_state: int = 42, **kwargs)[source]#
__new__(*args, **kwargs)#

Methods

ModelErrorAnalysis.add_condition(name, ...)

Add new condition function to the check.

ModelErrorAnalysis.add_condition_segments_performance_relative_difference_less_than([...])

Add condition - require that the relative difference in performance between the segments is less than the given threshold (see the example after this table).

ModelErrorAnalysis.clean_conditions()

Remove all conditions from this check instance.

ModelErrorAnalysis.conditions_decision(result)

Run conditions on given result.

ModelErrorAnalysis.config()

Return check configuration (conditions' configuration not yet supported).

ModelErrorAnalysis.from_config(conf)

Return check object from a CheckConfig object.

ModelErrorAnalysis.metadata([with_doc_link])

Return check metadata.

ModelErrorAnalysis.name()

Name of class in split camel case.

ModelErrorAnalysis.params([show_defaults])

Return parameters to show when printing the check.

ModelErrorAnalysis.remove_condition(index)

Remove given condition by index.

ModelErrorAnalysis.run(train_dataset, ...[, ...])

Run check.

ModelErrorAnalysis.run_logic(context)

Run check.
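As referenced in the methods table, a sketch of adding the segment-performance condition before running the check (the 0.05 threshold is illustrative):

check = ModelErrorAnalysis()
check.add_condition_segments_performance_relative_difference_less_than(0.05)
result = check.run(train_dataset, test_dataset, model)
result.passed_conditions()  # True if the relative performance difference between segments is below the threshold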