class SimpleModelComparison

Compare given model score to simple model score (according to given model type).

strategy : str, default: ‘most_frequent’

Strategy to use to generate the predictions of the simple model [‘stratified’, ‘uniform’, ‘most_frequent’, ‘tree’].

  • stratified: randomly draws a label based on the train set label distribution. (Previously ‘random’)

  • uniform: in regression, samples predictions uniformly at random from the range of y; in classification,
    draws predictions uniformly at random from the list of values in y.

  • most_frequent: in regression, the mean value; in classification, the most common value. (Previously ‘constant’)

  • tree: fits a simple decision tree.
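For intuition, the first three strategies behave like sklearn's dummy estimators. This is an analogy to illustrate the strategies, not the check's internal implementation:

```python
import numpy as np
from sklearn.dummy import DummyClassifier, DummyRegressor

X = np.zeros((6, 1))                 # features are ignored by dummy baselines
y = np.array([0, 0, 0, 1, 1, 2])

# 'most_frequent' in classification: always predict the most common label (0 here)
clf = DummyClassifier(strategy='most_frequent').fit(X, y)
preds = clf.predict(X[:3])           # array([0, 0, 0])

# 'stratified': sample labels from the training label distribution
strat = DummyClassifier(strategy='stratified', random_state=42).fit(X, y)

# 'most_frequent' in regression: predict the mean of the training targets
y_reg = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
reg = DummyRegressor(strategy='mean').fit(X, y_reg)
mean_pred = reg.predict(X[:1])       # array([3.5])
```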

scorers : Union[Mapping[str, Union[str, Callable]], List[str]], default: None

Scorers to override the default scorers. See the supported scorer formats described below.

alternative_scorers : Dict[str, Callable], default: None

Deprecated, please use scorers instead.

max_gain : float, default: 50

The maximum allowed absolute value of the gain; the reported gain is limited to the range [-max_gain, max_gain].
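As an illustration of the clipping behavior (a hypothetical helper for this sketch, not the check's actual internals):

```python
# Hypothetical sketch: however the gain is computed, the reported value is
# clipped to the symmetric interval [-max_gain, max_gain].
def clip_gain(gain: float, max_gain: float = 50.0) -> float:
    return max(-max_gain, min(max_gain, gain))

clip_gain(120.0)   # clipped down to 50.0
clip_gain(-3.2)    # within bounds, returned unchanged
```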

max_depth : int, default: 3

The maximum depth of the tree (used only if the simple model strategy is ‘tree’).

n_samples : int, default: 1_000_000

Number of samples to use for this check.

random_state : int, default: 42

The random state (used only if the simple model strategy is ‘tree’ or ‘stratified’).


Scorers are a convention of sklearn for evaluating models; see the sklearn scorer documentation for details. A scorer is a function which accepts (model, X, y_true) and returns a float score. For every scorer, higher scores are better than lower scores.
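For example, using a built-in sklearn scorer (the dataset and model here are placeholders for your own):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import get_scorer

X, y = make_classification(random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

# A scorer is called as scorer(model, X, y_true) and returns a float;
# for accuracy, higher is better.
accuracy_scorer = get_scorer('accuracy')
score = accuracy_scorer(model, X, y)
```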

You can create a scorer out of existing sklearn metrics:

from sklearn.metrics import roc_auc_score, make_scorer

# Note that the labels parameter is required for multi-class classification in
# metrics like roc_auc_score or log_loss that use the model's predict_proba,
# in case not all labels are present in the test set.
# response_method='predict_proba' (sklearn >= 1.4) makes the scorer pass
# probabilities, which roc_auc_score requires for multi-class.
training_labels = [1, 2, 3]
auc_scorer = make_scorer(roc_auc_score, labels=training_labels, multi_class='ovr',
                         response_method='predict_proba')

Or you can implement your own:

from sklearn.metrics import make_scorer

def my_mse(y_true, y_pred):
    # Return the mean so the scorer yields a single float, not an array.
    return ((y_true - y_pred) ** 2).mean()

# Mark greater_is_better=False, since scorers are always supposed to return
# a value to maximize.
my_mse_scorer = make_scorer(my_mse, greater_is_better=False)
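A quick sanity check of such a custom scorer on a trivially fitted model (the scorer is redefined here so the snippet is self-contained):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import make_scorer

def my_mse(y_true, y_pred):
    return ((y_true - y_pred) ** 2).mean()

my_mse_scorer = make_scorer(my_mse, greater_is_better=False)

# A perfectly linear dataset, so LinearRegression fits it exactly.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 1.0, 2.0, 3.0])
model = LinearRegression().fit(X, y)

# greater_is_better=False makes the scorer negate the metric, so a perfect
# fit scores ~0.0 and worse fits score increasingly negative.
result = my_mse_scorer(model, X, y)
```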
__init__(strategy: str = 'most_frequent', scorers: Optional[Union[Mapping[str, Union[str, Callable]], List[str]]] = None, alternative_scorers: Optional[Dict[str, Callable]] = None, max_gain: float = 50, max_depth: int = 3, n_samples: int = 1000000, random_state: int = 42, **kwargs)

__new__(*args, **kwargs)


SimpleModelComparison.add_condition(name, ...)

Add new condition function to the check.

SimpleModelComparison.add_condition_gain_greater_than([...])

Add condition - require minimum allowed gain between the model and the simple model.

SimpleModelComparison.clean_conditions()

Remove all conditions from this check instance.

SimpleModelComparison.conditions_decision(result)

Run conditions on given result.

SimpleModelComparison.config(...)

Return check instance config.

SimpleModelComparison.from_config(conf[, ...])

Return check object from a CheckConfig object.

SimpleModelComparison.from_json(conf[, ...])

Deserialize check instance from JSON string.

SimpleModelComparison.metadata(...)

Return check metadata.

SimpleModelComparison.name()

Name of class in split camel case.

SimpleModelComparison.params(...)

Return parameters to show when printing the check.

SimpleModelComparison.remove_condition(index)

Remove given condition by index.

SimpleModelComparison.run(...)

Run check.

SimpleModelComparison.run_logic(context)

Run check.

SimpleModelComparison.to_json([indent, ...])

Serialize check instance to JSON string.