SimpleModelComparison

class SimpleModelComparison

Compare the score of the given model to the score of a simple model (according to the given model type).

Parameters
strategy : str, default: ‘most_frequent’

Strategy used to generate the predictions of the simple model. One of [‘stratified’, ‘uniform’, ‘most_frequent’, ‘tree’].

  • stratified: randomly draws a label based on the train set label distribution. (Previously ‘random’)

  • uniform: in regression, samples predictions uniformly at random from the range of y; in classification, draws predictions uniformly at random from the list of values in y.

  • most_frequent: in regression, predicts the mean value; in classification, predicts the most common value. (Previously ‘constant’)

  • tree: runs a simple decision tree.
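The strategy names above resemble the baseline strategies of scikit-learn's dummy estimators. The sketch below illustrates the idea for a classification task under that assumption; it is not the check's actual implementation, and the helper name build_simple_classifier is hypothetical.

```python
from sklearn.dummy import DummyClassifier
from sklearn.tree import DecisionTreeClassifier


def build_simple_classifier(strategy="most_frequent", max_depth=3, random_state=42):
    """Hypothetical helper: map a strategy name to a baseline classifier."""
    if strategy in ("stratified", "uniform", "most_frequent"):
        # DummyClassifier ignores the features and predicts from the labels only.
        return DummyClassifier(strategy=strategy, random_state=random_state)
    if strategy == "tree":
        # A shallow tree is a slightly stronger baseline that does use the features.
        return DecisionTreeClassifier(max_depth=max_depth, random_state=random_state)
    raise ValueError(f"Unknown strategy: {strategy!r}")


X_train = [[0], [1], [2], [3], [4]]
y_train = [0, 0, 0, 1, 1]

baseline = build_simple_classifier("most_frequent").fit(X_train, y_train)
# Every row receives the most common training label.
print(baseline.predict([[10], [20]]))
```

A model that cannot clearly outperform such a baseline on the chosen scorer is unlikely to have learned much from the features.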

alternative_scorers : Dict[str, Callable], default: None

An optional dictionary mapping scorer titles to scorer functions or names. If none is given, the default scorers are used. For a description of scorers, see Notes below.

max_gain : float, default: 50

The maximum absolute value of the gain; the gain is limited on both sides to [-max_gain, max_gain].
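The two-sided limit can be sketched as a simple clipping step. How the gain itself is computed is not described in this section, so the example only shows the limiting behaviour; clip_gain is a hypothetical helper, not part of the library.

```python
def clip_gain(gain: float, max_gain: float = 50) -> float:
    """Hypothetical helper: limit a gain value to [-max_gain, max_gain]."""
    return max(-max_gain, min(max_gain, gain))


print(clip_gain(12.5))   # within the limits, returned unchanged
print(clip_gain(430.0))  # above the limit, reduced to max_gain
print(clip_gain(-1e6))   # below the limit, raised to -max_gain
```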

max_depth : int, default: 3

The maximum depth of the tree (used only if the simple model type is tree).

n_samples : int, default: 1_000_000

Number of samples to use for this check.

random_state : int, default: 42

The random state (used only if the simple model type is tree or random).

Notes

Scorers are an sklearn convention for evaluating a model. See the scorers documentation. A scorer is a function that accepts (model, X, y_true) and returns a float, which is the score. For every scorer, higher scores are better than lower scores.
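As a minimal sketch of this convention, a scorer can also be written by hand as a plain function with the (model, X, y_true) signature. The name neg_mae_scorer and the toy data are illustrative only.

```python
from sklearn.dummy import DummyRegressor
from sklearn.metrics import mean_absolute_error


def neg_mae_scorer(model, X, y_true):
    """Scorer with the (model, X, y_true) signature; higher is better."""
    y_pred = model.predict(X)
    # Negate the error so that a better model gets a higher score.
    return -float(mean_absolute_error(y_true, y_pred))


X = [[0], [1], [2], [3]]
y = [1.0, 2.0, 3.0, 4.0]

# A regressor that always predicts the training mean (2.5 here).
model = DummyRegressor(strategy="mean").fit(X, y)
score = neg_mae_scorer(model, X, y)
print(score)
```

Because an error metric is lower for better models, the function returns its negative to keep the "higher is better" rule.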

You can create a scorer out of existing sklearn metrics:

from sklearn.metrics import roc_auc_score, make_scorer

training_labels = [1, 2, 3]
auc_scorer = make_scorer(roc_auc_score, labels=training_labels, multi_class='ovr')
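# Note: the labels parameter is required for multi-class classification with metrics such as roc_auc_score or
# log_loss, which use the model's predict_proba output, in case not all labels are present in the test set.
# Depending on the scikit-learn version, make_scorer may also need needs_proba=True (older releases) or
# response_method='predict_proba' (1.4 and later) so that the metric receives probabilities.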

Or you can implement your own:

import numpy as np
from sklearn.metrics import make_scorer

def my_mse(y_true, y_pred):
    # Return a single float: the mean of the squared errors.
    return np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)

# Set greater_is_better=False, since scorers are always supposed to return
# a value to maximize.
my_mse_scorer = make_scorer(my_mse, greater_is_better=False)
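
To see what greater_is_better=False does in practice, the sketch below scores a trivial model with a self-contained mean squared error metric. The metric name and the toy data are illustrative; the resulting scorer returns the negated metric value, so larger scores still mean a better model.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.metrics import make_scorer


def mean_squared_error_metric(y_true, y_pred):
    """Illustrative metric: mean of the squared errors (lower is better)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean((y_true - y_pred) ** 2))


# greater_is_better=False makes the scorer return the negated metric.
neg_mse_scorer = make_scorer(mean_squared_error_metric, greater_is_better=False)

X = [[0], [1], [2]]
y = [1.0, 2.0, 3.0]
model = DummyRegressor(strategy="mean").fit(X, y)

raw_mse = mean_squared_error_metric(y, model.predict(X))
score = neg_mse_scorer(model, X, y)
print(raw_mse, score)  # the score is the negative of the raw metric
```
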
__init__(strategy: str = 'most_frequent', alternative_scorers: Optional[Dict[str, Callable]] = None, max_gain: float = 50, max_depth: int = 3, n_samples: int = 1000000, random_state: int = 42, **kwargs)
__new__(*args, **kwargs)

Methods

SimpleModelComparison.add_condition(name, ...)

Add new condition function to the check.

SimpleModelComparison.add_condition_gain_greater_than([...])

Add condition - require minimum allowed gain between the model and the simple model.

SimpleModelComparison.clean_conditions()

Remove all conditions from this check instance.

SimpleModelComparison.conditions_decision(result)

Run conditions on given result.

SimpleModelComparison.config([include_version])

Return check instance config.

SimpleModelComparison.from_config(conf[, ...])

Return check object from a CheckConfig object.

SimpleModelComparison.from_json(conf[, ...])

Deserialize check instance from JSON string.

SimpleModelComparison.metadata([with_doc_link])

Return check metadata.

SimpleModelComparison.name()

Name of class in split camel case.

SimpleModelComparison.params([show_defaults])

Return parameters to show when printing the check.

SimpleModelComparison.remove_condition(index)

Remove given condition by index.

SimpleModelComparison.run(train_dataset, ...)

Run check.

SimpleModelComparison.run_logic(context)

Run check.

SimpleModelComparison.to_json([indent])

Serialize check instance to JSON string.

Examples