SimpleModelComparison
- class SimpleModelComparison
Compare the given model's score to the score of a simple baseline model (built according to the given strategy).
- Parameters
- strategy : str, default: 'most_frequent'
Strategy to use to generate the simple model's predictions, one of ['stratified', 'uniform', 'most_frequent', 'tree'] (see the usage sketch after this parameter list):
- stratified: randomly draws a label based on the train set label distribution. (Previously 'random')
- uniform: in regression, samples predictions uniformly at random from the range of y; in classification, draws predictions uniformly at random from the list of values in y.
- most_frequent: in regression, predicts the mean value; in classification, the most common value. (Previously 'constant')
- tree: runs a simple decision tree.
- simple_model_type : str, default: None
Deprecated. Please use strategy instead.
- alternative_scorers : Dict[str, Callable], default: None
An optional dictionary mapping scorer titles to scorer functions/names. If none is given, default scorers are used. For a description of scorers, see the Notes below.
- max_gain : float, default: 50
The maximum allowed gain value; the gain is limited from both sides to the range [-max_gain, max_gain].
- max_depth : int, default: 3
The maximum depth of the tree (used only when strategy is 'tree').
- random_state : int, default: 42
The random state (used only when strategy is 'tree' or 'stratified').
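For orientation, here is a minimal, hedged usage sketch of the check (not part of the class reference itself). It assumes a recent deepchecks tabular API and uses the sklearn iris dataset purely as an illustration:

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from deepchecks.tabular import Dataset
from deepchecks.tabular.checks import SimpleModelComparison

# Illustrative data/model setup: split iris into train/test deepchecks Datasets.
iris_df = load_iris(as_frame=True).frame
train_df, test_df = train_test_split(iris_df, test_size=0.3, random_state=42)
train_dataset = Dataset(train_df, label='target', cat_features=[])
test_dataset = Dataset(test_df, label='target', cat_features=[])

model = RandomForestClassifier(random_state=42)
model.fit(train_df.drop(columns=['target']), train_df['target'])

# Compare the model against a 'stratified' simple model; the reported gain is
# limited to the range [-50, 50].
check = SimpleModelComparison(strategy='stratified', max_gain=50, random_state=42)
result = check.run(train_dataset, test_dataset, model=model)
result.show()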
Notes
Scorers are an sklearn convention for evaluating a model; see the sklearn scorers documentation. A scorer is a function which accepts (model, X, y_true) and returns a float result which is the score. For every scorer, higher scores are better than lower scores.
You can create a scorer out of existing sklearn metrics:
from sklearn.metrics import roc_auc_score, make_scorer

training_labels = [1, 2, 3]
# Note that the labels parameter is required for multi-class classification in
# metrics like roc_auc_score or log_loss that use the model's predict_proba
# function, in case not all labels are present in the test set.
auc_scorer = make_scorer(roc_auc_score, labels=training_labels, multi_class='ovr')
Or you can implement your own:
import numpy as np
from sklearn.metrics import make_scorer

def my_mse(y_true, y_pred):
    # Return a single float, as scorers must produce a scalar score.
    return np.mean((y_true - y_pred) ** 2)

# Mark greater_is_better=False, since scorers are always supposed to return
# a value to maximize.
my_mse_scorer = make_scorer(my_mse, greater_is_better=False)
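As a brief follow-up sketch, a custom scorer such as my_mse_scorer above can be handed to the check through alternative_scorers; the 'neg_mse' title used here is arbitrary:

from deepchecks.tabular.checks import SimpleModelComparison

# The dictionary maps a display title to the scorer; the check reports this
# metric for both the evaluated model and the simple model.
check = SimpleModelComparison(alternative_scorers={'neg_mse': my_mse_scorer})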
- __init__(strategy: str = 'most_frequent', simple_model_type: Optional[str] = None, alternative_scorers: Optional[Dict[str, Callable]] = None, max_gain: float = 50, max_depth: int = 3, random_state: int = 42, **kwargs)
- __new__(*args, **kwargs)
Methods
- Add new condition function to the check.
- Add condition - require a minimum allowed gain between the model and the simple model (see the sketch after this list).
- Remove all conditions from this check instance.
- Run conditions on given result.
- Return check configuration (conditions' configuration not yet supported).
- Return check object from a CheckConfig object.
- Return check metadata.
- Name of class in split camel case.
- Return parameters to show when printing the check.
- Remove given condition by index.
- Run check.
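Finally, a hedged sketch of the condition workflow described in the methods above, reusing train_dataset, test_dataset and model from the earlier usage sketch. The condition method name add_condition_gain_greater_than is an assumption that may differ between deepchecks versions, so verify it against your installed release:

from deepchecks.tabular.checks import SimpleModelComparison

check = SimpleModelComparison(strategy='most_frequent')
# Assumed condition method name: require at least a 10% gain over the simple model.
check.add_condition_gain_greater_than(0.1)

result = check.run(train_dataset, test_dataset, model=model)
# conditions_decision runs the registered conditions on the given result.
print(check.conditions_decision(result))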