WeakSegmentsPerformance

class WeakSegmentsPerformance

Search for segments with low performance scores.

The check helps you identify the weak spots of your model and provides a deep-dive analysis of its performance on different segments of your data. Specifically, it identifies the segments of the data distribution on which the model performs worst, for further improvement and visibility purposes.

To achieve this, the check trains several simple tree-based models that try to predict the error of the user-provided model on the dataset. The relevant segments are detected by analyzing the different leaves of the trained trees.
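The idea described above can be illustrated with a deliberately simplified, standard-library-only sketch. It replaces the tree-based models with a single split per feature (a decision stump) and treats each side of the split as a leaf. The names `weakest_segment`, `rows`, `scores`, and `min_size_ratio` are hypothetical and chosen for illustration; this is not the library's implementation.

```python
from statistics import mean


def weakest_segment(rows, scores, min_size_ratio=0.05):
    """Return (feature, op, threshold, mean_score, size) of the weakest one-split segment.

    Illustrative assumption: a single split per feature stands in for the
    tree-based error models; all names here are hypothetical.

    rows   : list of dicts mapping feature name -> numeric value
    scores : per-sample scores, where higher means better model performance
    """
    min_size = min_size_ratio * len(rows)
    best = None
    for feature in rows[0]:
        values = sorted({row[feature] for row in rows})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [s for row, s in zip(rows, scores) if row[feature] <= threshold]
            right = [s for row, s in zip(rows, scores) if row[feature] > threshold]
            for op, leaf in (("<=", left), (">", right)):
                if len(leaf) < min_size:
                    continue  # segment is too small to be reported
                candidate = (feature, op, threshold, mean(leaf), len(leaf))
                if best is None or candidate[3] < best[3]:
                    best = candidate
    return best


# Toy data: the per-sample score drops sharply for the two oldest samples.
rows = [
    {"age": 20, "income": 30},
    {"age": 30, "income": 80},
    {"age": 40, "income": 50},
    {"age": 50, "income": 20},
    {"age": 65, "income": 60},
    {"age": 70, "income": 40},
]
scores = [-0.1, -0.2, -0.1, -0.2, -0.9, -1.1]

segment = weakest_segment(rows, scores, min_size_ratio=0.3)
print(segment[:3], "size:", segment[4])  # the weakest segment is age > 57.5
```

A real tree can combine several features in one leaf, which a single split cannot; the sketch only shows how predicting the per-sample score and ranking leaves by their mean score surfaces a weak segment.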

Parameters
columns: Union[Hashable, List[Hashable]], default: None

Columns to check. If None, all columns except the ignored ones are checked.

ignore_columns: Union[Hashable, List[Hashable]], default: None

Columns to ignore. If None, the columns to check are determined by the columns parameter.

n_top_features: Optional[int], default: 10

Number of features to use for segment search. Top columns are selected based on feature importance.

segment_minimum_size_ratio: float, default: 0.05

Minimum size ratio for segments. Will only search for segments of size >= segment_minimum_size_ratio * data_size.

max_categories_weak_segment: Optional[int], default: None

Maximum number of categories that can be included in a weak segment per categorical feature. If None, the number of categories is not limited.

alternative_scorer: Dict[str, Union[str, Callable]], default: None

Scorer to use as the performance measure, given as a dictionary mapping a scorer name to either a function or an sklearn scorer name. If None, a default scorer (per the model type) will be used.

score_per_sample: Union[np.array, pd.Series, None], default: None

A score per sample is required to detect relevant weak segments. It should follow the convention that a higher score means better model performance on that sample. If provided, the check also uses the provided per-sample scores as the scoring function for segments. If None, the check calculates the score per sample via negative cross entropy for classification and negative MSE for regression.

loss_per_sample: Union[np.array, pd.Series, None], default: None

Deprecated, please use score_per_sample instead.

n_samples: int, default: 10_000

Number of samples to use for this check.

n_to_show: int, default: 3

Number of segments with the weakest performance to show.

categorical_aggregation_threshold: float, default: 0.05

In each categorical column, categories with frequency below the threshold are merged into an “Other” category.

random_state: int, default: 42

Random seed for all check internals.

multiple_segments_per_feature: bool, default: True

If True, will allow the same feature to be a segmenting feature in multiple segments, otherwise each feature can appear in one segment at most.
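Two of the parameter conventions above can be made concrete with a short standard-library sketch: the default per-sample score (negative cross entropy for classification, negative squared error for regression, so that higher is better) and the merging of rare categories controlled by categorical_aggregation_threshold. The function names below are hypothetical helpers written for illustration and are assumptions about the described behavior, not the library's code.

```python
import math
from collections import Counter


def classification_scores(probabilities, labels):
    """Negative cross entropy per sample: log of the probability of the true class.

    Hypothetical helper; higher (closer to 0) means better performance.
    """
    eps = 1e-15  # guard against log(0)
    return [math.log(max(p[y], eps)) for p, y in zip(probabilities, labels)]


def regression_scores(predictions, targets):
    """Negative squared error per sample (hypothetical helper)."""
    return [-((pred - target) ** 2) for pred, target in zip(predictions, targets)]


def merge_rare_categories(values, threshold=0.05, other="Other"):
    """Replace categories whose frequency is below `threshold` with `other`.

    Assumption: "below" is read as strictly below the threshold.
    """
    counts = Counter(values)
    total = len(values)
    return [v if counts[v] / total >= threshold else other for v in values]


# The third sample gives only 0.4 to its true class, so it scores lowest.
cls_scores = classification_scores(
    [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]], labels=[0, 1, 1]
)

# Errors of 0.5 and 2.0 become scores of -0.25 and -4.0.
reg_scores = regression_scores([2.0, 5.0], [2.5, 3.0])

# "c" appears in 1 of 20 rows (5%), below the 10% threshold used here.
merged = merge_rare_categories(["a"] * 12 + ["b"] * 7 + ["c"], threshold=0.1)
```

Scores of this shape satisfy the convention stated for score_per_sample: a larger value always means the model did better on that sample.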

__init__(columns: Optional[Union[Hashable, List[Hashable]]] = None, ignore_columns: Optional[Union[Hashable, List[Hashable]]] = None, n_top_features: Optional[int] = 10, segment_minimum_size_ratio: float = 0.05, max_categories_weak_segment: Optional[int] = None, alternative_scorer: Optional[Dict[str, Union[Callable, str]]] = None, loss_per_sample: Optional[Union[ndarray, Series]] = None, score_per_sample: Optional[Union[ndarray, Series]] = None, n_samples: int = 10000, categorical_aggregation_threshold: float = 0.05, n_to_show: int = 3, random_state: int = 42, multiple_segments_per_feature: bool = True, **kwargs)
__new__(*args, **kwargs)

Attributes

WeakSegmentsPerformance.categorical_aggregation_threshold

WeakSegmentsPerformance.max_categories_weak_segment

WeakSegmentsPerformance.min_category_size_ratio

WeakSegmentsPerformance.n_to_show

WeakSegmentsPerformance.n_top_features

WeakSegmentsPerformance.random_state

WeakSegmentsPerformance.segment_minimum_size_ratio

Methods

WeakSegmentsPerformance.add_condition(name, ...)

Add new condition function to the check.

WeakSegmentsPerformance.add_condition_segments_relative_performance_greater_than([...])

Add condition - check that the score of the weakest segment is greater than the supplied relative threshold.

WeakSegmentsPerformance.clean_conditions()

Remove all conditions from this check instance.

WeakSegmentsPerformance.conditions_decision(result)

Run conditions on given result.

WeakSegmentsPerformance.config([...])

Return check instance config.

WeakSegmentsPerformance.from_config(conf[, ...])

Return check object from a CheckConfig object.

WeakSegmentsPerformance.from_json(conf[, ...])

Deserialize check instance from JSON string.

WeakSegmentsPerformance.metadata([with_doc_link])

Return check metadata.

WeakSegmentsPerformance.name()

Name of class in split camel case.

WeakSegmentsPerformance.params([show_defaults])

Return parameters to show when printing the check.

WeakSegmentsPerformance.remove_condition(index)

Remove given condition by index.

WeakSegmentsPerformance.run(dataset[, ...])

Run check.

WeakSegmentsPerformance.run_logic(context, ...)

Run check.

WeakSegmentsPerformance.to_json([indent, ...])

Serialize check instance to JSON string.

Examples
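The relative-performance condition listed under Methods compares the weakest segment's score with a threshold defined relative to overall performance. A minimal sketch of one way such a comparison could work is shown below. The function `passes_relative_threshold` and the parameter `max_ratio_change` are hypothetical names, and the exact formula is an assumption made for illustration rather than the library's definition.

```python
def passes_relative_threshold(weakest_score, average_score, max_ratio_change=0.2):
    """Return True if the weakest segment stays within the allowed relative drop.

    Assumption: the threshold is the average score lowered by
    `max_ratio_change` times its magnitude, which also works when scores
    are negative (for example, negative MSE).
    """
    lowest_allowed = average_score - max_ratio_change * abs(average_score)
    return weakest_score > lowest_allowed


# Average accuracy 0.90: a weakest segment at 0.75 is within a 20% drop,
# while one at 0.60 is not.
ok = passes_relative_threshold(0.75, 0.90)
too_weak = passes_relative_threshold(0.60, 0.90)

# Negative scores: an average of -0.5 allows segments scoring above -0.6.
ok_negative = passes_relative_threshold(-0.55, -0.5)
too_weak_negative = passes_relative_threshold(-0.70, -0.5)
```

Using the magnitude of the average score keeps the direction of the comparison the same for positive and negative scorers.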