deepchecks.tabular#

Package for tabular functionality.

Modules

`checks`	Module importing all tabular checks.
`suites`	Module contains all prebuilt suites.
`datasets`	Module for working with pre-built datasets.

Classes

class Dataset[source]#

Dataset wraps pandas DataFrame together with ML related metadata.

The Dataset class contains additional data and methods for easily accessing metadata relevant to training or validating ML models.

Parameters

df: Any

An object that can be cast to a pandas DataFrame, containing data relevant for training or validating an ML model.

label: t.Union[Hashable, pd.Series, pd.DataFrame, np.ndarray] , default: None

Label column, provided either as the name of an existing column in the DataFrame or as a label object containing the label data (pandas Series/DataFrame or a numpy array) that will be concatenated to the data in the DataFrame. When label data is passed, the following logic is applied to set the label name:

Series: uses the series name, or ‘target’ if the name is empty
DataFrame: expects a single column in the dataframe and uses its name
numpy: uses ‘target’
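The naming rules above can be sketched in plain Python. This is only an illustration of the documented logic, not the library's implementation: `resolve_label_name` and its `kind`, `series_name` and `frame_columns` arguments are hypothetical names introduced here.

```python
from typing import Hashable, Optional, Sequence


def resolve_label_name(
    kind: str,
    series_name: Optional[Hashable] = None,
    frame_columns: Sequence[Hashable] = (),
) -> Hashable:
    """Hypothetical helper mirroring the documented label-name rules."""
    if kind == "series":
        # Series: use its name, falling back to 'target' when the name is empty.
        return series_name if series_name not in (None, "") else "target"
    if kind == "dataframe":
        # DataFrame: exactly one column is expected; its name becomes the label name.
        if len(frame_columns) != 1:
            raise ValueError("label DataFrame must contain exactly one column")
        return frame_columns[0]
    if kind == "numpy":
        # numpy arrays carry no name, so 'target' is always used.
        return "target"
    raise ValueError(f"unsupported label kind: {kind!r}")


print(resolve_label_name("series", series_name="price"))
print(resolve_label_name("series"))
print(resolve_label_name("dataframe", frame_columns=["y"]))
```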

features: t.Optional[t.Sequence[Hashable]] , default: None

List of names for the feature columns in the DataFrame.

cat_features: t.Optional[t.Sequence[Hashable]] , default: None

List of names for the categorical features in the DataFrame. To disable categorical feature inference, pass cat_features=[].

index_name: t.Optional[Hashable] , default: None

Name of the index column in the dataframe. If set_index_from_dataframe_index is True and index_name is not None, the index will be created from the dataframe index level with the given name. If the index levels have no names, an int must be used to select the appropriate level by order.

set_index_from_dataframe_index: bool , default: False

If set to True, the index will be created from the dataframe index instead of from the dataframe columns (the default). If index_name is None, the first level of the index will be used in case of a multilevel index.

datetime_name: t.Optional[Hashable] , default: None

Name of the datetime column in the dataframe. If set_datetime_from_dataframe_index is True and datetime_name is not None, the date will be created from the dataframe index level with the given name. If the index levels have no names, an int must be used to select the appropriate level by order.

set_datetime_from_dataframe_index: bool , default: False

If set to True, the date will be created from the dataframe index instead of from the dataframe columns (the default). If datetime_name is None, the first level of the index will be used in case of a multilevel index.

convert_datetime: bool , default: True

If set to True, the date will be converted to datetime using pandas.to_datetime.

datetime_args: t.Optional[t.Dict] , default: None

Arguments passed to pandas.to_datetime when converting the datetime column (see https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html for details).

max_categorical_ratio: float , default: 0.01

The max ratio of unique values in a column in order for it to be inferred as a categorical feature.

max_categories: int , default: None

The maximum number of categories in a column in order for it to be inferred as a categorical feature. If None, uses the is_categorical default inference mechanism.
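A rough sketch of how two such thresholds could drive categorical inference is shown below. It is an assumption-laden illustration: the function name `looks_categorical` is hypothetical, and both the way the two limits are combined and the behavior when max_categories is None are guesses. The source does not describe the default inference mechanism.

```python
from typing import Hashable, Optional, Sequence


def looks_categorical(
    values: Sequence[Hashable],
    max_categorical_ratio: float = 0.01,
    max_categories: Optional[int] = None,
) -> bool:
    """Hypothetical inference rule based on the two documented thresholds."""
    if not values:
        return False
    n_unique = len(set(values))
    # Ratio of distinct values to rows must not exceed max_categorical_ratio.
    within_ratio = n_unique / len(values) <= max_categorical_ratio
    if max_categories is None:
        # Assumption: with no explicit cap, only the ratio test is applied.
        return within_ratio
    # Assumption: both limits must hold when a cap is given.
    return within_ratio and n_unique <= max_categories


colors = ["red", "blue"] * 500      # 2 distinct values over 1000 rows
row_ids = list(range(1000))         # every value is distinct
print(looks_categorical(colors), looks_categorical(row_ids))
```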

label_type: str , default: None

Used to determine the task type. If None, inferred when running a check based on label column and model. Possible values are: ‘multiclass’, ‘binary’ and ‘regression’.

Attributes

cat_features: Return list of categorical feature names.
classes_in_label_col: Return the classes from label column in sorted list.
columns_info: Return the role and logical type of each column.
data: Return the data of dataset.
datetime_col: Return datetime column if exists.
datetime_name: If datetime column exists, return its name.
features: Return list of feature names.
features_columns: Return DataFrame containing only the features defined in the dataset; raise an error if the features are empty.
index_col: Return index column.
index_name: If index column exists, return its name.
label_col: Return Series of the label defined in the dataset; raise an error if the label is not defined.
label_name: If label column exists, return its name.
label_type: Return the label type.
n_samples: Return number of samples in dataframe.
numerical_features: Return list of numerical feature names.

Methods

`assert_datetime`()	Check if datetime is defined and if not raise error.
`assert_features`()	Check if features are defined (not empty) and if not raise error.
`assert_index`()	Check if index is defined and if not raise error.
`cast_to_dataset`(obj)	Verify Dataset or transform to Dataset.
`copy`(new_data)	Create a copy of this Dataset with new data.
`datasets_share_categorical_features`(*datasets)	Verify that all provided datasets share same categorical features.
`datasets_share_date`(*datasets)	Verify that all provided datasets share same date column.
`datasets_share_features`(*datasets)	Verify that all provided datasets share same features.
`datasets_share_index`(*datasets)	Verify that all provided datasets share same index column.
`datasets_share_label`(*datasets)	Verify that all provided datasets share same label column.
`from_numpy`(*args[, columns, label_name])	Create Dataset instance from numpy arrays.
`get_datetime_column_from_index`(datetime_name)	Retrieve the datetime info from the index if _set_datetime_from_dataframe_index is True.
`has_label`()	Return True if label column exists.
`is_categorical`(col_name)	Check if a column is considered a category column in the dataset object.
`is_sampled`(n_samples)	Return True if the dataset number of samples will decrease when sampled with n_samples samples.
`len_when_sampled`(n_samples)	Return the number of samples in the sampled dataframe when this dataset is sampled with n_samples samples.
`sample`(n_samples[, replace, random_state, ...])	Create a copy of the dataset object, with the internal dataframe being a sample of the original dataframe.
`select`([columns, ignore_columns, keep_label])	Filter dataset columns by given params.
`train_test_split`([train_size, test_size, ...])	Split dataset into random train and test datasets.

__init__(df: Any, label: Optional[Union[Hashable, Series, DataFrame, ndarray]] = None, features: Optional[Sequence[Hashable]] = None, cat_features: Optional[Sequence[Hashable]] = None, index_name: Optional[Hashable] = None, set_index_from_dataframe_index: bool = False, datetime_name: Optional[Hashable] = None, set_datetime_from_dataframe_index: bool = False, convert_datetime: bool = True, datetime_args: Optional[Dict] = None, max_categorical_ratio: float = 0.01, max_categories: Optional[int] = None, label_type: Optional[str] = None, dataset_name: Optional[str] = None, label_classes=None)[source]#

assert_datetime()[source]#

Check if datetime is defined and if not raise error.

Raises

DeepchecksNotSupportedError

assert_features()[source]#

Check if features are defined (not empty) and if not raise error.

Raises

DeepchecksNotSupportedError

assert_index()[source]#

Check if index is defined and if not raise error.

Raises

DeepchecksNotSupportedError

classmethod cast_to_dataset(obj: Any) → Dataset[source]#

Verify Dataset or transform to Dataset.

Verifies that the provided value is a non-empty instance of Dataset and raises an exception otherwise. If the ‘cast’ flag is set to True, it also tries to transform the provided value into a Dataset instance.

Parameters

obj: value to verify

Raises

DeepchecksValueError: if the provided value is not a Dataset instance, or if the provided value cannot be transformed into a Dataset instance.

property cat_features: List[Hashable]#

Return list of categorical feature names.

Returns

t.List[Hashable]: List of categorical feature names.

property classes_in_label_col: Tuple[str, ...]#

Return the classes from the label column in a sorted list. If no label column is defined, return an empty list.

Returns

t.Tuple[str, …]: Sorted classes

property columns_info: Dict[Hashable, str]#

Return the role and logical type of each column.

Returns

t.Dict[Hashable, str]: Dictionary mapping each column to its role

copy(new_data: DataFrame) → TDataset[source]#

Create a copy of this Dataset with new data.

Parameters

new_data (DataFrame): new data from which new dataset will be created

Returns

Dataset: new dataset instance

property data: pandas.core.frame.DataFrame#: Return the data of dataset.

classmethod datasets_share_categorical_features(*datasets: Dataset) → bool[source]#

Verify that all provided datasets share same categorical features.

Parameters

datasets: List[Dataset]: list of datasets to validate

Returns

bool: True if all datasets share the same categorical features, otherwise False

Raises

AssertionError: if the ‘datasets’ parameter is not a list, or if ‘datasets’ contains fewer than one dataset.

classmethod datasets_share_date(*datasets: Dataset) → bool[source]#

Verify that all provided datasets share same date column.

Parameters

datasets: List[Dataset]: list of datasets to validate

Returns

bool: True if all datasets share the same date column, otherwise False

Raises

AssertionError: if the ‘datasets’ parameter is not a list, or if ‘datasets’ contains fewer than one dataset.

classmethod datasets_share_features(*datasets: Dataset) → bool[source]#

Verify that all provided datasets share same features.

Parameters

datasets: List[Dataset]: list of datasets to validate

Returns

bool: True if all datasets share the same features, otherwise False

Raises

AssertionError: if the ‘datasets’ parameter is not a list, or if ‘datasets’ contains fewer than one dataset.

classmethod datasets_share_index(*datasets: Dataset) → bool[source]#

Verify that all provided datasets share same index column.

Parameters

datasets: List[Dataset]: list of datasets to validate

Returns

bool: True if all datasets share the same index column, otherwise False

Raises

AssertionError: if the ‘datasets’ parameter is not a list, or if ‘datasets’ contains fewer than one dataset.

classmethod datasets_share_label(*datasets: Dataset) → bool[source]#

Verify that all provided datasets share same label column.

Parameters

datasets: List[Dataset]: list of datasets to validate

Returns

bool: True if all datasets share the same label column, otherwise False

Raises

AssertionError: if the ‘datasets’ parameter is not a list, or if ‘datasets’ contains fewer than one dataset.

property datetime_col: Optional[pandas.core.series.Series]#

Return datetime column if exists.

Returns

t.Optional[pd.Series]: Series of the datetime column

property datetime_name: Optional[Hashable]#

If datetime column exists, return its name.

Returns

t.Optional[Hashable]: datetime name

property features: List[Hashable]#

Return list of feature names.

Returns

t.List[Hashable]: List of feature names.

property features_columns: pandas.core.frame.DataFrame#

Return DataFrame containing only the features defined in the dataset; raise an error if the features are empty.

Returns

pd.DataFrame

classmethod from_numpy(*args: ndarray, columns: Optional[Sequence[Hashable]] = None, label_name: Optional[Hashable] = None, **kwargs) → TDataset[source]#

Create Dataset instance from numpy arrays.

Parameters

*args: np.ndarray: Numpy array of data columns, and an optional second numpy array of labels.
columns: t.Sequence[Hashable] , default: None: Names for the columns. If none are provided, the columns are automatically named 1 to n (where n is the number of columns).
label_name: t.Hashable , default: None: Name of the labels column. If none is provided, the name ‘target’ will be used.
**kwargs: Dict: Additional arguments that will be passed to the main Dataset constructor.

Returns

Dataset: instance of the Dataset

Raises

DeepchecksValueError: if it receives zero or more than two numpy arrays; if the columns array (args[0]) is not a two-dimensional numpy array; if the labels array (args[1]) is not a one-dimensional numpy array; or if the features array or labels array is empty.

Examples

>>> import numpy
>>> from deepchecks.tabular import Dataset

>>> features = numpy.array([[0.25, 0.3, 0.3],
...                        [0.14, 0.75, 0.3],
...                        [0.23, 0.39, 0.1]])
>>> labels = numpy.array([0.1, 0.1, 0.7])
>>> dataset = Dataset.from_numpy(features, labels)

Creating dataset only from features array.

>>> dataset = Dataset.from_numpy(features)

Passing additional arguments to the main Dataset constructor

>>> dataset = Dataset.from_numpy(features, labels, max_categorical_ratio=0.5)

Specifying features and label columns names.

>>> dataset = Dataset.from_numpy(
...     features, labels,
...     columns=['sensor-1', 'sensor-2', 'sensor-3'],
...     label_name='labels'
... )

get_datetime_column_from_index(datetime_name)[source]#: Retrieve the datetime info from the index if _set_datetime_from_dataframe_index is True.

has_label() → bool[source]#

Return True if label column exists.

Returns

bool: True if label column exists.

property index_col: Optional[pandas.core.series.Series]#

Return index column. Index can be a named column or DataFrame index.

Returns

t.Optional[pd.Series]: If index column exists, returns a pandas Series of the index column.

property index_name: Optional[Hashable]#

If index column exists, return its name.

Returns

t.Optional[Hashable]: index name

is_categorical(col_name: Hashable) → bool[source]#

Check if a column is considered a category column in the dataset object.

Parameters

col_name: Hashable: The name of the column in the dataframe

Returns

bool: True if the column is considered categorical in this dataset, otherwise False

is_sampled(n_samples: int)[source]#: Return True if the dataset number of samples will decrease when sampled with n_samples samples.

property label_col: pandas.core.series.Series#

Return Series of the label defined in the dataset; raise an error if the label is not defined.

Returns

pd.Series

property label_name: Optional[Hashable]#

If label column exists, return its name. Otherwise, throw an exception.

Returns

t.Optional[Hashable]: Label name

property label_type: Optional[deepchecks.tabular.utils.task_type.TaskType]#

Return the label type.

Returns

t.Optional[TaskType]: Label type

len_when_sampled(n_samples: int)[source]#: Return the number of samples in the sampled dataframe when this dataset is sampled with n_samples samples.
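The relationship between is_sampled and len_when_sampled can be pictured with the small sketch below. It assumes sampling without replacement, so a sample can never be larger than the dataset, and it treats n_samples=None as "no sampling". Both are assumptions, and the `total` argument is a hypothetical stand-in for the dataset's n_samples property.

```python
from typing import Optional


def len_when_sampled(total: int, n_samples: Optional[int]) -> int:
    """Rows remaining after sampling (assumes no replacement)."""
    if n_samples is None:
        # Assumption: None means the dataset is left as-is.
        return total
    return min(total, n_samples)


def is_sampled(total: int, n_samples: Optional[int]) -> bool:
    """True only when sampling would actually shrink the dataset."""
    return len_when_sampled(total, n_samples) < total


print(len_when_sampled(1000, 200), is_sampled(1000, 200))
print(len_when_sampled(1000, 5000), is_sampled(1000, 5000))
```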

property n_samples: int#

Return number of samples in dataframe.

Returns

int: Number of samples in dataframe

property numerical_features: List[Hashable]#

Return list of numerical feature names.

Returns

t.List[Hashable]: List of numerical feature names.

sample(n_samples: Optional[int], replace: bool = False, random_state: Optional[int] = None, drop_na_label: bool = False) → TDataset[source]#

Create a copy of the dataset object, with the internal dataframe being a sample of the original dataframe.

Parameters

n_samples: t.Optional[int]: Number of samples to draw.
replace: bool, default: False: Whether to sample with replacement.
random_state: t.Optional[int] , default: None: Random state.
drop_na_label: bool, default: False: Whether to take the sample only from rows with an existing label.

Returns

Dataset: instance of the Dataset with sampled internal dataframe.

select(columns: Optional[Union[Hashable, List[Hashable]]] = None, ignore_columns: Optional[Union[Hashable, List[Hashable]]] = None, keep_label: bool = False) → TDataset[source]#

Filter dataset columns by given params.

Parameters

columns: Union[Hashable, List[Hashable], None]: Column names to keep.
ignore_columns: Union[Hashable, List[Hashable], None]: Column names to drop.

Returns

TDataset: horizontally filtered dataset

Raises

DeepchecksValueError: if one of the given columns does not exist.

train_test_split(train_size: Optional[Union[int, float]] = None, test_size: Union[int, float] = 0.25, random_state: int = 42, shuffle: bool = True, stratify: Union[List, Series, ndarray, bool] = False) → Tuple[TDataset, TDataset][source]#

Split dataset into random train and test datasets.

Parameters

train_size: t.Union[int, float, None] , default: None: If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the train split. If int, represents the absolute number of train samples. If None, the value is automatically set to the complement of the test size.
test_size: t.Union[int, float] , default: 0.25: If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the test split. If int, represents the absolute number of test samples.
random_state: int , default: 42: The random state to use for shuffling.
shuffle: bool , default: True: Whether to shuffle the data before splitting.
stratify: t.Union[t.List, pd.Series, np.ndarray, bool] , default: False: If True, data is split in a stratified fashion, using the class labels. If array-like, data is split in a stratified fashion, using this as class labels.

Returns

Dataset: Dataset containing train split data.
Dataset: Dataset containing test split data.
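To make the int/float/None semantics of train_size and test_size concrete, here is a minimal sketch that turns them into row counts. The helper `split_sizes` is hypothetical, and the rounding (test size rounded up, train size rounded down, as scikit-learn's train_test_split does) is an assumption, since the source does not state how fractions are rounded.

```python
import math
from typing import Callable, Optional, Tuple, Union

Size = Union[int, float]


def split_sizes(
    n_samples: int,
    train_size: Optional[Size] = None,
    test_size: Size = 0.25,
) -> Tuple[int, int]:
    """Hypothetical helper: resolve documented size arguments to row counts."""

    def resolve(size: Size, round_fn: Callable[[float], int]) -> int:
        if isinstance(size, float):
            # Floats are proportions of the dataset.
            if not 0.0 < size < 1.0:
                raise ValueError("float sizes must be between 0.0 and 1.0")
            return int(round_fn(size * n_samples))
        # Ints are absolute numbers of samples.
        return int(size)

    n_test = resolve(test_size, math.ceil)    # assumed rounding: up
    if train_size is None:
        # None: train takes the complement of the test split.
        n_train = n_samples - n_test
    else:
        n_train = resolve(train_size, math.floor)  # assumed rounding: down
    if n_train + n_test > n_samples:
        raise ValueError("train and test sizes exceed the dataset size")
    return n_train, n_test


print(split_sizes(100))
print(split_sizes(100, train_size=60, test_size=20))
```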

class Context[source]#

Contains all the data and properties the user has passed to a check or suite, and validates them.

Parameters

train: Union[Dataset, pd.DataFrame, None] , default: None: Dataset or DataFrame object, representing data an estimator was fitted on
test: Union[Dataset, pd.DataFrame, None] , default: None: Dataset or DataFrame object, representing data an estimator predicts on
model: Optional[BasicModel] , default: None: A scikit-learn-compatible fitted estimator instance
feature_importance: pd.Series , default: None: Manually supplied feature importance
feature_importance_force_permutation: bool , default: False: Force calculation of permutation feature importance
feature_importance_timeout: int , default: 120: Timeout in seconds for the permutation feature importance calculation
y_pred_train: Optional[np.ndarray] , default: None: Array of the model prediction over the train dataset.
y_pred_test: Optional[np.ndarray] , default: None: Array of the model prediction over the test dataset.
y_proba_train: Optional[np.ndarray] , default: None: Array of the model prediction probabilities over the train dataset.
y_proba_test: Optional[np.ndarray] , default: None: Array of the model prediction probabilities over the test dataset.
model_classes: Optional[List] , default: None: For classification: list of classes known to the model

Attributes

feature_importance: Return feature importance, or None if not possible.
feature_importance_type: Return feature importance type if feature importance is available, else None.
model: Return & validate model if model exists, otherwise raise error.
model_classes: Return ordered list of possible label classes for classification tasks or None for regression.
model_name: Return model name.
observed_classes: Return the observed classes in both train and test.
task_type: Return task type based on calculated classes argument.
test: Return test if exists, otherwise raise error.
train: Return train if exists, otherwise raise error.
with_display: Return the with_display flag.

Methods

`assert_classification_task`()	Assert the task_type is classification.
`assert_regression_task`()	Assert the task type is regression.
`finalize_check_result`(check_result, check[, ...])	Run final processing on a check result which includes validation, conditions processing and sampling footnote.
`get_data_by_kind`(kind)	Return the relevant Dataset by given kind.
`get_scorers`([scorers, use_avg_defaults])	Return initialized & validated scorers in a given priority.
`get_single_scorer`([scorers, use_avg_defaults])	Return initialized & validated single scorer in a given priority.
`have_test`()	Return whether there is test dataset defined.

__init__(train: Optional[Union[Dataset, DataFrame]] = None, test: Optional[Union[Dataset, DataFrame]] = None, model: Optional[BasicModel] = None, feature_importance: Optional[Series] = None, feature_importance_force_permutation: bool = False, feature_importance_timeout: int = 120, with_display: bool = True, y_pred_train: Optional[ndarray] = None, y_pred_test: Optional[ndarray] = None, y_proba_train: Optional[ndarray] = None, y_proba_test: Optional[ndarray] = None, model_classes: Optional[List] = None)[source]#

assert_classification_task()[source]#: Assert the task_type is classification.

assert_regression_task()[source]#: Assert the task type is regression.

property feature_importance: Optional[pandas.core.series.Series]#: Return feature importance, or None if not possible.

property feature_importance_type: Optional[str]#: Return feature importance type if feature importance is available, else None.

finalize_check_result(check_result, check, kind: Optional[DatasetKind] = None)[source]#: Run final processing on a check result which includes validation, conditions processing and sampling footnote.

get_data_by_kind(kind: DatasetKind) → Dataset[source]#: Return the relevant Dataset by given kind.

get_scorers(scorers: Optional[Union[Mapping[str, Union[str, Callable]], List[str]]] = None, use_avg_defaults=True) → List[DeepcheckScorer][source]#

Return initialized & validated scorers in a given priority.

If scorers are received, use them; otherwise, if the user defined global scorers, use those; otherwise, use the default scorers.

Parameters

scorers: Union[List[str], Dict[str, Union[str, Callable]]], default: None: List of scorers to use. If None, use default scorers. Scorers can be supplied as a list of scorer names or as a dictionary of names and functions.
use_avg_defaults: bool, default: True: If no scorers were provided, for classification, determines whether to use default scorers that return an averaged metric, or default scorers that return a metric per class.

Returns

List[DeepcheckScorer]: A list of initialized & validated scorers.
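The priority order described for get_scorers (scorers passed to the call, then user-defined global scorers, then defaults) amounts to a simple fallback chain. The sketch below illustrates that chain only. `pick_scorers` and its three arguments are hypothetical names, and the initialization and validation that the real method performs are left out.

```python
from typing import Callable, Mapping, Optional

Scorers = Mapping[str, Callable[..., float]]


def pick_scorers(
    check_scorers: Optional[Scorers],
    global_scorers: Optional[Scorers],
    default_scorers: Scorers,
) -> Scorers:
    """Hypothetical fallback chain matching the documented priority."""
    if check_scorers:
        # 1. Scorers received by the call win.
        return check_scorers
    if global_scorers:
        # 2. Otherwise, user-defined global scorers.
        return global_scorers
    # 3. Otherwise, the defaults.
    return default_scorers


defaults = {"accuracy": lambda y_true, y_pred: 0.0}
custom = {"my_metric": lambda y_true, y_pred: 1.0}
print(list(pick_scorers(None, None, defaults)))
print(list(pick_scorers(None, custom, defaults)))
```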

get_single_scorer(scorers: Optional[Mapping[str, Union[str, Callable]]] = None, use_avg_defaults=True) → DeepcheckScorer[source]#

Return initialized & validated single scorer in a given priority.

If scorers are received, use them; otherwise, if the user defined global scorers, use those; otherwise, use the default scorers. Returns the first scorer from the scorers selected this way.

Parameters

scorers: Union[List[str], Dict[str, Union[str, Callable]]], default: None: List of scorers to use. If None, use default scorers. Scorers can be supplied as a list of scorer names or as a dictionary of names and functions.
use_avg_defaults: bool, default: True: If no scorers were provided, for classification, determines whether to use default scorers that return an averaged metric, or default scorers that return a metric per class.

Returns

DeepcheckScorer: An initialized & validated scorer.

have_test()[source]#: Return whether there is test dataset defined.

property model: BasicModel#: Return & validate model if model exists, otherwise raise error.

property model_classes: List#: Return ordered list of possible label classes for classification tasks or None for regression.

property model_name#: Return model name.

property observed_classes: List#: Return the observed classes in both train and test. None for regression.

property task_type: deepchecks.tabular.utils.task_type.TaskType#: Return task type based on calculated classes argument.

property test: Dataset#: Return test if exists, otherwise raise error.

property train: Dataset#: Return train if exists, otherwise raise error.

property with_display: bool#: Return the with_display flag.

class Suite[source]#

Tabular suite to run checks of types: TrainTestCheck, SingleDatasetCheck, ModelOnlyCheck.

Methods

`add`(check)	Add a check or a suite to current suite.
`config`()	Return suite configuration (checks' conditions' configuration not yet supported).
`from_config`(conf[, version_unmatch])	Return suite object from a SuiteConfig object.
`from_json`(conf[, version_unmatch])	Deserialize suite instance from JSON string.
`remove`(index)	Remove a check by given index.
`run`([train_dataset, test_dataset, model, ...])	Run all checks.
`supported_checks`()	Return tuple of supported check types of this suite.
`to_json`([indent])	Serialize suite instance to JSON string.

__init__(name: str, *checks: Union[BaseCheck, BaseSuite])[source]#

add(check: Union[BaseCheck, BaseSuite])[source]#

Add a check or a suite to current suite.

Parameters

check: BaseCheck: A check or suite to add.

config() → SuiteConfig[source]#

Return suite configuration (checks’ conditions’ configuration not yet supported).

Returns

SuiteConfig: includes the suite name, and list of check configs.

classmethod from_config(conf: SuiteConfig, version_unmatch: Optional[Union[Literal['raise'], Literal['warn']]] = 'warn') → Self[source]#

Return suite object from a SuiteConfig object.

Parameters

conf: SuiteConfig: the SuiteConfig object

Returns

BaseSuite: the suite class object from given config

from_json(conf: str, version_unmatch: Optional[Union[Literal['raise'], Literal['warn']]] = 'warn') → Self[source]#: Deserialize suite instance from JSON string.

remove(index: int)[source]#

Remove a check by given index.

Parameters

index: int: Index of check to remove.

run(train_dataset: Optional[Union[Dataset, DataFrame]] = None, test_dataset: Optional[Union[Dataset, DataFrame]] = None, model: Optional[BasicModel] = None, feature_importance: Optional[Series] = None, feature_importance_force_permutation: bool = False, feature_importance_timeout: int = 120, with_display: bool = True, y_pred_train: Optional[ndarray] = None, y_pred_test: Optional[ndarray] = None, y_proba_train: Optional[ndarray] = None, y_proba_test: Optional[ndarray] = None, run_single_dataset: Optional[str] = None) → SuiteResult[source]#

Run all checks.

Parameters

train_dataset: Optional[Union[Dataset, pd.DataFrame]] , default None: object, representing data an estimator was fitted on
test_dataset: Optional[Union[Dataset, pd.DataFrame]] , default None: object, representing data an estimator predicts on
model: Optional[BasicModel] , default None: A scikit-learn-compatible fitted estimator instance
run_single_dataset: Optional[str], default None: ‘Train’, ‘Test’, or None to run on both train and test.
feature_importance: pd.Series , default: None: Manually supplied feature importance
feature_importance_force_permutation: bool , default: False: Force calculation of permutation feature importance
feature_importance_timeout: int , default: 120: Timeout in seconds for the permutation feature importance calculation
y_pred_train: Optional[np.ndarray] , default: None: Array of the model prediction over the train dataset.
y_pred_test: Optional[np.ndarray] , default: None: Array of the model prediction over the test dataset.
y_proba_train: Optional[np.ndarray] , default: None: Array of the model prediction probabilities over the train dataset.
y_proba_test: Optional[np.ndarray] , default: None: Array of the model prediction probabilities over the test dataset.
model_classes: Optional[List] , default: None: For classification: list of classes known to the model

Returns

SuiteResult: All results by all initialized checks

classmethod supported_checks() → Tuple[source]#: Return tuple of supported check types of this suite.

to_json(indent: int = 3) → str[source]#: Serialize suite instance to JSON string.

class SingleDatasetCheck[source]#

Parent class for checks that only use one dataset.

Methods

`add_condition`(name, condition_func, **params)	Add new condition function to the check.
`clean_conditions`()	Remove all conditions from this check instance.
`conditions_decision`(result)	Run conditions on given result.
`config`([include_version])	Return check configuration (conditions' configuration not yet supported).
`context_type`	alias of `Context`
`from_config`(conf[, version_unmatch])	Return check object from a CheckConfig object.
`from_json`(conf[, version_unmatch])	Deserialize check instance from JSON string.
`metadata`([with_doc_link])	Return check metadata.
`name`()	Name of class in split camel case.
`params`([show_defaults])	Return parameters to show when printing the check.
`remove_condition`(index)	Remove given condition by index.
`run`(dataset[, model, feature_importance, ...])	Run check.
`run_logic`(context, dataset_kind)	Run check.
`to_json`([indent])	Serialize check instance to JSON string.

__init__(**kwargs)[source]#

add_condition(name: str, condition_func: Callable[[Any], Union[ConditionResult, bool]], **params)[source]#

Add new condition function to the check.

Parameters

name: str: Name of the condition. It should explain the condition's action and parameters.
condition_func: Callable[[Any], Union[List[ConditionResult], bool]]: Function which gets the value of the check and returns a List[ConditionResult] object or a boolean.
params: dict: Additional parameters to pass when calling the condition function.
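The condition workflow (register a named function, then evaluate it against a check's value with the stored extra parameters) can be illustrated with a toy stand-in. `ToyCheck` below is a hypothetical class written for this sketch. It does not use deepchecks, and it returns plain booleans where the real API works with ConditionResult objects.

```python
from typing import Any, Callable, Dict, List, Tuple


class ToyCheck:
    """Hypothetical stand-in that stores and evaluates named conditions."""

    def __init__(self) -> None:
        self._conditions: List[Tuple[str, Callable[..., bool], Dict[str, Any]]] = []

    def add_condition(self, name: str, condition_func: Callable[..., bool], **params: Any) -> None:
        # Keep the extra params so they can be forwarded on every evaluation.
        self._conditions.append((name, condition_func, params))

    def remove_condition(self, index: int) -> None:
        del self._conditions[index]

    def clean_conditions(self) -> None:
        self._conditions.clear()

    def conditions_decision(self, value: Any) -> List[Tuple[str, bool]]:
        # Call each condition with the check value plus its stored params.
        return [(name, bool(func(value, **params))) for name, func, params in self._conditions]


check = ToyCheck()
check.add_condition("value is below threshold", lambda v, threshold: v < threshold, threshold=0.5)
print(check.conditions_decision(0.3))
```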

clean_conditions()[source]#: Remove all conditions from this check instance.

conditions_decision(result: CheckResult) → List[ConditionResult][source]#: Run conditions on given result.

config(include_version: bool = True) → CheckConfig[source]#

Return check configuration (conditions’ configuration not yet supported).

Returns

CheckConfig: includes the checks class name, params, and module name.

context_type[source]#: alias of Context

classmethod from_config(conf: CheckConfig, version_unmatch: Optional[Union[Literal['raise'], Literal['warn']]] = 'warn') → Self[source]#

Return check object from a CheckConfig object.

Parameters

conf: Dict[Any, Any]

Returns

BaseCheck: the check class object from given config

from_json(conf: str, version_unmatch: Optional[Union[Literal['raise'], Literal['warn']]] = 'warn') → Self[source]#: Deserialize check instance from JSON string.

metadata(with_doc_link: bool = False) → CheckMetadata[source]#

Return check metadata.

Parameters

with_doc_link: bool, default: False: whether to include the doc link in the summary or not

Returns

Dict[str, Any]

classmethod name() → str[source]#: Name of class in split camel case.
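"Split camel case" here presumably means turning a class name such as TrainTestCheck into space-separated words. The sketch below shows one common way to do that with a regular expression. The helper `split_camel_case` is hypothetical, and the exact splitting rules used by the library (for example around acronyms or digits) are not given in the source.

```python
import re


def split_camel_case(class_name: str) -> str:
    """Insert a space wherever a lowercase letter or digit is followed by an uppercase letter."""
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", class_name)


print(split_camel_case("TrainTestCheck"))
print(split_camel_case("SingleDatasetCheck"))
```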

params(show_defaults: bool = False) → Dict[source]#: Return parameters to show when printing the check.

remove_condition(index: int)[source]#

Remove given condition by index.

Parameters

index: int: index of condition to remove

run(dataset: Union[Dataset, DataFrame], model: Optional[BasicModel] = None, feature_importance: Optional[Series] = None, feature_importance_force_permutation: bool = False, feature_importance_timeout: int = 120, with_display: bool = True, y_pred: Optional[ndarray] = None, y_proba: Optional[ndarray] = None, y_pred_train: Optional[ndarray] = None, y_pred_test: Optional[ndarray] = None, y_proba_train: Optional[ndarray] = None, y_proba_test: Optional[ndarray] = None, model_classes: Optional[List] = None) → CheckResult[source]#

Run check.

Parameters

dataset: Union[Dataset, pd.DataFrame]: Dataset or DataFrame object, representing data an estimator was fitted on
model: Optional[BasicModel], default: None: A scikit-learn-compatible fitted estimator instance
feature_importance: pd.Series , default: None: Manually supplied feature importance
feature_importance_force_permutation: bool , default: False: Force calculation of permutation feature importance
feature_importance_timeout: int , default: 120: Timeout in seconds for the permutation feature importance calculation
y_pred_train: Optional[np.ndarray] , default: None: Array of the model prediction over the train dataset.
y_pred_test: Optional[np.ndarray] , default: None: Array of the model prediction over the test dataset.
y_proba_train: Optional[np.ndarray] , default: None: Array of the model prediction probabilities over the train dataset.
y_proba_test: Optional[np.ndarray] , default: None: Array of the model prediction probabilities over the test dataset.
model_classes: Optional[List] , default: None: For classification: list of classes known to the model

abstract run_logic(context, dataset_kind) → CheckResult[source]#: Run check.

to_json(indent: int = 3) → str[source]#: Serialize check instance to JSON string.

class TrainTestCheck[source]#

Parent class for checks that compare two datasets.

The class checks the train dataset and the test dataset used for model training and testing.

Methods

`add_condition`(name, condition_func, **params)	Add new condition function to the check.
`clean_conditions`()	Remove all conditions from this check instance.
`conditions_decision`(result)	Run conditions on given result.
`config`([include_version])	Return check configuration (conditions' configuration not yet supported).
`context_type`	alias of `Context`
`from_config`(conf[, version_unmatch])	Return check object from a CheckConfig object.
`from_json`(conf[, version_unmatch])	Deserialize check instance from JSON string.
`metadata`([with_doc_link])	Return check metadata.
`name`()	Name of class in split camel case.
`params`([show_defaults])	Return parameters to show when printing the check.
`remove_condition`(index)	Remove given condition by index.
`run`(train_dataset, test_dataset[, model, ...])	Run check.
`run_logic`(context)	Run check.
`to_json`([indent])	Serialize check instance to JSON string.

__init__(**kwargs)[source]#

add_condition(name: str, condition_func: Callable[[Any], Union[ConditionResult, bool]], **params)[source]#

Add new condition function to the check.

Parameters

name: str: Name of the condition. Should describe the condition's action and parameters.
condition_func: Callable[[Any], Union[ConditionResult, bool]]: Function that receives the value of the check and returns a ConditionResult or a boolean.
params: dict: Additional parameters to pass when calling the condition function.
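As a rough illustration of the contract described above, the sketch below uses a stand-in `MiniCheck` class (hypothetical, not part of deepchecks) to show how a condition function receives the check value together with the stored `**params` and returns a boolean. The names `MiniCheck`, `score_not_above` and `max_allowed` are assumptions made for this example, and the stand-in returns a plain dict where the real `conditions_decision` returns a list of `ConditionResult` objects.

```python
from typing import Any, Callable, Dict, List, Tuple


class MiniCheck:
    """Hypothetical stand-in for a check; not the deepchecks implementation."""

    def __init__(self) -> None:
        # Each entry holds (name, condition function, extra keyword parameters).
        self._conditions: List[Tuple[str, Callable[..., bool], Dict[str, Any]]] = []

    def add_condition(self, name: str, condition_func: Callable[..., bool], **params: Any) -> None:
        self._conditions.append((name, condition_func, params))

    def conditions_decision(self, value: Any) -> Dict[str, bool]:
        # Call every condition with the check value plus its stored parameters.
        return {name: func(value, **params) for name, func, params in self._conditions}


def score_not_above(value: float, max_allowed: float) -> bool:
    """Pass when the illustrative score does not exceed the threshold."""
    return value <= max_allowed


check = MiniCheck()
check.add_condition("Score is not above 0.2", score_not_above, max_allowed=0.2)
passing = check.conditions_decision(0.15)
failing = check.conditions_decision(0.5)
print(passing, failing)
```

Putting the threshold in the condition name, as done here, follows the advice that the name should describe the condition's action and parameters.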

clean_conditions()[source]#: Remove all conditions from this check instance.

conditions_decision(result: CheckResult) → List[ConditionResult][source]#: Run conditions on given result.

config(include_version: bool = True) → CheckConfig[source]#

Return check configuration (conditions’ configuration not yet supported).

Returns

CheckConfig: includes the check's class name, params, and module name.

context_type[source]#: alias of Context

classmethod from_config(conf: CheckConfig, version_unmatch: Optional[Union[Literal['raise'], Literal['warn']]] = 'warn') → Self[source]#

Return check object from a CheckConfig object.

Parameters

conf: Dict[Any, Any]

Returns

BaseCheck: the check class object from given config

from_json(conf: str, version_unmatch: Optional[Union[Literal['raise'], Literal['warn']]] = 'warn') → Self[source]#: Deserialize check instance from JSON string.
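The `version_unmatch` argument accepts `'raise'`, `'warn'` or `None`. A minimal sketch of how such a switch could behave is given below. It is not the deepchecks implementation: `load_config`, `CURRENT_VERSION`, the `"version"` key and the choice to treat `None` as "ignore silently" are all assumptions made for illustration.

```python
import json
import warnings
from typing import Any, Dict, Optional

CURRENT_VERSION = "1.0.0"  # hypothetical version of the installed library


def load_config(conf: str, version_unmatch: Optional[str] = "warn") -> Dict[str, Any]:
    """Parse a JSON config and react to a version mismatch (illustrative only)."""
    data = json.loads(conf)
    found = data.get("version")
    if found is not None and found != CURRENT_VERSION:
        message = f"config version {found} differs from library version {CURRENT_VERSION}"
        if version_unmatch == "raise":
            raise ValueError(message)
        if version_unmatch == "warn":
            warnings.warn(message)
        # Assumption: None means the mismatch is ignored silently.
    return data


old = json.dumps({"class_name": "MyCheck", "version": "0.9.0", "params": {}})
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    loaded = load_config(old)  # default policy: warn and keep going
print(loaded["class_name"], len(caught))
```

With `version_unmatch="raise"` the same input would raise a `ValueError` in this sketch instead of returning the parsed config.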

metadata(with_doc_link: bool = False) → CheckMetadata[source]#

Return check metadata.

Parameters

with_doc_link: bool, default: False: whether to include the doc link in the summary

Returns

Dict[str, Any]

classmethod name() → str[source]#: Name of class in split camel case.
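"Split camel case" means a class name such as `TrainTestCheck` is broken into separate words. One way to do this is sketched below with a regular expression. The helper `split_camel_case` is hypothetical and may differ from what deepchecks does internally, for example in how runs of capital letters are handled.

```python
import re


def split_camel_case(name: str) -> str:
    """Insert a space wherever a lowercase letter or digit is followed by a capital."""
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", name)


readable = split_camel_case("TrainTestCheck")
print(readable)
```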

params(show_defaults: bool = False) → Dict[source]#: Return parameters to show when printing the check.

remove_condition(index: int)[source]#

Remove given condition by index.

Parameters

index: int: index of condition to remove

run(train_dataset: Union[Dataset, DataFrame], test_dataset: Union[Dataset, DataFrame], model: Optional[BasicModel] = None, feature_importance: Optional[Series] = None, feature_importance_force_permutation: bool = False, feature_importance_timeout: int = 120, with_display: bool = True, y_pred_train: Optional[ndarray] = None, y_pred_test: Optional[ndarray] = None, y_proba_train: Optional[ndarray] = None, y_proba_test: Optional[ndarray] = None, model_classes: Optional[List] = None) → CheckResult[source]#

Run check.

Parameters

train_dataset: Union[Dataset, pd.DataFrame]: Dataset or DataFrame object, representing data an estimator was fitted on
test_dataset: Union[Dataset, pd.DataFrame]: Dataset or DataFrame object, representing data an estimator predicts on
model: Optional[BasicModel], default: None: A scikit-learn-compatible fitted estimator instance
feature_importance: pd.Series, default: None: manually supplied feature importance values
feature_importance_force_permutation: bool, default: False: force calculation of permutation feature importance
feature_importance_timeout: int, default: 120: timeout in seconds for the permutation feature importance calculation
y_pred_train: Optional[np.ndarray], default: None: Array of the model predictions over the train dataset.
y_pred_test: Optional[np.ndarray], default: None: Array of the model predictions over the test dataset.
y_proba_train: Optional[np.ndarray], default: None: Array of the model prediction probabilities over the train dataset.
y_proba_test: Optional[np.ndarray], default: None: Array of the model prediction probabilities over the test dataset.
model_classes: Optional[List], default: None: For classification: list of classes known to the model

abstract run_logic(context) → CheckResult[source]#: Run check.

to_json(indent: int = 3) → str[source]#: Serialize check instance to JSON string.

class ModelOnlyCheck[source]#

Parent class for checks that only use a model and no datasets.

Methods

`add_condition`(name, condition_func, **params)	Add new condition function to the check.
`clean_conditions`()	Remove all conditions from this check instance.
`conditions_decision`(result)	Run conditions on given result.
`config`([include_version])	Return check configuration (conditions' configuration not yet supported).
`context_type`	alias of `Context`
`from_config`(conf[, version_unmatch])	Return check object from a CheckConfig object.
`from_json`(conf[, version_unmatch])	Deserialize check instance from JSON string.
`metadata`([with_doc_link])	Return check metadata.
`name`()	Name of class in split camel case.
`params`([show_defaults])	Return parameters to show when printing the check.
`remove_condition`(index)	Remove given condition by index.
`run`(model[, feature_importance, ...])	Run check.
`run_logic`(context)	Run check.
`to_json`([indent])	Serialize check instance to JSON string.

__init__(**kwargs)[source]#

add_condition(name: str, condition_func: Callable[[Any], Union[ConditionResult, bool]], **params)[source]#

Add new condition function to the check.

Parameters

name: str: Name of the condition. Should describe the condition's action and parameters.
condition_func: Callable[[Any], Union[ConditionResult, bool]]: Function that receives the value of the check and returns a ConditionResult or a boolean.
params: dict: Additional parameters to pass when calling the condition function.

clean_conditions()[source]#: Remove all conditions from this check instance.

conditions_decision(result: CheckResult) → List[ConditionResult][source]#: Run conditions on given result.

config(include_version: bool = True) → CheckConfig[source]#

Return check configuration (conditions’ configuration not yet supported).

Returns

CheckConfig: includes the check's class name, params, and module name.

context_type[source]#: alias of Context

classmethod from_config(conf: CheckConfig, version_unmatch: Optional[Union[Literal['raise'], Literal['warn']]] = 'warn') → Self[source]#

Return check object from a CheckConfig object.

Parameters

conf: Dict[Any, Any]

Returns

BaseCheck: the check class object from given config

from_json(conf: str, version_unmatch: Optional[Union[Literal['raise'], Literal['warn']]] = 'warn') → Self[source]#: Deserialize check instance from JSON string.

metadata(with_doc_link: bool = False) → CheckMetadata[source]#

Return check metadata.

Parameters

with_doc_link: bool, default: False: whether to include the doc link in the summary

Returns

Dict[str, Any]

classmethod name() → str[source]#: Name of class in split camel case.

params(show_defaults: bool = False) → Dict[source]#: Return parameters to show when printing the check.

remove_condition(index: int)[source]#

Remove given condition by index.

Parameters

index: int: index of condition to remove

run(model: BasicModel, feature_importance: Optional[Series] = None, feature_importance_force_permutation: bool = False, feature_importance_timeout: int = 120, with_display: bool = True, y_pred_train: Optional[ndarray] = None, y_pred_test: Optional[ndarray] = None, y_proba_train: Optional[ndarray] = None, y_proba_test: Optional[ndarray] = None) → CheckResult[source]#

Run check.

Parameters

model: BasicModel: A scikit-learn-compatible fitted estimator instance
feature_importance: pd.Series, default: None: manually supplied feature importance values
feature_importance_force_permutation: bool, default: False: force calculation of permutation feature importance
feature_importance_timeout: int, default: 120: timeout in seconds for the permutation feature importance calculation
y_pred_train: Optional[np.ndarray], default: None: Array of the model predictions over the train dataset.
y_pred_test: Optional[np.ndarray], default: None: Array of the model predictions over the test dataset.
y_proba_train: Optional[np.ndarray], default: None: Array of the model prediction probabilities over the train dataset.
y_proba_test: Optional[np.ndarray], default: None: Array of the model prediction probabilities over the test dataset.
model_classes: Optional[List], default: None: For classification: list of classes known to the model

abstract run_logic(context) → CheckResult[source]#: Run check.

to_json(indent: int = 3) → str[source]#: Serialize check instance to JSON string.
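The configuration is described as holding the check's class name, params and module name, and `to_json` takes an `indent` that defaults to 3. The round trip can be pictured with the standard `json` module, as in the sketch below. The key names and values in `config` are assumptions chosen for illustration, not a guaranteed layout of `CheckConfig`.

```python
import json

# Hypothetical configuration; key names and values are illustrative only.
config = {
    "module_name": "my_package.checks",
    "class_name": "MyModelCheck",
    "params": {"threshold": 0.1},
}

serialized = json.dumps(config, indent=3)  # three-space indentation, as in to_json's default
restored = json.loads(serialized)
print(serialized)
```

The `indent` value only affects readability of the string; parsing it back yields the same dictionary regardless of indentation.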

class ModelComparisonContext[source]#

Contain processed input for model comparison checks.

Attributes

models: Return the models’ dict.

Methods

finalize_check_result(check_result, check)

Run final processing on a check result which includes validation and conditions processing.

__init__(train_datasets: Union[Dataset, List[Dataset]], test_datasets: Union[Dataset, List[Dataset]], models: Union[List[Any], Mapping[str, Any]])[source]#: Preprocess the parameters.

finalize_check_result(check_result, check)[source]#: Run final processing on a check result which includes validation and conditions processing.

property models: Dict#: Return the models’ dict.

class ModelComparisonCheck[source]#

Parent class for checks that compare two or more models.

Methods

`add_condition`(name, condition_func, **params)	Add new condition function to the check.
`clean_conditions`()	Remove all conditions from this check instance.
`conditions_decision`(result)	Run conditions on given result.
`config`([include_version])	Return check configuration (conditions' configuration not yet supported).
`from_config`(conf[, version_unmatch])	Return check object from a CheckConfig object.
`from_json`(conf[, version_unmatch])	Deserialize check instance from JSON string.
`metadata`([with_doc_link])	Return check metadata.
`name`()	Name of class in split camel case.
`params`([show_defaults])	Return parameters to show when printing the check.
`remove_condition`(index)	Remove given condition by index.
`run`(train_datasets, test_datasets, models)	Initialize context and pass to check logic.
`run_logic`(multi_context)	Implement the check logic here.
`to_json`([indent])	Serialize check instance to JSON string.

__init__(**kwargs)[source]#

add_condition(name: str, condition_func: Callable[[Any], Union[ConditionResult, bool]], **params)[source]#

Add new condition function to the check.

Parameters

name: str: Name of the condition. Should describe the condition's action and parameters.
condition_func: Callable[[Any], Union[ConditionResult, bool]]: Function that receives the value of the check and returns a ConditionResult or a boolean.
params: dict: Additional parameters to pass when calling the condition function.

clean_conditions()[source]#: Remove all conditions from this check instance.

conditions_decision(result: CheckResult) → List[ConditionResult][source]#: Run conditions on given result.

config(include_version: bool = True) → CheckConfig[source]#

Return check configuration (conditions’ configuration not yet supported).

Returns

CheckConfig: includes the check's class name, params, and module name.

classmethod from_config(conf: CheckConfig, version_unmatch: Optional[Union[Literal['raise'], Literal['warn']]] = 'warn') → Self[source]#

Return check object from a CheckConfig object.

Parameters

conf: Dict[Any, Any]

Returns

BaseCheck: the check class object from given config

from_json(conf: str, version_unmatch: Optional[Union[Literal['raise'], Literal['warn']]] = 'warn') → Self[source]#: Deserialize check instance from JSON string.

metadata(with_doc_link: bool = False) → CheckMetadata[source]#

Return check metadata.

Parameters

with_doc_link: bool, default: False: whether to include the doc link in the summary

Returns

Dict[str, Any]

classmethod name() → str[source]#: Name of class in split camel case.

params(show_defaults: bool = False) → Dict[source]#: Return parameters to show when printing the check.

remove_condition(index: int)[source]#

Remove given condition by index.

Parameters

index: int: index of condition to remove

run(train_datasets: Union[Dataset, List[Dataset]], test_datasets: Union[Dataset, List[Dataset]], models: Union[List[BasicModel], Mapping[str, BasicModel]]) → CheckResult[source]#

Initialize context and pass to check logic.

Parameters

train_datasets: Union[Dataset, List[Dataset]]: train datasets
test_datasets: Union[Dataset, List[Dataset]]: test datasets
models: Union[List[BasicModel], Mapping[str, BasicModel]]: list or map of models
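Because `models` may be either a list or a mapping, while the context exposes the models as a dict, some normalization step is implied. The sketch below shows one plausible way to do it. The helper `to_model_dict`, the generated names for list input and the `ValueError` for fewer than two models are assumptions, not documented deepchecks behavior.

```python
from collections import abc
from typing import Any, Dict, List, Mapping, Union


def to_model_dict(models: Union[List[Any], Mapping[str, Any]]) -> Dict[str, Any]:
    """Normalize a list or mapping of models into a name -> model dict."""
    if isinstance(models, abc.Mapping):
        named = dict(models)
    else:
        # Assumed naming scheme: class name plus 1-based position.
        named = {f"{type(m).__name__}_{i}": m for i, m in enumerate(models, start=1)}
    if len(named) < 2:
        # The docs speak of comparing two or more models; raising is an assumption.
        raise ValueError("model comparison needs at least two models")
    return named


class TreeModel:  # hypothetical stand-in for a fitted estimator
    pass


class LinearModel:  # hypothetical stand-in for a fitted estimator
    pass


from_list = to_model_dict([TreeModel(), LinearModel()])
from_map = to_model_dict({"baseline": TreeModel(), "candidate": LinearModel()})
print(list(from_list), list(from_map))
```

Passing a mapping keeps the caller's own names, which tends to make comparison results easier to read than generated ones.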

abstract run_logic(multi_context: ModelComparisonContext) → CheckResult[source]#: Implement the check logic here.

to_json(indent: int = 3) → str[source]#: Serialize check instance to JSON string.

class ModelComparisonSuite[source]#

Suite to run checks of type ModelComparisonCheck.

Methods

`add`(check)	Add a check or a suite to current suite.
`config`()	Return suite configuration (checks' conditions' configuration not yet supported).
`from_config`(conf[, version_unmatch])	Return suite object from a SuiteConfig object.
`from_json`(conf[, version_unmatch])	Deserialize suite instance from JSON string.
`remove`(index)	Remove a check by given index.
`run`(train_datasets, test_datasets, models)	Run all checks.
`supported_checks`()	Return tuple of supported check types of this suite.
`to_json`([indent])	Serialize suite instance to JSON string.

__init__(name: str, *checks: Union[BaseCheck, BaseSuite])[source]#

add(check: Union[BaseCheck, BaseSuite])[source]#

Add a check or a suite to current suite.

Parameters

check: Union[BaseCheck, BaseSuite]: A check or suite to add.

config() → SuiteConfig[source]#

Return suite configuration (checks’ conditions’ configuration not yet supported).

Returns

SuiteConfig: includes the suite name and a list of check configs.

classmethod from_config(conf: SuiteConfig, version_unmatch: Optional[Union[Literal['raise'], Literal['warn']]] = 'warn') → Self[source]#

Return suite object from a SuiteConfig object.

Parameters

conf: SuiteConfig: the SuiteConfig object

Returns

BaseSuite: the suite class object from given config

from_json(conf: str, version_unmatch: Optional[Union[Literal['raise'], Literal['warn']]] = 'warn') → Self[source]#: Deserialize suite instance from JSON string.

remove(index: int)[source]#

Remove a check by given index.

Parameters

index: int: Index of check to remove.

run(train_datasets: Union[Dataset, List[Dataset]], test_datasets: Union[Dataset, List[Dataset]], models: Union[List[Any], Mapping[str, Any]]) → SuiteResult[source]#

Run all checks.

Parameters

train_datasets: Union[Dataset, Container[Dataset]]: representing data an estimator was fitted on
test_datasets: Union[Dataset, Container[Dataset]]: representing data an estimator predicts on
models: Union[Container[Any], Mapping[str, Any]]: 2 or more scikit-learn-compatible fitted estimator instances

Returns

SuiteResult: All results by all initialized checks

Raises

ValueError: if check_datasets_policy is not of allowed types

classmethod supported_checks() → Tuple[source]#: Return tuple of supported check types of this suite.

to_json(indent: int = 3) → str[source]#: Serialize suite instance to JSON string.
