deepchecks.tabular#

Package for tabular functionality.

Modules

checks

Module importing all tabular checks.

suites

Module contains all prebuilt suites.

datasets

Module for working with pre-built datasets.

Classes

class Dataset[source]#

Dataset wraps pandas DataFrame together with ML related metadata.

The Dataset class holds additional data and methods that give easy access to the metadata relevant for training or validating an ML model.

Parameters
dfAny

An object that can be cast to a pandas DataFrame, containing data relevant for training or validating an ML model.

labelt.Union[Hashable, pd.Series, pd.DataFrame, np.ndarray] , default: None

Label column, provided either as the name of an existing column in the DataFrame or as a label object holding the label data (a pandas Series/DataFrame or a numpy array) that will be concatenated to the data in the DataFrame. When label data is passed, the label name is set as follows:
  • Series: uses the series name, or ‘target’ if the name is empty
  • DataFrame: expects a single column in the dataframe and uses its name
  • numpy: uses ‘target’

featurest.Optional[t.Sequence[Hashable]] , default: None

List of names for the feature columns in the DataFrame.

cat_featurest.Optional[t.Sequence[Hashable]] , default: None

List of names for the categorical features in the DataFrame. To disable categorical feature inference, pass cat_features=[].

index_namet.Optional[Hashable] , default: None

Name of the index column in the dataframe. If set_index_from_dataframe_index is True and index_name is not None, index will be created from the dataframe index level with the given name. If index levels have no names, an int must be used to select the appropriate level by order.

set_index_from_dataframe_indexbool , default: False

If set to true, index will be created from the dataframe index instead of dataframe columns (default). If index_name is None, first level of the index will be used in case of a multilevel index.

datetime_namet.Optional[Hashable] , default: None

Name of the datetime column in the dataframe. If set_datetime_from_dataframe_index is True and datetime_name is not None, date will be created from the dataframe index level with the given name. If index levels have no names, an int must be used to select the appropriate level by order.

set_datetime_from_dataframe_indexbool , default: False

If set to true, date will be created from the dataframe index instead of dataframe columns (default). If datetime_name is None, first level of the index will be used in case of a multilevel index.

convert_datetimebool , default: True

If set to true, date will be converted to datetime using pandas.to_datetime.

datetime_argst.Optional[t.Dict] , default: None

Arguments passed to pandas.to_datetime when converting the datetime column. See https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html for details.

max_categorical_ratiofloat , default: 0.01

The max ratio of unique values in a column in order for it to be inferred as a categorical feature.

max_categoriesint , default: None

The maximum number of categories in a column in order for it to be inferred as a categorical feature. If None, the default is_categorical inference mechanism is used.

label_typestr , default: None

Used to assume the target model type if it cannot be found on the model. Allowed values: ‘classification_label’, ‘regression_label’. If None, the label type is inferred from the label using the is_categorical logic.
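The two inference thresholds above (max_categorical_ratio and max_categories) can be read as a simple rule of thumb. The sketch below is an illustration only, not the library's is_categorical implementation: the helper name `looks_categorical` is hypothetical, the way the two limits are combined is an assumption, and the real heuristic may also take column dtypes into account.

```python
from typing import Hashable, Optional, Sequence


def looks_categorical(
    values: Sequence[Hashable],
    max_categorical_ratio: float = 0.01,
    max_categories: Optional[int] = None,
) -> bool:
    """Hypothetical helper: guess whether a column is categorical.

    Assumption: when max_categories is given it acts as a cap on the number
    of unique values; otherwise the unique/total ratio is compared with
    max_categorical_ratio.
    """
    if not values:
        return False
    n_unique = len(set(values))
    if max_categories is not None:
        return n_unique <= max_categories
    return n_unique / len(values) <= max_categorical_ratio


# 3 unique values over 1000 rows: ratio 0.003, below the 0.01 default.
colour = ["red", "green", "blue"] * 333 + ["red"]
# 1000 unique values over 1000 rows: ratio 1.0, above the default.
row_id = list(range(1000))
```

Under these assumptions `colour` would be inferred as categorical and `row_id` would not, unless max_categories is raised to at least 1000.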

Attributes
cat_features

Return list of categorical feature names.

classes

Return the classes from label column in sorted list.

columns_info

Return the role and logical type of each column.

data

Return the data of dataset.

datetime_col

Return datetime column if exists.

datetime_name

If datetime column exists, return its name.

features

Return list of feature names.

features_columns

Return DataFrame containing only the features defined in the dataset, if features are empty raise error.

index_col

Return index column.

index_name

If index column exists, return its name.

label_col

Return Series of the label defined in the dataset, if label is not defined raise error.

label_name

If label column exists, return its name.

label_type

Return the label type.

n_samples

Return number of samples in dataframe.

numerical_features

Return list of numerical feature names.

Methods

assert_datetime()

Check if datetime is defined and if not raise error.

assert_features()

Check if features are defined (not empty) and if not raise error.

assert_index()

Check if index is defined and if not raise error.

cast_to_dataset(obj)

Verify Dataset or transform to Dataset.

copy(new_data)

Create a copy of this Dataset with new data.

datasets_share_categorical_features(*datasets)

Verify that all provided datasets share same categorical features.

datasets_share_date(*datasets)

Verify that all provided datasets share same date column.

datasets_share_features(*datasets)

Verify that all provided datasets share same features.

datasets_share_index(*datasets)

Verify that all provided datasets share same index column.

datasets_share_label(*datasets)

Verify that all provided datasets share same label column.

from_numpy(*args[, columns, label_name])

Create Dataset instance from numpy arrays.

get_datetime_column_from_index(datetime_name)

Retrieve the datetime info from the index if _set_datetime_from_dataframe_index is True.

has_label()

Return True if label column exists.

is_categorical(col_name)

Check if a column is considered a category column in the dataset object.

is_sampled(n_samples)

Return True if the dataset number of samples will decrease when sampled with n_samples samples.

len_when_sampled(n_samples)

Return the number of samples in the sampled dataframe if this dataset is sampled with n_samples samples.

sample(n_samples[, replace, random_state, ...])

Create a copy of the dataset object, with the internal dataframe being a sample of the original dataframe.

select([columns, ignore_columns, keep_label])

Filter dataset columns by given params.

train_test_split([train_size, test_size, ...])

Split dataset into random train and test datasets.

__init__(df: Any, label: Optional[Union[Hashable, Series, DataFrame, ndarray]] = None, features: Optional[Sequence[Hashable]] = None, cat_features: Optional[Sequence[Hashable]] = None, index_name: Optional[Hashable] = None, set_index_from_dataframe_index: bool = False, datetime_name: Optional[Hashable] = None, set_datetime_from_dataframe_index: bool = False, convert_datetime: bool = True, datetime_args: Optional[Dict] = None, max_categorical_ratio: float = 0.01, max_categories: Optional[int] = None, label_type: Optional[str] = None)[source]#
assert_datetime()[source]#

Check if datetime is defined and if not raise error.

Raises
DeepchecksNotSupportedError
assert_features()[source]#

Check if features are defined (not empty) and if not raise error.

Raises
DeepchecksNotSupportedError
assert_index()[source]#

Check if index is defined and if not raise error.

Raises
DeepchecksNotSupportedError
classmethod cast_to_dataset(obj: Any) Dataset[source]#

Verify Dataset or transform to Dataset.

Verifies that the provided value is a non-empty instance of Dataset and raises an exception otherwise. If the ‘cast’ flag is set to True, it also tries to transform the provided value into a Dataset instance.

Parameters
obj

value to verify

Raises
DeepchecksValueError

  • if the provided value is not a Dataset instance
  • if the provided value cannot be transformed into a Dataset instance

property cat_features: List[Hashable]#

Return list of categorical feature names.

Returns
t.List[Hashable]

List of categorical feature names.

property classes: Tuple[str, ...]#

Return the classes from the label column as a sorted list. If no label column is defined, return an empty list.

Returns
t.Tuple[str, …]

Sorted classes

property columns_info: Dict[Hashable, str]#

Return the role and logical type of each column.

Returns
t.Dict[Hashable, str]

Dictionary mapping each column to its role

copy(new_data: DataFrame) TDataset[source]#

Create a copy of this Dataset with new data.

Parameters
new_dataDataFrame

new data from which the new dataset will be created

Returns
Dataset

new dataset instance

property data: pandas.core.frame.DataFrame#

Return the data of dataset.

classmethod datasets_share_categorical_features(*datasets: Dataset) bool[source]#

Verify that all provided datasets share same categorical features.

Parameters
datasetsList[Dataset]

list of datasets to validate

Returns
bool

True if all datasets share same categorical features, otherwise False

Raises
AssertionError

‘datasets’ parameter is not a list; ‘datasets’ contains less than one dataset;

classmethod datasets_share_date(*datasets: Dataset) bool[source]#

Verify that all provided datasets share same date column.

Parameters
datasetsList[Dataset]

list of datasets to validate

Returns
bool

True if all datasets share same date column, otherwise False

Raises
AssertionError

‘datasets’ parameter is not a list; ‘datasets’ contains less than one dataset;

classmethod datasets_share_features(*datasets: Dataset) bool[source]#

Verify that all provided datasets share same features.

Parameters
datasetsList[Dataset]

list of datasets to validate

Returns
bool

True if all datasets share same features, otherwise False

Raises
AssertionError

‘datasets’ parameter is not a list; ‘datasets’ contains less than one dataset;

classmethod datasets_share_index(*datasets: Dataset) bool[source]#

Verify that all provided datasets share same index column.

Parameters
datasetsList[Dataset]

list of datasets to validate

Returns
bool

True if all datasets share same index column, otherwise False

Raises
AssertionError

‘datasets’ parameter is not a list; ‘datasets’ contains less than one dataset;

classmethod datasets_share_label(*datasets: Dataset) bool[source]#

Verify that all provided datasets share same label column.

Parameters
datasetsList[Dataset]

list of datasets to validate

Returns
bool

True if all datasets share same label column, otherwise False

Raises
AssertionError

‘datasets’ parameter is not a list; ‘datasets’ contains less than one dataset;
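The datasets_share_* class methods all follow the same pattern: compare one piece of metadata across every dataset passed in and return a bool. A minimal sketch of that pattern is shown below. `FakeDataset` and `all_share` are hypothetical stand-ins, not library names, and whether the real comparison is sensitive to ordering is an assumption left open here.

```python
from dataclasses import dataclass
from typing import Hashable, List, Optional


@dataclass
class FakeDataset:
    """Stand-in for Dataset that carries only the metadata compared here."""
    features: List[Hashable]
    label_name: Optional[Hashable] = None


def all_share(attribute: str, *datasets: FakeDataset) -> bool:
    """Return True if every dataset has the same value for `attribute`."""
    # Mirrors the documented precondition: at least one dataset is required.
    assert len(datasets) > 0, "'datasets' must contain at least one dataset"
    first = getattr(datasets[0], attribute)
    return all(getattr(ds, attribute) == first for ds in datasets[1:])


train = FakeDataset(features=["age", "income"], label_name="churn")
test = FakeDataset(features=["age", "income"], label_name="churned")

same_features = all_share("features", train, test)   # both list the same features
same_label = all_share("label_name", train, test)    # label names differ
```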

property datetime_col: Optional[pandas.core.series.Series]#

Return datetime column if exists.

Returns
t.Optional[pd.Series]

Series of the datetime column

property datetime_name: Optional[Hashable]#

If datetime column exists, return its name.

Returns
t.Optional[Hashable]

datetime name

property features: List[Hashable]#

Return list of feature names.

Returns
t.List[Hashable]

List of feature names.

property features_columns: pandas.core.frame.DataFrame#

Return DataFrame containing only the features defined in the dataset, if features are empty raise error.

Returns
pd.DataFrame
classmethod from_numpy(*args: ndarray, columns: Optional[Sequence[Hashable]] = None, label_name: Optional[Hashable] = None, **kwargs) TDataset[source]#

Create Dataset instance from numpy arrays.

Parameters
*args: np.ndarray

Numpy array of data columns, and second optional numpy array of labels.

columnst.Sequence[Hashable] , default: None

Names for the columns. If none are provided, the columns are automatically named 1 to n, where n is the number of columns.

label_namet.Hashable , default: None

labels column name. If none is provided, the name ‘target’ will be used.

**kwargsDict

additional arguments that will be passed to the main Dataset constructor.

Returns
Dataset

instance of the Dataset

Raises
DeepchecksValueError

Raised in any of the following cases:
  • zero or more than two numpy arrays are received
  • columns (args[0]) is not a two-dimensional numpy array
  • labels (args[1]) is not a one-dimensional numpy array
  • the features array or the labels array is empty

Examples

>>> import numpy
>>> from deepchecks.tabular import Dataset
>>> features = numpy.array([[0.25, 0.3, 0.3],
...                        [0.14, 0.75, 0.3],
...                        [0.23, 0.39, 0.1]])
>>> labels = numpy.array([0.1, 0.1, 0.7])
>>> dataset = Dataset.from_numpy(features, labels)

Creating dataset only from features array.

>>> dataset = Dataset.from_numpy(features)

Passing additional arguments to the main Dataset constructor

>>> dataset = Dataset.from_numpy(features, labels, max_categorical_ratio=0.5)

Specifying features and label columns names.

>>> dataset = Dataset.from_numpy(
...     features, labels,
...     columns=['sensor-1', 'sensor-2', 'sensor-3'],
...     label_name='labels'
... )
get_datetime_column_from_index(datetime_name)[source]#

Retrieve the datetime info from the index if _set_datetime_from_dataframe_index is True.

has_label() bool[source]#

Return True if label column exists.

Returns
bool

True if label column exists.

property index_col: Optional[pandas.core.series.Series]#

Return index column. Index can be a named column or DataFrame index.

Returns
t.Optional[pd.Series]

If index column exists, returns a pandas Series of the index column.

property index_name: Optional[Hashable]#

If index column exists, return its name.

Returns
t.Optional[Hashable]

index name

is_categorical(col_name: Hashable) bool[source]#

Check if a column is considered a category column in the dataset object.

Parameters
col_nameHashable

The name of the column in the dataframe

Returns
bool

True if the column is considered categorical, otherwise False

is_sampled(n_samples: int)[source]#

Return True if the dataset number of samples will decrease when sampled with n_samples samples.

property label_col: pandas.core.series.Series#

Return Series of the label defined in the dataset, if label is not defined raise error.

Returns
pd.Series
property label_name: Optional[Hashable]#

If label column exists, return its name. Otherwise, throw an exception.

Returns
t.Optional[Hashable]

Label name

property label_type: Optional[deepchecks.tabular.utils.task_type.TaskType]#

Return the label type.

Returns
t.Optional[TaskType]

Label type

len_when_sampled(n_samples: int)[source]#

Return the number of samples in the sampled dataframe if this dataset is sampled with n_samples samples.
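The relationship between is_sampled and len_when_sampled can be sketched with two small functions. This is an illustration under the assumption that sampling is done without replacement; the free functions below take the dataset size as an explicit `total_rows` argument (a hypothetical name), whereas the real methods read it from the dataset.

```python
def len_when_sampled(total_rows: int, n_samples: int) -> int:
    """Rows left after sampling: never more than the dataset holds."""
    return min(total_rows, n_samples)


def is_sampled(total_rows: int, n_samples: int) -> bool:
    """True only when sampling would actually reduce the row count."""
    return total_rows > n_samples
```

For a dataset of 500 rows and n_samples=10000, nothing is dropped, so the sketch reports 500 rows and False; for 50000 rows it reports 10000 rows and True.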

property n_samples: int#

Return number of samples in dataframe.

Returns
int

Number of samples in dataframe

property numerical_features: List[Hashable]#

Return list of numerical feature names.

Returns
t.List[Hashable]

List of numerical feature names.

sample(n_samples: int, replace: bool = False, random_state: Optional[int] = None, drop_na_label: bool = False) TDataset[source]#

Create a copy of the dataset object, with the internal dataframe being a sample of the original dataframe.

Parameters
n_samplesint

Number of samples to draw.

replacebool, default: False

Whether to sample with replacement.

random_statet.Optional[int] , default None

Random state.

drop_na_labelbool, default: False

Whether to take the sample only from rows with an existing label.

Returns
Dataset

instance of the Dataset with sampled internal dataframe.

select(columns: Optional[Union[Hashable, List[Hashable]]] = None, ignore_columns: Optional[Union[Hashable, List[Hashable]]] = None, keep_label: bool = False) TDataset[source]#

Filter dataset columns by given params.

Parameters
columnsUnion[Hashable, List[Hashable], None]

Column names to keep.

ignore_columnsUnion[Hashable, List[Hashable], None]

Column names to drop.

Returns
TDataset

horizontally filtered dataset

Raises
DeepchecksValueError

Raised if one of the given columns does not exist.
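The column filtering performed by select can be pictured as an operation on a list of column names. The sketch below is a simplified model, not the library code: `select_columns` is a hypothetical helper, ValueError stands in for DeepchecksValueError, the rule that columns and ignore_columns are mutually exclusive is an assumption, and keep_label is not modelled.

```python
from typing import Hashable, List, Optional, Sequence


def select_columns(
    all_columns: Sequence[Hashable],
    columns: Optional[List[Hashable]] = None,
    ignore_columns: Optional[List[Hashable]] = None,
) -> List[Hashable]:
    """Hypothetical helper that filters a list of column names."""
    # Assumption: only one of the two filters may be used at a time.
    if columns is not None and ignore_columns is not None:
        raise ValueError("pass either columns or ignore_columns, not both")
    requested = columns if columns is not None else (ignore_columns or [])
    missing = [name for name in requested if name not in all_columns]
    if missing:
        raise ValueError(f"columns not found: {missing}")
    if columns is not None:
        # Keep only the requested columns, in their original order.
        return [name for name in all_columns if name in columns]
    # Drop the ignored columns and keep everything else.
    return [name for name in all_columns if name not in requested]


available = ["age", "income", "city", "churn"]
kept = select_columns(available, columns=["city", "age"])
dropped = select_columns(available, ignore_columns=["churn"])
```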

train_test_split(train_size: Optional[Union[int, float]] = None, test_size: Union[int, float] = 0.25, random_state: int = 42, shuffle: bool = True, stratify: Union[List, Series, ndarray, bool] = False) Tuple[TDataset, TDataset][source]#

Split dataset into random train and test datasets.

Parameters
train_sizet.Union[int, float, None] , default: None

If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the train split. If int, represents the absolute number of train samples. If None, the value is automatically set to the complement of the test size.

test_sizet.Union[int, float] , default: 0.25

If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the test split. If int, represents the absolute number of test samples.

random_stateint , default: 42

The random state to use for shuffling.

shufflebool , default: True

Whether to shuffle the data before splitting.

stratifyt.Union[t.List, pd.Series, np.ndarray, bool] , default: False

If True, data is split in a stratified fashion, using the class labels. If array-like, data is split in a stratified fashion, using this as class labels.

Returns
Dataset

Dataset containing train split data.

Dataset

Dataset containing test split data.
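How train_size and test_size translate into row counts can be sketched as follows. `resolve_split_sizes` is a hypothetical helper written for this illustration; the rounding (ceiling for the test split, floor for the train split) is an assumption borrowed from scikit-learn's train_test_split and is not stated in the parameter descriptions above.

```python
import math
from typing import Optional, Tuple, Union

Size = Union[int, float]


def resolve_split_sizes(
    n_samples: int,
    train_size: Optional[Size] = None,
    test_size: Size = 0.25,
) -> Tuple[int, int]:
    """Hypothetical helper: turn train_size/test_size into row counts."""
    # A float is a proportion of the dataset, an int is an absolute count.
    if isinstance(test_size, float):
        n_test = math.ceil(test_size * n_samples)
    else:
        n_test = test_size
    if train_size is None:
        # Complement of the test split.
        n_train = n_samples - n_test
    elif isinstance(train_size, float):
        n_train = math.floor(train_size * n_samples)
    else:
        n_train = train_size
    if n_train + n_test > n_samples:
        raise ValueError("train and test sizes exceed the dataset size")
    return n_train, n_test
```

With 1000 rows, the defaults give 750 train rows and 250 test rows; passing test_size=100 gives 900 and 100.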

class Context[source]#

Contains all the data + properties the user has passed to a check/suite, and validates it seamlessly.

Parameters
train: Union[Dataset, pd.DataFrame, None] , default: None

Dataset or DataFrame object, representing data an estimator was fitted on

test: Union[Dataset, pd.DataFrame, None] , default: None

Dataset or DataFrame object, representing data an estimator predicts on

model: Optional[BasicModel] , default: None

A scikit-learn-compatible fitted estimator instance

feature_importance: pd.Series , default: None

pass manual features importance

feature_importance_force_permutationbool , default: False

force calculation of permutation features importance

feature_importance_timeoutint , default: 120

timeout in seconds for the permutation feature importance calculation

y_pred_train: Optional[np.ndarray] , default: None

Array of the model prediction over the train dataset.

y_pred_test: Optional[np.ndarray] , default: None

Array of the model prediction over the test dataset.

y_proba_train: Optional[np.ndarray] , default: None

Array of the model prediction probabilities over the train dataset.

y_proba_test: Optional[np.ndarray] , default: None

Array of the model prediction probabilities over the test dataset.

features_importance: Optional[pd.Series] , default: None

Pass manual feature importance.

Deprecated since version 0.8.1: Use ‘feature_importance’ instead.

Attributes
feature_importance

Return feature importance, or None if not possible.

feature_importance_type

Return feature importance type if feature importance is available, else None.

features_importance

Return feature importance, or None if not possible.

features_importance_type

Return feature importance type if feature importance is available, else None.

model

Return & validate model if model exists, otherwise raise error.

model_name

Return model name.

task_type

Return task type if model & train & label exist.

test

Return test if exists, otherwise raise error.

train

Return train if exists, otherwise raise error.

with_display

Return the with_display flag.

Methods

assert_classification_task()

Assert the task_type is classification.

assert_regression_task()

Assert the task type is regression.

assert_task_type(*expected_types)

Assert task_type matching given types.

finalize_check_result(check_result, check[, ...])

Run final processing on a check result which includes validation, conditions processing and sampling footnote.

get_data_by_kind(kind)

Return the relevant Dataset by given kind.

get_scorers([scorers, use_avg_defaults])

Return initialized & validated scorers in a given priority.

get_single_scorer([scorers, use_avg_defaults])

Return initialized & validated single scorer in a given priority.

have_test()

Return whether there is test dataset defined.

__init__(train: Optional[Union[Dataset, DataFrame]] = None, test: Optional[Union[Dataset, DataFrame]] = None, model: Optional[BasicModel] = None, feature_importance: Optional[Series] = None, feature_importance_force_permutation: bool = False, feature_importance_timeout: int = 120, with_display: bool = True, y_pred_train: Optional[ndarray] = None, y_pred_test: Optional[ndarray] = None, y_proba_train: Optional[ndarray] = None, y_proba_test: Optional[ndarray] = None)[source]#
assert_classification_task()[source]#

Assert the task_type is classification.

assert_regression_task()[source]#

Assert the task type is regression.

assert_task_type(*expected_types: TaskType)[source]#

Assert task_type matching given types.

If task_type is defined, validate it and raise error if needed, else returns True. If task_type is not defined, return False.
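The three outcomes described above (True, False, or an error) can be sketched as a small function. Everything here is a stand-in: `TaskKind` replaces the library's TaskType enum with assumed member names, `check_task_type` is a hypothetical free function, and ValueError replaces whatever error type the library raises.

```python
from enum import Enum
from typing import Optional


class TaskKind(Enum):
    """Stand-in for TaskType; the member names are assumptions."""
    BINARY = "binary"
    MULTICLASS = "multiclass"
    REGRESSION = "regression"


def check_task_type(current: Optional[TaskKind], *expected: TaskKind) -> bool:
    """Return False if no task type is known, True if it matches, else raise."""
    if current is None:
        # Task type is not defined: nothing to validate.
        return False
    if current not in expected:
        names = ", ".join(kind.value for kind in expected)
        raise ValueError(f"expected one of [{names}], got {current.value}")
    return True
```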

property feature_importance: Optional[pandas.core.series.Series]#

Return feature importance, or None if not possible.

property feature_importance_type: Optional[str]#

Return feature importance type if feature importance is available, else None.

property features_importance: Optional[pandas.core.series.Series]#

Return feature importance, or None if not possible.

property features_importance_type: Optional[str]#

Return feature importance type if feature importance is available, else None.

finalize_check_result(check_result, check, kind: Optional[DatasetKind] = None)[source]#

Run final processing on a check result which includes validation, conditions processing and sampling footnote.

get_data_by_kind(kind: DatasetKind)[source]#

Return the relevant Dataset by given kind.

get_scorers(scorers: Optional[Union[Mapping[str, Union[str, Callable]], List[str]]] = None, use_avg_defaults=True) List[DeepcheckScorer][source]#

Return initialized & validated scorers in a given priority.

If scorers are passed, use them; otherwise, if the user defined global scorers, use those; otherwise use the default scorers.

Parameters
scorersUnion[List[str], Dict[str, Union[str, Callable]]], default: None

List of scorers to use. If None, use default scorers. Scorers can be supplied as a list of scorer names or as a dictionary of names and functions.

use_avg_defaultsbool, default True

If no scorers were provided, for classification, determines whether to use default scorers that return an averaged metric, or default scorers that return a metric per class.

Returns
List[DeepcheckScorer]

A list of initialized & validated scorers.
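The fallback order used to choose scorers can be sketched as a three-way selection. This is only a model of the priority rule: `pick_scorers` is a hypothetical helper, and the scorer callables below take plain lists of labels, which is a simplification and not the signature the library expects.

```python
from typing import Callable, Mapping, Optional

Scorers = Mapping[str, Callable[[list, list], float]]


def pick_scorers(
    given: Optional[Scorers],
    user_global: Optional[Scorers],
    defaults: Scorers,
) -> Scorers:
    """Hypothetical helper showing the fallback order."""
    if given:
        return given          # 1. scorers passed to the call
    if user_global:
        return user_global    # 2. scorers the user defined globally
    return defaults           # 3. built-in defaults


def accuracy(y_true: list, y_pred: list) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


defaults = {"accuracy": accuracy}
custom = {"error_rate": lambda y_true, y_pred: 1 - accuracy(y_true, y_pred)}
```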

get_single_scorer(scorers: Optional[Mapping[str, Union[str, Callable]]] = None, use_avg_defaults=True) DeepcheckScorer[source]#

Return initialized & validated single scorer in a given priority.

If scorers are passed, use them; otherwise, if the user defined global scorers, use those; otherwise use the default scorers. Returns the first scorer from the scorers selected this way.

Parameters
scorersUnion[List[str], Dict[str, Union[str, Callable]]], default: None

List of scorers to use. If None, use default scorers. Scorers can be supplied as a list of scorer names or as a dictionary of names and functions.

use_avg_defaultsbool, default True

If no scorers were provided, for classification, determines whether to use default scorers that return an averaged metric, or default scorers that return a metric per class.

Returns
DeepcheckScorer

An initialized & validated scorer.

have_test()[source]#

Return whether there is test dataset defined.

property model: BasicModel#

Return & validate model if model exists, otherwise raise error.

property model_name#

Return model name.

property task_type: deepchecks.tabular.utils.task_type.TaskType#

Return the task type if the model, train dataset and label exist; otherwise, raise an error.

property test: Dataset#

Return test if exists, otherwise raise error.

property train: Dataset#

Return train if exists, otherwise raise error.

property with_display: bool#

Return the with_display flag.

class Suite[source]#

Tabular suite to run checks of types: TrainTestCheck, SingleDatasetCheck, ModelOnlyCheck.

Methods

add(check)

Add a check or a suite to current suite.

config()

Return suite configuration (checks' conditions' configuration not yet supported).

from_config(conf)

Return suite object from a SuiteConfig object.

remove(index)

Remove a check by given index.

run([train_dataset, test_dataset, model, ...])

Run all checks.

supported_checks()

Return tuple of supported check types of this suite.

__init__(name: str, *checks: Union[BaseCheck, BaseSuite])[source]#
add(check: Union[BaseCheck, BaseSuite])[source]#

Add a check or a suite to current suite.

Parameters
checkBaseCheck

A check or suite to add.

config() SuiteConfig[source]#

Return suite configuration (checks’ conditions’ configuration not yet supported).

Returns
SuiteConfig

includes the suite name, and list of check configs.

static from_config(conf: SuiteConfig) BaseSuite[source]#

Return suite object from a SuiteConfig object.

Parameters
confSuiteConfig

the SuiteConfig object

Returns
BaseSuite

the suite class object from given config

remove(index: int)[source]#

Remove a check by given index.

Parameters
indexint

Index of check to remove.

run(train_dataset: Optional[Union[Dataset, DataFrame]] = None, test_dataset: Optional[Union[Dataset, DataFrame]] = None, model: Optional[BasicModel] = None, feature_importance: Optional[Series] = None, feature_importance_force_permutation: bool = False, feature_importance_timeout: int = 120, with_display: bool = True, y_pred_train: Optional[ndarray] = None, y_pred_test: Optional[ndarray] = None, y_proba_train: Optional[ndarray] = None, y_proba_test: Optional[ndarray] = None) SuiteResult[source]#

Run all checks.

Parameters
train_dataset: Optional[Union[Dataset, pd.DataFrame]] , default None

object, representing data an estimator was fitted on

test_datasetOptional[Union[Dataset, pd.DataFrame]] , default None

object, representing data an estimator predicts on

modelOptional[BasicModel] , default None

A scikit-learn-compatible fitted estimator instance

feature_importance: pd.Series , default: None

pass manual features importance

feature_importance_force_permutationbool , default: False

force calculation of permutation features importance

feature_importance_timeoutint , default: 120

timeout in seconds for the permutation feature importance calculation

y_pred_train: Optional[np.ndarray] , default: None

Array of the model prediction over the train dataset.

y_pred_test: Optional[np.ndarray] , default: None

Array of the model prediction over the test dataset.

y_proba_train: Optional[np.ndarray] , default: None

Array of the model prediction probabilities over the train dataset.

y_proba_test: Optional[np.ndarray] , default: None

Array of the model prediction probabilities over the test dataset.

features_importance: Optional[pd.Series] , default: None

Pass manual feature importance.

Deprecated since version 0.8.1: Use ‘feature_importance’ instead.

Returns
SuiteResult

All results by all initialized checks

classmethod supported_checks() Tuple[source]#

Return tuple of supported check types of this suite.

class SingleDatasetCheck[source]#

Parent class for checks that only use one dataset.

Methods

add_condition(name, condition_func, **params)

Add new condition function to the check.

clean_conditions()

Remove all conditions from this check instance.

conditions_decision(result)

Run conditions on given result.

config()

Return check configuration (conditions' configuration not yet supported).

context_type

alias of Context

from_config(conf)

Return check object from a CheckConfig object.

metadata([with_doc_link])

Return check metadata.

name()

Name of class in split camel case.

params([show_defaults])

Return parameters to show when printing the check.

remove_condition(index)

Remove given condition by index.

run(dataset[, model, feature_importance, ...])

Run check.

run_logic(context, dataset_kind)

Run check.

__init__(**kwargs)[source]#
add_condition(name: str, condition_func: Callable[[Any], Union[ConditionResult, bool]], **params)[source]#

Add new condition function to the check.

Parameters
namestr

Name of the condition. It should describe what the condition does and its parameters.

condition_funcCallable[[Any], Union[List[ConditionResult], bool]]

Function which gets the value of the check and returns object of List[ConditionResult] or boolean.

paramsdict

Additional parameters to pass when calling the condition function.
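The way a condition is registered with extra parameters and later evaluated against the check's value can be sketched with a tiny stand-in class. `MiniCheck` and `ratio_not_greater_than` are hypothetical names; the sketch models only the bookkeeping, returns plain booleans instead of ConditionResult objects, and the chaining via `return self` is an assumption.

```python
from typing import Any, Callable, Dict, List, Tuple


class MiniCheck:
    """Stand-in for a check; only the condition bookkeeping is modelled."""

    def __init__(self) -> None:
        self._conditions: List[Tuple[str, Callable[..., bool], Dict[str, Any]]] = []

    def add_condition(
        self, name: str, condition_func: Callable[..., bool], **params: Any
    ) -> "MiniCheck":
        # Store the function together with the parameters to pass later.
        self._conditions.append((name, condition_func, params))
        return self  # assumption: returning self allows chaining

    def conditions_decision(self, value: Any) -> List[Tuple[str, bool]]:
        # Call every condition with the check's value plus its stored params.
        return [
            (name, bool(func(value, **params)))
            for name, func, params in self._conditions
        ]


def ratio_not_greater_than(value: float, max_ratio: float) -> bool:
    """Condition function: passes when the value stays within the limit."""
    return value <= max_ratio


check = MiniCheck().add_condition(
    "Null ratio is not greater than 5%", ratio_not_greater_than, max_ratio=0.05
)
decisions = check.conditions_decision(0.02)
```

The condition name states both the action and the parameter value, which matches the guidance given for the name argument.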

clean_conditions()[source]#

Remove all conditions from this check instance.

conditions_decision(result: CheckResult) List[ConditionResult][source]#

Run conditions on given result.

config() CheckConfig[source]#

Return check configuration (conditions’ configuration not yet supported).

Returns
CheckConfig

includes the checks class name, params, and module name.

context_type[source]#

alias of Context

static from_config(conf: CheckConfig) BaseCheck[source]#

Return check object from a CheckConfig object.

Parameters
confCheckConfig

the CheckConfig object

Returns
BaseCheck

the check class object from given config

metadata(with_doc_link: bool = False) CheckMetadata[source]#

Return check metadata.

Parameters
with_doc_linkbool, default False

whether to include the doc link in the summary

Returns
Dict[str, Any]
classmethod name() str[source]#

Name of class in split camel case.
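Splitting a class name written in camel case can be sketched with a regular expression. The helper below is an illustration, not the library's own function, and how the library treats acronyms or digits in class names is not covered here.

```python
import re


def split_camel_case(name: str) -> str:
    """Insert a space wherever a lowercase letter or digit meets a capital."""
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", name)


single = split_camel_case("SingleDatasetCheck")
train_test = split_camel_case("TrainTestCheck")
```

Under this rule the two parent classes documented on this page would be displayed as "Single Dataset Check" and "Train Test Check".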

params(show_defaults: bool = False) Dict[source]#

Return parameters to show when printing the check.

remove_condition(index: int)[source]#

Remove given condition by index.

Parameters
indexint

index of condition to remove

run(dataset: Union[Dataset, DataFrame], model: Optional[BasicModel] = None, feature_importance: Optional[Series] = None, feature_importance_force_permutation: bool = False, feature_importance_timeout: int = 120, with_display: bool = True, y_pred_train: Optional[ndarray] = None, y_pred_test: Optional[ndarray] = None, y_proba_train: Optional[ndarray] = None, y_proba_test: Optional[ndarray] = None) CheckResult[source]#

Run check.

Parameters
dataset: Union[Dataset, pd.DataFrame]

Dataset or DataFrame object, representing data an estimator was fitted on

model: Optional[BasicModel], default: None

A scikit-learn-compatible fitted estimator instance

feature_importance: pd.Series , default: None

pass manual features importance

feature_importance_force_permutationbool , default: False

force calculation of permutation features importance

feature_importance_timeoutint , default: 120

timeout in seconds for the permutation feature importance calculation

y_pred_train: Optional[np.ndarray] , default: None

Array of the model prediction over the train dataset.

y_pred_test: Optional[np.ndarray] , default: None

Array of the model prediction over the test dataset.

y_proba_train: Optional[np.ndarray] , default: None

Array of the model prediction probabilities over the train dataset.

y_proba_test: Optional[np.ndarray] , default: None

Array of the model prediction probabilities over the test dataset.

features_importance: Optional[pd.Series] , default: None

Pass manual feature importance.

Deprecated since version 0.8.1: Use ‘feature_importance’ instead.

abstract run_logic(context, dataset_kind) CheckResult[source]#

Run check.

class TrainTestCheck[source]#

Parent class for checks that compare two datasets.

The check receives the train dataset and the test dataset used for model training and testing.

Methods

add_condition(name, condition_func, **params)

Add new condition function to the check.

clean_conditions()

Remove all conditions from this check instance.

conditions_decision(result)

Run conditions on given result.

config()

Return check configuration (conditions' configuration not yet supported).

context_type

alias of Context

from_config(conf)

Return check object from a CheckConfig object.

metadata([with_doc_link])

Return check metadata.

name()

Name of class in split camel case.

params([show_defaults])

Return parameters to show when printing the check.

remove_condition(index)

Remove given condition by index.

run(train_dataset, test_dataset[, model, ...])

Run check.

run_logic(context)

Run check.

__init__(**kwargs)[source]#
add_condition(name: str, condition_func: Callable[[Any], Union[ConditionResult, bool]], **params)[source]#

Add new condition function to the check.

Parameters
namestr

Name of the condition. It should describe what the condition does and its parameters.

condition_funcCallable[[Any], Union[List[ConditionResult], bool]]

Function which gets the value of the check and returns object of List[ConditionResult] or boolean.

paramsdict

Additional parameters to pass when calling the condition function.

clean_conditions()[source]#

Remove all conditions from this check instance.

conditions_decision(result: CheckResult) List[ConditionResult][source]#

Run conditions on given result.
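
The condition workflow described above (register a named function, then evaluate every registered function against a check result) can be sketched without deepchecks installed. `ToyCheck`, `ToyResult` and `not_greater_than` are hypothetical stand-ins, not the library's classes, and the tuples returned here only approximate the `ConditionResult` objects that the real `conditions_decision` returns.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class ToyResult:
    """Hypothetical stand-in for a check result; it only carries a value."""
    value: Any


class ToyCheck:
    """Hypothetical stand-in that mimics the condition bookkeeping only."""

    def __init__(self) -> None:
        # Conditions are kept in insertion order, keyed by an integer index.
        self._conditions: Dict[int, Tuple[str, Callable[..., bool], dict]] = {}
        self._next_index = 0

    def add_condition(self, name: str, condition_func: Callable[..., bool], **params):
        self._conditions[self._next_index] = (name, condition_func, params)
        self._next_index += 1
        return self  # chaining is an assumption of this sketch

    def remove_condition(self, index: int) -> None:
        if index not in self._conditions:
            raise ValueError(f"Index {index} of conditions does not exist")
        del self._conditions[index]

    def clean_conditions(self) -> None:
        self._conditions.clear()
        self._next_index = 0

    def conditions_decision(self, result: ToyResult) -> List[Tuple[str, bool]]:
        # Call each condition with the result value plus its stored params.
        return [
            (name, bool(func(result.value, **params)))
            for name, func, params in self._conditions.values()
        ]


def not_greater_than(value: float, max_allowed: float) -> bool:
    return value <= max_allowed


check = ToyCheck().add_condition(
    "Drift score is not greater than 0.2", not_greater_than, max_allowed=0.2
)
decisions = check.conditions_decision(ToyResult(value=0.35))
```

Extra keyword arguments given to `add_condition` are stored with the condition and passed back to the function on evaluation, which is why one function such as `not_greater_than` can back several conditions with different thresholds.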

config() CheckConfig[source]#

Return check configuration (conditions’ configuration not yet supported).

Returns
CheckConfig

Includes the check's class name, params, and module name.

context_type[source]#

alias of Context

static from_config(conf: CheckConfig) BaseCheck[source]#

Return check object from a CheckConfig object.

Parameters
conf: CheckConfig

The CheckConfig object.

Returns
BaseCheck

the check class object from given config
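
Since a config is described as holding a class name, params, and a module name, rebuilding an object from it can be sketched roughly as follows. The dictionary keys (`module_name`, `class_name`, `params`) and the helper `build_from_config` are assumptions made for illustration; the actual layout of `CheckConfig` is not shown in this reference.

```python
from importlib import import_module
from typing import Any, Dict


def build_from_config(conf: Dict[str, Any]) -> Any:
    """Instantiate the class named in ``conf`` with the stored params."""
    module = import_module(conf["module_name"])
    cls = getattr(module, conf["class_name"])
    return cls(**conf["params"])


# A standard-library class is used so the sketch runs without deepchecks.
conf = {
    "module_name": "fractions",
    "class_name": "Fraction",
    "params": {"numerator": 3, "denominator": 4},
}
obj = build_from_config(conf)
```

Storing the module name alongside the class name lets the object be recreated in a fresh process, as long as the module is importable there.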

metadata(with_doc_link: bool = False) CheckMetadata[source]#

Return check metadata.

Parameters
with_doc_link: bool, default: False

Whether to include the documentation link in the summary.

Returns
Dict[str, Any]

classmethod name() str[source]#

Name of class in split camel case.
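
As an illustration of "split camel case", the sketch below inserts spaces at word boundaries in a class name. The exact rule deepchecks applies is not stated here, so the function `split_camel_case` and its handling of acronyms are assumptions.

```python
import re


def split_camel_case(class_name: str) -> str:
    """Insert a space before each capital letter that starts a new word."""
    # Split between a lowercase letter or digit and an uppercase letter...
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", class_name)
    # ...and between an acronym and the capitalized word that follows it.
    return re.sub(r"(?<=[A-Z])(?=[A-Z][a-z])", " ", spaced)


label = split_camel_case("TrainTestCheck")
```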

params(show_defaults: bool = False) Dict[source]#

Return parameters to show when printing the check.

remove_condition(index: int)[source]#

Remove given condition by index.

Parameters
index: int

Index of the condition to remove.

run(train_dataset: Union[Dataset, DataFrame], test_dataset: Union[Dataset, DataFrame], model: Optional[BasicModel] = None, feature_importance: Optional[Series] = None, feature_importance_force_permutation: bool = False, feature_importance_timeout: int = 120, with_display: bool = True, y_pred_train: Optional[ndarray] = None, y_pred_test: Optional[ndarray] = None, y_proba_train: Optional[ndarray] = None, y_proba_test: Optional[ndarray] = None) CheckResult[source]#

Run check.

Parameters
train_dataset: Union[Dataset, pd.DataFrame]

Dataset or DataFrame object, representing data an estimator was fitted on

test_dataset: Union[Dataset, pd.DataFrame]

Dataset or DataFrame object, representing data an estimator predicts on

model: Optional[BasicModel], default: None

A scikit-learn-compatible fitted estimator instance

feature_importance: Optional[pd.Series], default: None

Manually provided feature importance.

feature_importance_force_permutation: bool, default: False

Force calculation of permutation feature importance.

feature_importance_timeout: int, default: 120

Timeout in seconds for the permutation feature importance calculation.

y_pred_train: Optional[np.ndarray] , default: None

Array of the model prediction over the train dataset.

y_pred_test: Optional[np.ndarray] , default: None

Array of the model prediction over the test dataset.

y_proba_train: Optional[np.ndarray] , default: None

Array of the model prediction probabilities over the train dataset.

y_proba_test: Optional[np.ndarray] , default: None

Array of the model prediction probabilities over the test dataset.

features_importance: Optional[pd.Series] , default: None

Manually provided feature importance. Deprecated since version 0.8.1.

Use ‘feature_importance’ instead.
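
One common way to keep a deprecated parameter such as `features_importance` working next to its replacement is sketched below. The helper `resolve_feature_importance` is hypothetical, and preferring the new name when both are given is an assumption; this reference does not say how deepchecks resolves that case.

```python
import warnings
from typing import Any, Optional


def resolve_feature_importance(
    feature_importance: Optional[Any] = None,
    features_importance: Optional[Any] = None,
) -> Optional[Any]:
    """Return the effective value, preferring the current parameter name."""
    if features_importance is not None:
        warnings.warn(
            "features_importance is deprecated, use feature_importance instead",
            DeprecationWarning,
            stacklevel=2,
        )
        if feature_importance is None:
            return features_importance
    return feature_importance


# Record warnings so the deprecation notice can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    resolved = resolve_feature_importance(features_importance={"age": 0.7})
```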

abstract run_logic(context) CheckResult[source]#

Run check.

class ModelOnlyCheck[source]#

Parent class for checks that only use a model and no datasets.

Methods

add_condition(name, condition_func, **params)

Add new condition function to the check.

clean_conditions()

Remove all conditions from this check instance.

conditions_decision(result)

Run conditions on given result.

config()

Return check configuration (conditions' configuration not yet supported).

context_type

alias of Context

from_config(conf)

Return check object from a CheckConfig object.

metadata([with_doc_link])

Return check metadata.

name()

Name of class in split camel case.

params([show_defaults])

Return parameters to show when printing the check.

remove_condition(index)

Remove given condition by index.

run(model[, feature_importance, ...])

Run check.

run_logic(context)

Run check.

__init__(**kwargs)[source]#
add_condition(name: str, condition_func: Callable[[Any], Union[ConditionResult, bool]], **params)[source]#

Add new condition function to the check.

Parameters
name: str

Name of the condition. It should explain the condition's action and parameters.

condition_func: Callable[[Any], Union[ConditionResult, bool]]

Function that receives the value of the check and returns a ConditionResult object or a boolean.

params: dict

Additional parameters to pass when calling the condition function.

clean_conditions()[source]#

Remove all conditions from this check instance.

conditions_decision(result: CheckResult) List[ConditionResult][source]#

Run conditions on given result.

config() CheckConfig[source]#

Return check configuration (conditions’ configuration not yet supported).

Returns
CheckConfig

Includes the check's class name, params, and module name.

context_type[source]#

alias of Context

static from_config(conf: CheckConfig) BaseCheck[source]#

Return check object from a CheckConfig object.

Parameters
conf: CheckConfig

The CheckConfig object.

Returns
BaseCheck

the check class object from given config

metadata(with_doc_link: bool = False) CheckMetadata[source]#

Return check metadata.

Parameters
with_doc_link: bool, default: False

Whether to include the documentation link in the summary.

Returns
Dict[str, Any]

classmethod name() str[source]#

Name of class in split camel case.

params(show_defaults: bool = False) Dict[source]#

Return parameters to show when printing the check.

remove_condition(index: int)[source]#

Remove given condition by index.

Parameters
index: int

Index of the condition to remove.

run(model: BasicModel, feature_importance: Optional[Series] = None, feature_importance_force_permutation: bool = False, feature_importance_timeout: int = 120, with_display: bool = True, y_pred_train: Optional[ndarray] = None, y_pred_test: Optional[ndarray] = None, y_proba_train: Optional[ndarray] = None, y_proba_test: Optional[ndarray] = None) CheckResult[source]#

Run check.

Parameters
model: BasicModel

A scikit-learn-compatible fitted estimator instance

feature_importance: Optional[pd.Series], default: None

Manually provided feature importance.

feature_importance_force_permutation: bool, default: False

Force calculation of permutation feature importance.

feature_importance_timeout: int, default: 120

Timeout in seconds for the permutation feature importance calculation.

y_pred_train: Optional[np.ndarray] , default: None

Array of the model prediction over the train dataset.

y_pred_test: Optional[np.ndarray] , default: None

Array of the model prediction over the test dataset.

y_proba_train: Optional[np.ndarray] , default: None

Array of the model prediction probabilities over the train dataset.

y_proba_test: Optional[np.ndarray] , default: None

Array of the model prediction probabilities over the test dataset.

features_importance: Optional[pd.Series] , default: None

Manually provided feature importance. Deprecated since version 0.8.1.

Use ‘feature_importance’ instead.

abstract run_logic(context) CheckResult[source]#

Run check.

class ModelComparisonContext[source]#

Contain processed input for model comparison checks.

Attributes
models

Return the models’ dict.

Methods

finalize_check_result(check_result, check)

Run final processing on a check result which includes validation and conditions processing.

__init__(train_datasets: Union[Dataset, List[Dataset]], test_datasets: Union[Dataset, List[Dataset]], models: Union[List[Any], Mapping[str, Any]])[source]#

Preprocess the parameters.

finalize_check_result(check_result, check)[source]#

Run final processing on a check result which includes validation and conditions processing.

property models: Dict#

Return the models’ dict.

class ModelComparisonCheck[source]#

Parent class for checks that compare two or more models.

Methods

add_condition(name, condition_func, **params)

Add new condition function to the check.

clean_conditions()

Remove all conditions from this check instance.

conditions_decision(result)

Run conditions on given result.

config()

Return check configuration (conditions' configuration not yet supported).

from_config(conf)

Return check object from a CheckConfig object.

metadata([with_doc_link])

Return check metadata.

name()

Name of class in split camel case.

params([show_defaults])

Return parameters to show when printing the check.

remove_condition(index)

Remove given condition by index.

run(train_datasets, test_datasets, models)

Initialize context and pass to check logic.

run_logic(multi_context)

Implement here logic of check.

__init__(**kwargs)[source]#
add_condition(name: str, condition_func: Callable[[Any], Union[ConditionResult, bool]], **params)[source]#

Add new condition function to the check.

Parameters
name: str

Name of the condition. It should explain the condition's action and parameters.

condition_func: Callable[[Any], Union[ConditionResult, bool]]

Function that receives the value of the check and returns a ConditionResult object or a boolean.

params: dict

Additional parameters to pass when calling the condition function.

clean_conditions()[source]#

Remove all conditions from this check instance.

conditions_decision(result: CheckResult) List[ConditionResult][source]#

Run conditions on given result.

config() CheckConfig[source]#

Return check configuration (conditions’ configuration not yet supported).

Returns
CheckConfig

Includes the check's class name, params, and module name.

static from_config(conf: CheckConfig) BaseCheck[source]#

Return check object from a CheckConfig object.

Parameters
conf: CheckConfig

The CheckConfig object.

Returns
BaseCheck

the check class object from given config

metadata(with_doc_link: bool = False) CheckMetadata[source]#

Return check metadata.

Parameters
with_doc_link: bool, default: False

Whether to include the documentation link in the summary.

Returns
Dict[str, Any]

classmethod name() str[source]#

Name of class in split camel case.

params(show_defaults: bool = False) Dict[source]#

Return parameters to show when printing the check.

remove_condition(index: int)[source]#

Remove given condition by index.

Parameters
index: int

Index of the condition to remove.

run(train_datasets: Union[Dataset, List[Dataset]], test_datasets: Union[Dataset, List[Dataset]], models: Union[List[BasicModel], Mapping[str, BasicModel]]) CheckResult[source]#

Initialize context and pass to check logic.

Parameters
train_datasets: Union[Dataset, List[Dataset]]

train datasets

test_datasets: Union[Dataset, List[Dataset]]

test datasets

models: Union[List[BasicModel], Mapping[str, BasicModel]]

list or map of models
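
Because `models` may be given either as a list or as a mapping, while `ModelComparisonContext.models` returns a dict, some normalization step is implied. A minimal sketch of such a step follows. The function `normalize_models`, the placeholder classes, and the naming scheme for list entries (class name plus position) are all assumptions; the names deepchecks actually assigns are not documented here.

```python
from collections.abc import Mapping
from typing import Any, Dict, List, Union


def normalize_models(models: Union[List[Any], Mapping]) -> Dict[str, Any]:
    """Return a name -> model dict from either a list or a mapping."""
    if isinstance(models, Mapping):
        named = dict(models)
    else:
        # Hypothetical naming scheme: class name plus position in the list.
        named = {f"{type(m).__name__}_{i}": m for i, m in enumerate(models, start=1)}
    if len(named) < 2:
        raise ValueError("Model comparison needs at least two models")
    return named


class TreeModel:  # placeholder classes standing in for fitted estimators
    pass


class LinearModel:
    pass


by_name = normalize_models([TreeModel(), LinearModel()])
```

Passing a mapping keeps the caller's own names, which is usually preferable when the compared models share a class.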

abstract run_logic(multi_context: ModelComparisonContext) CheckResult[source]#

Implement here logic of check.

class ModelComparisonSuite[source]#

Suite to run checks of type ModelComparisonCheck.

Methods

add(check)

Add a check or a suite to current suite.

config()

Return suite configuration (checks' conditions' configuration not yet supported).

from_config(conf)

Return suite object from a SuiteConfig object.

remove(index)

Remove a check by given index.

run(train_datasets, test_datasets, models)

Run all checks.

supported_checks()

Return tuple of supported check types of this suite.

__init__(name: str, *checks: Union[BaseCheck, BaseSuite])[source]#
add(check: Union[BaseCheck, BaseSuite])[source]#

Add a check or a suite to current suite.

Parameters
check: Union[BaseCheck, BaseSuite]

A check or suite to add.

config() SuiteConfig[source]#

Return suite configuration (checks’ conditions’ configuration not yet supported).

Returns
SuiteConfig

Includes the suite name and a list of check configs.

static from_config(conf: SuiteConfig) BaseSuite[source]#

Return suite object from a SuiteConfig object.

Parameters
conf: SuiteConfig

The SuiteConfig object.

Returns
BaseSuite

the suite class object from given config

remove(index: int)[source]#

Remove a check by given index.

Parameters
index: int

Index of check to remove.

run(train_datasets: Union[Dataset, List[Dataset]], test_datasets: Union[Dataset, List[Dataset]], models: Union[List[Any], Mapping[str, Any]]) SuiteResult[source]#

Run all checks.

Parameters
train_datasets: Union[Dataset, List[Dataset]]

Dataset or list of Datasets representing data an estimator was fitted on.

test_datasets: Union[Dataset, List[Dataset]]

Dataset or list of Datasets representing data an estimator predicts on.

models: Union[List[Any], Mapping[str, Any]]

Two or more scikit-learn-compatible fitted estimator instances.

Returns
SuiteResult

All results by all initialized checks.

Raises
ValueError

If check_datasets_policy is not of an allowed type.

classmethod supported_checks() Tuple[source]#

Return tuple of supported check types of this suite.
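
A suite that advertises a tuple of supported check types can use that tuple to validate what is added to it. The sketch below illustrates the idea with hypothetical classes (`ToySuite`, `ComparisonToyCheck`, `SingleDatasetToyCheck`). Whether and how deepchecks rejects unsupported checks is an assumption, not something this reference states.

```python
from typing import Tuple


class BaseToyCheck:
    """Hypothetical base class for all toy checks."""


class ComparisonToyCheck(BaseToyCheck):
    """Hypothetical stand-in for a model comparison check."""


class SingleDatasetToyCheck(BaseToyCheck):
    """Hypothetical stand-in for an unsupported check type."""


class ToySuite:
    """Hypothetical suite that only accepts the check types it supports."""

    def __init__(self, name: str, *checks: BaseToyCheck) -> None:
        self.name = name
        self.checks = []
        for check in checks:
            self.add(check)

    @classmethod
    def supported_checks(cls) -> Tuple[type, ...]:
        return (ComparisonToyCheck,)

    def add(self, check: BaseToyCheck) -> "ToySuite":
        # isinstance accepts a tuple of types, so the tuple is used directly.
        if not isinstance(check, self.supported_checks()):
            raise TypeError(f"{self.name} does not support {type(check).__name__}")
        self.checks.append(check)
        return self

    def remove(self, index: int) -> "ToySuite":
        if not 0 <= index < len(self.checks):
            raise IndexError(f"No check at index {index}")
        self.checks.pop(index)
        return self


suite = ToySuite("comparison suite", ComparisonToyCheck())
```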