deepchecks.tabular#
Package for tabular functionality.
Modules
Module importing all tabular checks.
Module contains all prebuilt suites.
Module for working with pre-built datasets.
Classes
- class Dataset[source]#
 Dataset wraps pandas DataFrame together with ML related metadata.
The Dataset class holds additional data and methods for easily accessing metadata relevant to training or validating ML models.
- Parameters
 - df: Any
 An object that can be cast to a pandas DataFrame containing data relevant to training or validating ML models.
- label: t.Union[Hashable, pd.Series, pd.DataFrame, np.ndarray], default: None
 Label column, provided either as the name of an existing column in the DataFrame or as a label object holding the label data (a pandas Series/DataFrame or a numpy array) that will be concatenated to the data in the DataFrame. When label data is passed, the label name is set as follows:
Series: takes the series name, or ‘target’ if the name is empty
DataFrame: expects a single column in the dataframe and uses its name
numpy: uses ‘target’
- features: t.Optional[t.Sequence[Hashable]], default: None
 List of names for the feature columns in the DataFrame.
- cat_features: t.Optional[t.Sequence[Hashable]], default: None
 List of names for the categorical features in the DataFrame. To disable categorical feature inference, pass cat_features=[].
- index_name: t.Optional[Hashable], default: None
 Name of the index column in the dataframe. If set_index_from_dataframe_index is True and index_name is not None, the index will be created from the dataframe index level with the given name. If index levels have no names, an int must be used to select the appropriate level by order.
- set_index_from_dataframe_index: bool, default: False
 If set to True, the index will be created from the dataframe index instead of dataframe columns (default). If index_name is None, the first level of the index will be used in case of a multilevel index.
- datetime_name: t.Optional[Hashable], default: None
 Name of the datetime column in the dataframe. If set_datetime_from_dataframe_index is True and datetime_name is not None, the date will be created from the dataframe index level with the given name. If index levels have no names, an int must be used to select the appropriate level by order.
- set_datetime_from_dataframe_index: bool, default: False
 If set to True, the date will be created from the dataframe index instead of dataframe columns (default). If datetime_name is None, the first level of the index will be used in case of a multilevel index.
- convert_datetime: bool, default: True
 If set to True, the date will be converted to datetime using pandas.to_datetime.
- datetime_args: t.Optional[t.Dict], default: None
 pandas.to_datetime arguments used for converting the datetime column. See https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html for details.
- max_categorical_ratio: float, default: 0.01
 The maximum ratio of unique values in a column for it to be inferred as a categorical feature.
- max_categories: int, default: None
 The maximum number of categories in a column for it to be inferred as a categorical feature. If None, uses the default is_categorical inference mechanism.
- label_type: str, default: None
 Used to determine the task type. If None, inferred when running a check based on label column and model. Possible values are: ‘multiclass’, ‘binary’ and ‘regression’.
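The label-naming rules listed for the label parameter can be sketched in plain Python. This is a minimal illustration, not the library's code: infer_label_name is a hypothetical helper, it handles only the case where label data (not a column name) is passed, and it relies on duck typing (a columns attribute for DataFrame-like objects, a name attribute for Series-like objects) so that it runs without pandas installed.

```python
from types import SimpleNamespace


def infer_label_name(label):
    """Hypothetical helper mirroring the documented label-naming rules."""
    # DataFrame-like: expect exactly one column and use its name.
    if hasattr(label, "columns"):
        columns = list(label.columns)
        if len(columns) != 1:
            raise ValueError("label DataFrame must contain a single column")
        return columns[0]
    # Series-like: use the series name, or 'target' if the name is empty.
    if hasattr(label, "name"):
        return label.name if label.name not in (None, "") else "target"
    # Anything else (e.g. a numpy array) gets the default name.
    return "target"


# Stand-ins for pandas objects, used only to keep the sketch dependency-free.
named_series = SimpleNamespace(name="price")
unnamed_series = SimpleNamespace(name=None)
single_column_frame = SimpleNamespace(columns=["churn"])
array_like = [0, 1, 1]

print(infer_label_name(named_series))         # price
print(infer_label_name(unnamed_series))       # target
print(infer_label_name(single_column_frame))  # churn
print(infer_label_name(array_like))           # target
```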
- Attributes
 cat_features: Return list of categorical feature names.
classes_in_label_col: Return the classes from label column in sorted list.
columns_info: Return the role and logical type of each column.
data: Return the data of dataset.
datetime_col: Return datetime column if exists.
datetime_name: If datetime column exists, return its name.
features: Return list of feature names.
features_columns: Return DataFrame containing only the features defined in the dataset, if features are empty raise error.
index_col: Return index column.
index_name: If index column exists, return its name.
label_col: Return Series of the label defined in the dataset, if label is not defined raise error.
label_name: If label column exists, return its name.
label_type: Return the label type.
n_samples: Return number of samples in dataframe.
numerical_features: Return list of numerical feature names.
Methods
assert_datetime(): Check if datetime is defined and if not raise error.
assert_features(): Check if features are defined (not empty) and if not raise error.
assert_index(): Check if index is defined and if not raise error.
cast_to_dataset(obj): Verify Dataset or transform to Dataset.
copy(new_data): Create a copy of this Dataset with new data.
datasets_share_categorical_features(*datasets): Verify that all provided datasets share same categorical features.
datasets_share_date(*datasets): Verify that all provided datasets share same date column.
datasets_share_features(*datasets): Verify that all provided datasets share same features.
datasets_share_index(*datasets): Verify that all provided datasets share same index column.
datasets_share_label(*datasets): Verify that all provided datasets share same label column.
from_numpy(*args[, columns, label_name]): Create Dataset instance from numpy arrays.
get_datetime_column_from_index(datetime_name): Retrieve the datetime info from the index if _set_datetime_from_dataframe_index is True.
has_label(): Return True if label column exists.
is_categorical(col_name): Check if a column is considered a category column in the dataset object.
is_sampled(n_samples): Return True if the dataset number of samples will decrease when sampled with n_samples samples.
len_when_sampled(n_samples): Return number of samples in the sampled dataframe if this dataset is sampled with n_samples samples.
sample(n_samples[, replace, random_state, ...]): Create a copy of the dataset object, with the internal dataframe being a sample of the original dataframe.
select([columns, ignore_columns, keep_label]): Filter dataset columns by given params.
train_test_split([train_size, test_size, ...]): Split dataset into random train and test datasets.
- __init__(df: Any, label: Optional[Union[Hashable, Series, DataFrame, ndarray]] = None, features: Optional[Sequence[Hashable]] = None, cat_features: Optional[Sequence[Hashable]] = None, index_name: Optional[Hashable] = None, set_index_from_dataframe_index: bool = False, datetime_name: Optional[Hashable] = None, set_datetime_from_dataframe_index: bool = False, convert_datetime: bool = True, datetime_args: Optional[Dict] = None, max_categorical_ratio: float = 0.01, max_categories: Optional[int] = None, label_type: Optional[str] = None, dataset_name: Optional[str] = None, label_classes=None)[source]#
 
- assert_datetime()[source]#
 Check if datetime is defined and if not raise error.
- Raises
 - DeepchecksNotSupportedError
 
- assert_features()[source]#
 Check if features are defined (not empty) and if not raise error.
- Raises
 - DeepchecksNotSupportedError
 
- assert_index()[source]#
 Check if index is defined and if not raise error.
- Raises
 - DeepchecksNotSupportedError
 
- classmethod cast_to_dataset(obj: Any) Dataset[source]#
 Verify Dataset or transform to Dataset.
Verifies that the provided value is a non-empty instance of Dataset and raises an exception otherwise. If the ‘cast’ flag is set to True, it also tries to transform the provided value into a Dataset instance.
- Parameters
 - obj
 value to verify
- Raises
 - DeepchecksValueError
 if the provided value is not a Dataset instance, or if the provided value cannot be transformed into a Dataset instance.
- property cat_features: List[Hashable]#
 Return list of categorical feature names.
- Returns
 - t.List[Hashable]
 List of categorical feature names.
- property classes_in_label_col: Tuple[str, ...]#
 Return the classes from label column in sorted list. If no label column is defined, return an empty list.
- Returns
 - t.Tuple[str, …]
 Sorted classes
- property columns_info: Dict[Hashable, str]#
 Return the role and logical type of each column.
- Returns
 - t.Dict[Hashable, str]
 Dictionary of each column and its role
- copy(new_data: DataFrame) TDataset[source]#
 Create a copy of this Dataset with new data.
- Parameters
 - new_data (DataFrame): new data from which new dataset will be created
 
- Returns
 - Dataset
 new dataset instance
- property data: pandas.core.frame.DataFrame#
 Return the data of dataset.
- datasets_share_categorical_features(*datasets)
 Verify that all provided datasets share same categorical features.
- Parameters
 - datasets: List[Dataset]
 list of datasets to validate
- Returns
 - bool
 True if all datasets share same categorical features, otherwise False
- Raises
 - AssertionError
 if the ‘datasets’ parameter is not a list, or if ‘datasets’ contains fewer than one dataset.
- datasets_share_date(*datasets)
 Verify that all provided datasets share same date column.
- Parameters
 - datasets: List[Dataset]
 list of datasets to validate
- Returns
 - bool
 True if all datasets share same date column, otherwise False
- Raises
 - AssertionError
 if the ‘datasets’ parameter is not a list, or if ‘datasets’ contains fewer than one dataset.
- datasets_share_features(*datasets)
 Verify that all provided datasets share same features.
- Parameters
 - datasets: List[Dataset]
 list of datasets to validate
- Returns
 - bool
 True if all datasets share same features, otherwise False
- Raises
 - AssertionError
 if the ‘datasets’ parameter is not a list, or if ‘datasets’ contains fewer than one dataset.
- datasets_share_index(*datasets)
 Verify that all provided datasets share same index column.
- Parameters
 - datasets: List[Dataset]
 list of datasets to validate
- Returns
 - bool
 True if all datasets share same index column, otherwise False
- Raises
 - AssertionError
 if the ‘datasets’ parameter is not a list, or if ‘datasets’ contains fewer than one dataset.
- datasets_share_label(*datasets)
 Verify that all provided datasets share same label column.
- Parameters
 - datasets: List[Dataset]
 list of datasets to validate
- Returns
 - bool
 True if all datasets share same label column, otherwise False
- Raises
 - AssertionError
 if the ‘datasets’ parameter is not a list, or if ‘datasets’ contains fewer than one dataset.
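The datasets_share_* helpers all follow the same pattern: compare one piece of metadata across every dataset passed in. A minimal sketch of that pattern, assuming objects that expose a features attribute; the function below is a hypothetical stand-in, and the order-insensitive comparison is an assumption rather than documented behavior.

```python
from types import SimpleNamespace


def datasets_share_features(*datasets):
    """Hypothetical stand-in for the documented comparison helpers."""
    # The documentation lists an AssertionError when no datasets are given.
    assert len(datasets) > 0, "'datasets' must contain at least one dataset"
    first = datasets[0]
    # Order-insensitive comparison is an assumption of this sketch.
    return all(sorted(ds.features) == sorted(first.features) for ds in datasets[1:])


# SimpleNamespace objects stand in for Dataset instances.
train = SimpleNamespace(features=["age", "income"])
test = SimpleNamespace(features=["income", "age"])
other = SimpleNamespace(features=["age"])

print(datasets_share_features(train, test))   # True
print(datasets_share_features(train, other))  # False
```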
- property datetime_col: Optional[pandas.core.series.Series]#
 Return datetime column if exists.
- Returns
 - t.Optional[pd.Series]
 Series of the datetime column
- property datetime_name: Optional[Hashable]#
 If datetime column exists, return its name.
- Returns
 - t.Optional[Hashable]
 datetime name
- property features: List[Hashable]#
 Return list of feature names.
- Returns
 - t.List[Hashable]
 List of feature names.
- property features_columns: pandas.core.frame.DataFrame#
 Return DataFrame containing only the features defined in the dataset, if features are empty raise error.
- Returns
 - pd.DataFrame
 
- classmethod from_numpy(*args: ndarray, columns: Optional[Sequence[Hashable]] = None, label_name: Optional[Hashable] = None, **kwargs) TDataset[source]#
 Create Dataset instance from numpy arrays.
- Parameters
 - *args: np.ndarray
 Numpy array of data columns, and second optional numpy array of labels.
- columns: t.Sequence[Hashable], default: None
 Names for the columns. If none are provided, the columns are automatically named 1 to n (where n is the number of columns).
- label_name: t.Hashable, default: None
 Labels column name. If none is provided, the name ‘target’ will be used.
- **kwargs: Dict
 Additional arguments that will be passed to the main Dataset constructor.
- Returns
 - Dataset
 instance of the Dataset
- Raises
 - DeepchecksValueError
 if it receives zero or more than two numpy arrays; if columns (args[0]) is not a two-dimensional numpy array; if labels (args[1]) is not a one-dimensional numpy array; if the features array or labels array is empty.
Examples
>>> import numpy
>>> from deepchecks.tabular import Dataset
>>> features = numpy.array([[0.25, 0.3, 0.3],
...                         [0.14, 0.75, 0.3],
...                         [0.23, 0.39, 0.1]])
>>> labels = numpy.array([0.1, 0.1, 0.7])
>>> dataset = Dataset.from_numpy(features, labels)
Creating dataset only from features array.
>>> dataset = Dataset.from_numpy(features)
Passing additional arguments to the main Dataset constructor.
>>> dataset = Dataset.from_numpy(features, labels, max_categorical_ratio=0.5)
Specifying features and label columns names.
>>> dataset = Dataset.from_numpy(
...     features, labels,
...     columns=['sensor-1', 'sensor-2', 'sensor-3'],
...     label_name='labels'
... )
- get_datetime_column_from_index(datetime_name)[source]#
 Retrieve the datetime info from the index if _set_datetime_from_dataframe_index is True.
- has_label() bool[source]#
 Return True if label column exists.
- Returns
 - bool
 True if label column exists.
- property index_col: Optional[pandas.core.series.Series]#
 Return index column. Index can be a named column or DataFrame index.
- Returns
 - t.Optional[pd.Series]
 If index column exists, returns a pandas Series of the index column.
- property index_name: Optional[Hashable]#
 If index column exists, return its name.
- Returns
 - t.Optional[Hashable]
 index name
- is_categorical(col_name: Hashable) bool[source]#
 Check if a column is considered a category column in the dataset object.
- Parameters
 - col_name: Hashable
 The name of the column in the dataframe
- Returns
 - bool
 True if the column is considered categorical, otherwise False
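For intuition about how max_categorical_ratio and max_categories could drive categorical inference, here is a simplified sketch. It is not the library's is_categorical mechanism: looks_categorical is a hypothetical function, and combining the two limits with a logical AND is an assumption.

```python
def looks_categorical(values, max_categorical_ratio=0.01, max_categories=None):
    """Simplified, hypothetical version of ratio/count based inference."""
    n_samples = len(values)
    if n_samples == 0:
        return False
    n_unique = len(set(values))
    # Ratio rule: few distinct values relative to the number of rows.
    within_ratio = n_unique / n_samples <= max_categorical_ratio
    # Count rule: applied only when max_categories is given (assumption).
    within_count = max_categories is None or n_unique <= max_categories
    return within_ratio and within_count


colors = ["red", "green", "blue"] * 200  # 3 unique values in 600 rows
ids = list(range(600))                   # every value is unique

print(looks_categorical(colors))                    # True  (3 / 600 = 0.005)
print(looks_categorical(ids))                       # False (600 / 600 = 1.0)
print(looks_categorical(colors, max_categories=2))  # False (3 categories > 2)
```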
- is_sampled(n_samples: int)[source]#
 Return True if the dataset number of samples will decrease when sampled with n_samples samples.
- property label_col: pandas.core.series.Series#
 Return Series of the label defined in the dataset, if label is not defined raise error.
- Returns
 - pd.Series
 
- property label_name: Optional[Hashable]#
 If label column exists, return its name. Otherwise, throw an exception.
- Returns
 - t.Optional[Hashable]
 Label name
- property label_type: Optional[deepchecks.tabular.utils.task_type.TaskType]#
 Return the label type.
- Returns
 - t.Optional[TaskType]
 Label type
- len_when_sampled(n_samples: int)[source]#
 Return number of samples in the sampled dataframe if this dataset is sampled with n_samples samples.
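The relationship between is_sampled and len_when_sampled can be sketched with two small functions over a row count. This is an illustration under assumptions, not the library's code: both functions are hypothetical free functions, and treating n_samples=None as "do not sample" is an assumption.

```python
def len_when_sampled(n_rows, n_samples):
    """Rows remaining after sampling; None means 'do not sample' (assumption)."""
    if n_samples is None:
        return n_rows
    return min(n_rows, n_samples)


def is_sampled(n_rows, n_samples):
    """True only when sampling would actually shrink the data."""
    return len_when_sampled(n_rows, n_samples) < n_rows


print(len_when_sampled(1000, 300))  # 300
print(is_sampled(1000, 300))        # True
print(len_when_sampled(200, 300))   # 200
print(is_sampled(200, 300))         # False
```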
- property n_samples: int#
 Return number of samples in dataframe.
- Returns
 - int
 Number of samples in dataframe
- property numerical_features: List[Hashable]#
 Return list of numerical feature names.
- Returns
 - t.List[Hashable]
 List of numerical feature names.
- sample(n_samples: Optional[int], replace: bool = False, random_state: Optional[int] = None, drop_na_label: bool = False) TDataset[source]#
 Create a copy of the dataset object, with the internal dataframe being a sample of the original dataframe.
- Parameters
 - n_samples: t.Optional[int]
 Number of samples to draw.
- replace: bool, default: False
 Whether to sample with replacement.
- random_state: t.Optional[int], default: None
 Random state.
- drop_na_label: bool, default: False
 Whether to sample only from rows with an existing label.
- Returns
 - Dataset
 instance of the Dataset with sampled internal dataframe.
- select(columns: Optional[Union[Hashable, List[Hashable]]] = None, ignore_columns: Optional[Union[Hashable, List[Hashable]]] = None, keep_label: bool = False) TDataset[source]#
 Filter dataset columns by given params.
- Parameters
 - columns: Union[Hashable, List[Hashable], None]
 Column names to keep.
- ignore_columns: Union[Hashable, List[Hashable], None]
 Column names to drop.
- Returns
 - TDataset
 horizontally filtered dataset
- Raises
 - DeepchecksValueError
 Raised if one of the given columns does not exist.
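A minimal sketch of the column filtering that select describes, operating on a plain list of column names. select_columns is a hypothetical helper; it raises ValueError where the library documents DeepchecksValueError, and treating columns and ignore_columns as mutually exclusive is an assumption.

```python
def select_columns(all_columns, columns=None, ignore_columns=None):
    """Hypothetical column filter following the documented parameters."""
    def as_list(value):
        # A single name and a list of names are both accepted.
        if value is None:
            return None
        return list(value) if isinstance(value, (list, tuple)) else [value]

    keep, drop = as_list(columns), as_list(ignore_columns)
    # Treating the two parameters as mutually exclusive is an assumption.
    if keep is not None and drop is not None:
        raise ValueError("pass either 'columns' or 'ignore_columns', not both")
    missing = [c for c in (keep or drop or []) if c not in all_columns]
    if missing:
        raise ValueError(f"columns not found: {missing}")
    if keep is not None:
        return [c for c in all_columns if c in keep]
    if drop is not None:
        return [c for c in all_columns if c not in drop]
    return list(all_columns)


cols = ["age", "income", "city"]
print(select_columns(cols, columns=["age", "city"]))  # ['age', 'city']
print(select_columns(cols, ignore_columns="city"))    # ['age', 'income']
```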
- train_test_split(train_size: Optional[Union[int, float]] = None, test_size: Union[int, float] = 0.25, random_state: int = 42, shuffle: bool = True, stratify: Union[List, Series, ndarray, bool] = False) Tuple[TDataset, TDataset][source]#
 Split dataset into random train and test datasets.
- Parameters
 - train_size: t.Union[int, float, None], default: None
 If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the train split. If int, represents the absolute number of train samples. If None, the value is automatically set to the complement of the test size.
- test_size: t.Union[int, float], default: 0.25
 If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the test split. If int, represents the absolute number of test samples.
- random_state: int, default: 42
 The random state to use for shuffling.
- shuffle: bool, default: True
 Whether to shuffle the data before splitting.
- stratify: t.Union[t.List, pd.Series, np.ndarray, bool], default: False
 If True, data is split in a stratified fashion, using the class labels. If array-like, data is split in a stratified fashion, using this as class labels.
- Returns
 - Dataset
 Dataset containing train split data.
- Dataset
 Dataset containing test split data.
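How train_size and test_size translate into row counts can be sketched as follows. resolve_split_sizes is a hypothetical helper, not part of the library, and the rounding direction (test count rounded up, train count rounded down) is an assumption.

```python
import math


def resolve_split_sizes(n_samples, train_size=None, test_size=0.25):
    """Hypothetical helper turning size arguments into row counts."""
    def to_count(size, round_up):
        # Floats are proportions of the dataset; ints are absolute counts.
        if isinstance(size, float):
            if not 0.0 < size < 1.0:
                raise ValueError("float sizes must be between 0.0 and 1.0")
            scaled = size * n_samples
            return math.ceil(scaled) if round_up else math.floor(scaled)
        return int(size)

    n_test = to_count(test_size, round_up=True)
    # With train_size=None the train split is the complement of the test split.
    if train_size is None:
        n_train = n_samples - n_test
    else:
        n_train = to_count(train_size, round_up=False)
    if n_train + n_test > n_samples:
        raise ValueError("train and test sizes exceed the dataset size")
    return n_train, n_test


print(resolve_split_sizes(100))                 # (75, 25)
print(resolve_split_sizes(100, train_size=60))  # (60, 25)
print(resolve_split_sizes(10, test_size=0.25))  # (7, 3)
```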
- class Context[source]#
 Contains all the data + properties the user has passed to a check/suite, and validates it seamlessly.
- Parameters
 - train: Union[Dataset, pd.DataFrame, None] , default: None
 Dataset or DataFrame object, representing data an estimator was fitted on
- test: Union[Dataset, pd.DataFrame, None] , default: None
 Dataset or DataFrame object, representing data an estimator predicts on
- model: Optional[BasicModel] , default: None
 A scikit-learn-compatible fitted estimator instance
- feature_importance: pd.Series , default: None
 Manually supplied feature importance.
- feature_importance_force_permutation: bool, default: False
 Force calculation of permutation feature importance.
- feature_importance_timeout: int, default: 120
 Timeout in seconds for the permutation feature importance calculation.
- y_pred_train: Optional[np.ndarray] , default: None
 Array of the model prediction over the train dataset.
- y_pred_test: Optional[np.ndarray] , default: None
 Array of the model prediction over the test dataset.
- y_proba_train: Optional[np.ndarray] , default: None
 Array of the model prediction probabilities over the train dataset.
- y_proba_test: Optional[np.ndarray] , default: None
 Array of the model prediction probabilities over the test dataset.
- model_classes: Optional[List] , default: None
 For classification: list of classes known to the model
- Attributes
 feature_importance: Return feature importance, or None if not possible.
feature_importance_type: Return feature importance type if feature importance is available, else None.
model: Return & validate model if model exists, otherwise raise error.
model_classes: Return ordered list of possible label classes for classification tasks or None for regression.
model_name: Return model name.
observed_classes: Return the observed classes in both train and test.
task_type: Return task type based on calculated classes argument.
test: Return test if exists, otherwise raise error.
train: Return train if exists, otherwise raise error.
with_display: Return the with_display flag.
Methods
Assert the task_type is classification.
Assert the task type is regression.
finalize_check_result(check_result, check[, ...]): Run final processing on a check result which includes validation, conditions processing and sampling footnote.
get_data_by_kind(kind): Return the relevant Dataset by given kind.
get_scorers([scorers, use_avg_defaults]): Return initialized & validated scorers in a given priority.
get_single_scorer([scorers, use_avg_defaults]): Return initialized & validated single scorer in a given priority.
Return whether there is test dataset defined.
- __init__(train: Optional[Union[Dataset, DataFrame]] = None, test: Optional[Union[Dataset, DataFrame]] = None, model: Optional[BasicModel] = None, feature_importance: Optional[Series] = None, feature_importance_force_permutation: bool = False, feature_importance_timeout: int = 120, with_display: bool = True, y_pred_train: Optional[ndarray] = None, y_pred_test: Optional[ndarray] = None, y_proba_train: Optional[ndarray] = None, y_proba_test: Optional[ndarray] = None, model_classes: Optional[List] = None)[source]#
 
- property feature_importance: Optional[pandas.core.series.Series]#
 Return feature importance, or None if not possible.
- property feature_importance_type: Optional[str]#
 Return feature importance type if feature importance is available, else None.
- finalize_check_result(check_result, check, kind: Optional[DatasetKind] = None)[source]#
 Run final processing on a check result which includes validation, conditions processing and sampling footnote.
- get_scorers(scorers: Optional[Union[Mapping[str, Union[str, Callable]], List[str]]] = None, use_avg_defaults=True) List[DeepcheckScorer][source]#
 Return initialized & validated scorers in a given priority.
If scorers are passed, they are used; otherwise, user-defined global scorers are used if they exist; otherwise, the default scorers are used.
- Parameters
 - scorers: Union[List[str], Dict[str, Union[str, Callable]]], default: None
 List of scorers to use. If None, use default scorers. Scorers can be supplied as a list of scorer names or as a dictionary of names and functions.
- use_avg_defaults: bool, default: True
 If no scorers were provided, for classification, determines whether to use default scorers that return an averaged metric, or default scorers that return a metric per class.
- Returns
 - List[DeepcheckScorer]
 A list of initialized & validated scorers.
- get_single_scorer(scorers: Optional[Mapping[str, Union[str, Callable]]] = None, use_avg_defaults=True) DeepcheckScorer[source]#
 Return initialized & validated single scorer in a given priority.
If scorers are passed, they are used; otherwise, user-defined global scorers are used if they exist; otherwise, the default scorers are used. Returns the first scorer from the scorers described above.
- Parameters
 - scorers: Union[List[str], Dict[str, Union[str, Callable]]], default: None
 List of scorers to use. If None, use default scorers. Scorers can be supplied as a list of scorer names or as a dictionary of names and functions.
- use_avg_defaults: bool, default: True
 If no scorers were provided, for classification, determines whether to use default scorers that return an averaged metric, or default scorers that return a metric per class.
- Returns
 - DeepcheckScorer
 An initialized & validated scorer.
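The priority order shared by get_scorers and get_single_scorer can be sketched as a simple fallback chain. The functions and scorer names below are hypothetical placeholders; the real methods also initialize and validate the scorers, which this sketch omits.

```python
def choose_scorers(check_scorers=None, global_scorers=None, default_scorers=None):
    """Hypothetical resolver for the documented priority order."""
    # 1. scorers passed to the call, 2. user-defined global scorers, 3. defaults.
    for candidate in (check_scorers, global_scorers, default_scorers):
        if candidate:
            return list(candidate)
    raise ValueError("no scorers available")


def choose_single_scorer(check_scorers=None, global_scorers=None, default_scorers=None):
    """Return the first scorer from the resolved list."""
    return choose_scorers(check_scorers, global_scorers, default_scorers)[0]


defaults = ["accuracy", "f1"]
print(choose_scorers(None, None, defaults))                 # ['accuracy', 'f1']
print(choose_scorers(["recall"], ["precision"], defaults))  # ['recall']
print(choose_single_scorer(None, ["precision", "recall"], defaults))  # precision
```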
- property model: BasicModel#
 Return & validate model if model exists, otherwise raise error.
- property model_classes: List#
 Return ordered list of possible label classes for classification tasks or None for regression.
- property model_name#
 Return model name.
- property observed_classes: List#
 Return the observed classes in both train and test. None for regression.
- property task_type: deepchecks.tabular.utils.task_type.TaskType#
 Return task type based on calculated classes argument.
- property with_display: bool#
 Return the with_display flag.
- class Suite[source]#
 Tabular suite to run checks of types: TrainTestCheck, SingleDatasetCheck, ModelOnlyCheck.
Methods
add(check): Add a check or a suite to current suite.
config(): Return suite configuration (checks' conditions' configuration not yet supported).
from_config(conf[, version_unmatch]): Return suite object from a SuiteConfig object.
from_json(conf[, version_unmatch]): Deserialize suite instance from JSON string.
remove(index): Remove a check by given index.
run([train_dataset, test_dataset, model, ...]): Run all checks.
Return tuple of supported check types of this suite.
to_json([indent]): Serialize suite instance to JSON string.
- add(check: Union[BaseCheck, BaseSuite])[source]#
 Add a check or a suite to current suite.
- Parameters
 - check: BaseCheck
 A check or suite to add.
- config() SuiteConfig[source]#
 Return suite configuration (checks’ conditions’ configuration not yet supported).
- Returns
 - SuiteConfig
 includes the suite name, and list of check configs.
- classmethod from_config(conf: SuiteConfig, version_unmatch: Optional[Union[Literal['raise'], Literal['warn']]] = 'warn') Self[source]#
 Return suite object from a SuiteConfig object.
- Parameters
 - conf: SuiteConfig
 the SuiteConfig object
- Returns
 - BaseSuite
 the suite class object from given config
- from_json(conf: str, version_unmatch: Optional[Union[Literal['raise'], Literal['warn']]] = 'warn') Self[source]#
 Deserialize suite instance from JSON string.
- remove(index: int)[source]#
 Remove a check by given index.
- Parameters
 - index: int
 Index of check to remove.
- run(train_dataset: Optional[Union[Dataset, DataFrame]] = None, test_dataset: Optional[Union[Dataset, DataFrame]] = None, model: Optional[BasicModel] = None, feature_importance: Optional[Series] = None, feature_importance_force_permutation: bool = False, feature_importance_timeout: int = 120, with_display: bool = True, y_pred_train: Optional[ndarray] = None, y_pred_test: Optional[ndarray] = None, y_proba_train: Optional[ndarray] = None, y_proba_test: Optional[ndarray] = None, run_single_dataset: Optional[str] = None) SuiteResult[source]#
 Run all checks.
- Parameters
 - train_dataset: Optional[Union[Dataset, pd.DataFrame]], default: None
 Dataset or DataFrame object, representing data an estimator was fitted on
- test_dataset: Optional[Union[Dataset, pd.DataFrame]], default: None
 Dataset or DataFrame object, representing data an estimator predicts on
- model: Optional[BasicModel], default: None
 A scikit-learn-compatible fitted estimator instance
- run_single_dataset: Optional[str], default: None
 ‘Train’, ‘Test’, or None to run on both train and test.
- feature_importance: pd.Series, default: None
 Manually supplied feature importance.
- feature_importance_force_permutation: bool, default: False
 Force calculation of permutation feature importance.
- feature_importance_timeout: int, default: 120
 Timeout in seconds for the permutation feature importance calculation.
- y_pred_train: Optional[np.ndarray] , default: None
 Array of the model prediction over the train dataset.
- y_pred_test: Optional[np.ndarray] , default: None
 Array of the model prediction over the test dataset.
- y_proba_train: Optional[np.ndarray] , default: None
 Array of the model prediction probabilities over the train dataset.
- y_proba_test: Optional[np.ndarray] , default: None
 Array of the model prediction probabilities over the test dataset.
- model_classes: Optional[List] , default: None
 For classification: list of classes known to the model
- Returns
 - SuiteResult
 All results by all initialized checks
- class SingleDatasetCheck[source]#
 Parent class for checks that only use one dataset.
Methods
add_condition(name, condition_func, **params): Add new condition function to the check.
Remove all conditions from this check instance.
conditions_decision(result): Run conditions on given result.
config([include_version]): Return check configuration (conditions' configuration not yet supported).
alias of Context
from_config(conf[, version_unmatch]): Return check object from a CheckConfig object.
from_json(conf[, version_unmatch]): Deserialize check instance from JSON string.
metadata([with_doc_link]): Return check metadata.
name(): Name of class in split camel case.
params([show_defaults]): Return parameters to show when printing the check.
remove_condition(index): Remove given condition by index.
run(dataset[, model, feature_importance, ...]): Run check.
run_logic(context, dataset_kind): Run check.
to_json([indent]): Serialize check instance to JSON string.
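The name() entry above describes the class name "in split camel case". One way such a split could work is sketched below; split_camel_case and TrainTestExample are hypothetical names, and the regular expression is an assumption about the approach rather than the library's code.

```python
import re


def split_camel_case(name):
    """Insert spaces between the capitalized words of a class name."""
    return " ".join(re.findall(r"[A-Z][^A-Z]*", name))


class TrainTestExample:
    """Hypothetical class used only to demonstrate the naming rule."""

    @classmethod
    def name(cls):
        return split_camel_case(cls.__name__)


print(TrainTestExample.name())                 # Train Test Example
print(split_camel_case("SingleDatasetCheck"))  # Single Dataset Check
```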
- add_condition(name: str, condition_func: Callable[[Any], Union[ConditionResult, bool]], **params)[source]#
 Add new condition function to the check.
- Parameters
 - name: str
 Name of the condition. Should explain the condition action and parameters.
- condition_func: Callable[[Any], Union[List[ConditionResult], bool]]
 Function which gets the value of the check and returns object of List[ConditionResult] or boolean.
- params: dict
 Additional parameters to pass when calling the condition function.
- conditions_decision(result: CheckResult) List[ConditionResult][source]#
 Run conditions on given result.
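The interplay of add_condition, remove_condition and conditions_decision can be sketched with a stripped-down stand-in class. MiniCheck and at_most are hypothetical; the real methods work with CheckResult and ConditionResult objects, while this sketch passes a raw value and returns (name, passed) tuples.

```python
class MiniCheck:
    """Hypothetical, stripped-down check that only manages conditions."""

    def __init__(self):
        self._conditions = []

    def add_condition(self, name, condition_func, **params):
        # Store the extra params so they are passed on every evaluation.
        self._conditions.append((name, condition_func, params))
        return self

    def remove_condition(self, index):
        del self._conditions[index]

    def conditions_decision(self, value):
        # Each entry pairs the condition name with its pass/fail outcome.
        return [(name, bool(func(value, **params)))
                for name, func, params in self._conditions]


def at_most(value, threshold):
    return value <= threshold


check = MiniCheck()
check.add_condition("null ratio is at most 5%", at_most, threshold=0.05)
check.add_condition("null ratio is at most 1%", at_most, threshold=0.01)

print(check.conditions_decision(0.03))
# [('null ratio is at most 5%', True), ('null ratio is at most 1%', False)]
```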
- config(include_version: bool = True) CheckConfig[source]#
 Return check configuration (conditions’ configuration not yet supported).
- Returns
 - CheckConfig
 includes the check's class name, params, and module name.
- classmethod from_config(conf: CheckConfig, version_unmatch: Optional[Union[Literal['raise'], Literal['warn']]] = 'warn') Self[source]#
 Return check object from a CheckConfig object.
- Parameters
 - conf: Dict[Any, Any]
 
- Returns
 - BaseCheck
 the check class object from given config
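Since a check configuration carries the class name, params and module name, rebuilding an object from it amounts to an import plus a constructor call. The sketch below shows that general technique with a standard-library class. The dictionary keys, the version field and both functions are assumptions for illustration, not the CheckConfig schema.

```python
import importlib
import warnings
from datetime import timedelta

CURRENT_VERSION = "1.0"  # placeholder version string for this sketch


def to_config(obj, params):
    """Describe an object by module name, class name and constructor params."""
    cls = type(obj)
    return {"module_name": cls.__module__, "class_name": cls.__name__,
            "params": dict(params), "version": CURRENT_VERSION}


def from_config(conf, version_unmatch="warn"):
    """Rebuild the object; react to a version mismatch as requested."""
    if conf.get("version") != CURRENT_VERSION:
        message = f"config version {conf.get('version')!r} != {CURRENT_VERSION!r}"
        if version_unmatch == "raise":
            raise ValueError(message)
        if version_unmatch == "warn":
            warnings.warn(message)
    module = importlib.import_module(conf["module_name"])
    return getattr(module, conf["class_name"])(**conf["params"])


# timedelta stands in for a check class so the sketch needs no extra packages.
conf = to_config(timedelta(days=2, hours=3), {"days": 2, "hours": 3})
restored = from_config(conf)
print(restored == timedelta(days=2, hours=3))  # True
```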
- from_json(conf: str, version_unmatch: Optional[Union[Literal['raise'], Literal['warn']]] = 'warn') Self[source]#
 Deserialize check instance from JSON string.
- metadata(with_doc_link: bool = False) CheckMetadata[source]#
 Return check metadata.
- Parameters
 - with_doc_link: bool, default: False
 Whether to include the doc link in the summary.
- Returns
 - Dict[str, Any]
 
- params(show_defaults: bool = False) Dict[source]#
 Return parameters to show when printing the check.
- remove_condition(index: int)[source]#
 Remove given condition by index.
- Parameters
 - index: int
 index of condition to remove
- run(dataset: Union[Dataset, DataFrame], model: Optional[BasicModel] = None, feature_importance: Optional[Series] = None, feature_importance_force_permutation: bool = False, feature_importance_timeout: int = 120, with_display: bool = True, y_pred: Optional[ndarray] = None, y_proba: Optional[ndarray] = None, y_pred_train: Optional[ndarray] = None, y_pred_test: Optional[ndarray] = None, y_proba_train: Optional[ndarray] = None, y_proba_test: Optional[ndarray] = None, model_classes: Optional[List] = None) CheckResult[source]#
 Run check.
- Parameters
 - dataset: Union[Dataset, pd.DataFrame]
 Dataset or DataFrame object, representing data an estimator was fitted on
- model: Optional[BasicModel], default: None
 A scikit-learn-compatible fitted estimator instance
- feature_importance: pd.Series , default: None
 Manually supplied feature importance.
- feature_importance_force_permutation: bool, default: False
 Force calculation of permutation feature importance.
- feature_importance_timeout: int, default: 120
 Timeout in seconds for the permutation feature importance calculation.
- y_pred_train: Optional[np.ndarray] , default: None
 Array of the model prediction over the train dataset.
- y_pred_test: Optional[np.ndarray] , default: None
 Array of the model prediction over the test dataset.
- y_proba_train: Optional[np.ndarray] , default: None
 Array of the model prediction probabilities over the train dataset.
- y_proba_test: Optional[np.ndarray] , default: None
 Array of the model prediction probabilities over the test dataset.
- model_classes: Optional[List] , default: None
 For classification: list of classes known to the model
- abstract run_logic(context, dataset_kind) CheckResult[source]#
 Run check.
- class TrainTestCheck[source]#
 Parent class for checks that compare two datasets.
The class checks train dataset and test dataset for model training and test.
Methods
add_condition(name, condition_func, **params): Add new condition function to the check.
Remove all conditions from this check instance.
conditions_decision(result): Run conditions on given result.
config([include_version]): Return check configuration (conditions' configuration not yet supported).
alias of Context
from_config(conf[, version_unmatch]): Return check object from a CheckConfig object.
from_json(conf[, version_unmatch]): Deserialize check instance from JSON string.
metadata([with_doc_link]): Return check metadata.
name(): Name of class in split camel case.
params([show_defaults]): Return parameters to show when printing the check.
remove_condition(index): Remove given condition by index.
run(train_dataset, test_dataset[, model, ...]): Run check.
run_logic(context): Run check.
to_json([indent]): Serialize check instance to JSON string.
- add_condition(name: str, condition_func: Callable[[Any], Union[ConditionResult, bool]], **params)[source]#
 Add new condition function to the check.
- Parameters
 - name: str
 Name of the condition. Should explain the condition action and parameters.
- condition_func: Callable[[Any], Union[List[ConditionResult], bool]]
 Function which gets the value of the check and returns object of List[ConditionResult] or boolean.
- params: dict
 Additional parameters to pass when calling the condition function.
- conditions_decision(result: CheckResult) List[ConditionResult][source]#
 Run conditions on given result.
- config(include_version: bool = True) CheckConfig[source]#
 Return check configuration (conditions’ configuration not yet supported).
- Returns
 - CheckConfig
 Includes the check's class name, params, and module name.
- classmethod from_config(conf: CheckConfig, version_unmatch: Optional[Union[Literal['raise'], Literal['warn']]] = 'warn') Self[source]#
 Return check object from a CheckConfig object.
- Parameters
 - conf: CheckConfig
 The check configuration to build the check from.
- Returns
 - BaseCheck
 the check class object from given config
- from_json(conf: str, version_unmatch: Optional[Union[Literal['raise'], Literal['warn']]] = 'warn') Self[source]#
 Deserialize check instance from JSON string.
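The serialization methods can be pictured as a round trip through a plain dictionary. The sketch below is a simplified stand-in, not the deepchecks implementation: `ToyCheck`, `TOY_VERSION`, the `version` key, the `indent` default, and the exact behaviour of `version_unmatch` (raise or warn when the stored version differs) are all assumptions made for illustration. The `module_name`, `class_name`, and `params` fields follow the description of `config()` above.

```python
import json
import warnings
from typing import Any, Dict

TOY_VERSION = "1.0.0"  # hypothetical library version


class ToyCheck:
    """Simplified stand-in; not the deepchecks implementation."""

    def __init__(self, **params: Any) -> None:
        self.params = params

    def config(self, include_version: bool = True) -> Dict[str, Any]:
        conf: Dict[str, Any] = {
            "module_name": type(self).__module__,
            "class_name": type(self).__name__,
            "params": dict(self.params),
        }
        if include_version:
            conf["version"] = TOY_VERSION
        return conf

    @classmethod
    def from_config(cls, conf: Dict[str, Any], version_unmatch: str = "warn") -> "ToyCheck":
        version = conf.get("version")
        if version is not None and version != TOY_VERSION:
            message = f"config version {version} differs from {TOY_VERSION}"
            # Assumed policy: 'raise' aborts, 'warn' continues with a warning.
            if version_unmatch == "raise":
                raise ValueError(message)
            if version_unmatch == "warn":
                warnings.warn(message)
        return cls(**conf["params"])

    def to_json(self, indent: int = 2) -> str:
        return json.dumps(self.config(), indent=indent)

    @classmethod
    def from_json(cls, conf: str, version_unmatch: str = "warn") -> "ToyCheck":
        return cls.from_config(json.loads(conf), version_unmatch=version_unmatch)


original = ToyCheck(n_top_columns=5)
restored = ToyCheck.from_json(original.to_json())
```

Note that conditions are not part of the round trip here, in line with the remark above that conditions' configuration is not yet supported.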
- metadata(with_doc_link: bool = False) CheckMetadata[source]#
 Return check metadata.
- Parameters
 - with_doc_link: bool, default: False
 Whether to include the doc link in the summary.
- Returns
 - Dict[str, Any]
 
- params(show_defaults: bool = False) Dict[source]#
 Return parameters to show when printing the check.
- remove_condition(index: int)[source]#
 Remove given condition by index.
- Parameters
 - index: int
 Index of the condition to remove.
- run(train_dataset: Union[Dataset, DataFrame], test_dataset: Union[Dataset, DataFrame], model: Optional[BasicModel] = None, feature_importance: Optional[Series] = None, feature_importance_force_permutation: bool = False, feature_importance_timeout: int = 120, with_display: bool = True, y_pred_train: Optional[ndarray] = None, y_pred_test: Optional[ndarray] = None, y_proba_train: Optional[ndarray] = None, y_proba_test: Optional[ndarray] = None, model_classes: Optional[List] = None) CheckResult[source]#
 Run check.
- Parameters
 - train_dataset: Union[Dataset, pd.DataFrame]
 Dataset or DataFrame object, representing data an estimator was fitted on
- test_dataset: Union[Dataset, pd.DataFrame]
 Dataset or DataFrame object, representing data an estimator predicts on
- model: Optional[BasicModel], default: None
 A scikit-learn-compatible fitted estimator instance
- feature_importance: pd.Series, default: None
 Manually supplied feature importance.
- feature_importance_force_permutation: bool, default: False
 Force calculation of permutation feature importance.
- feature_importance_timeout: int, default: 120
 Timeout in seconds for the permutation feature importance calculation.
- y_pred_train: Optional[np.ndarray] , default: None
 Array of the model prediction over the train dataset.
- y_pred_test: Optional[np.ndarray] , default: None
 Array of the model prediction over the test dataset.
- y_proba_train: Optional[np.ndarray] , default: None
 Array of the model prediction probabilities over the train dataset.
- y_proba_test: Optional[np.ndarray] , default: None
 Array of the model prediction probabilities over the test dataset.
- model_classes: Optional[List] , default: None
 For classification: list of classes known to the model
- abstract run_logic(context) CheckResult[source]#
 Run check.
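The split between `run` and the abstract `run_logic` follows a template-method pattern: `run` prepares a context from the inputs, and each concrete check implements only `run_logic`. The sketch below illustrates that pattern with stand-in classes; `ToyContext`, `ToyTrainTestCheck`, and `MeanShift` are hypothetical, datasets are plain lists of numbers, and the real `run` accepts many more arguments than shown.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class ToyContext:
    """Hypothetical stand-in for the Context object handed to run_logic."""
    train: List[float]
    test: List[float]
    model: Optional[Any] = None


class ToyTrainTestCheck(ABC):
    """Simplified stand-in; not the deepchecks implementation."""

    def run(self, train_dataset: List[float], test_dataset: List[float], model: Any = None) -> Any:
        # Shared preparation lives in run(); subclasses never override it.
        context = ToyContext(list(train_dataset), list(test_dataset), model)
        return self.run_logic(context)

    @abstractmethod
    def run_logic(self, context: ToyContext) -> Any:
        """Check-specific logic, implemented by each concrete check."""


class MeanShift(ToyTrainTestCheck):
    """Toy check: difference between the test mean and the train mean."""

    def run_logic(self, context: ToyContext) -> float:
        train_mean = sum(context.train) / len(context.train)
        test_mean = sum(context.test) / len(context.test)
        return test_mean - train_mean


shift = MeanShift().run([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
```

Because `run_logic` is abstract, the base class cannot be instantiated directly; only subclasses that implement it can be run.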
- class ModelOnlyCheck[source]#
 Parent class for checks that only use a model and no datasets.
Methods
add_condition(name, condition_func, **params)Add new condition function to the check.
Remove all conditions from this check instance.
conditions_decision(result)Run conditions on given result.
config([include_version])Return check configuration (conditions' configuration not yet supported).
alias of Context
from_config(conf[, version_unmatch])Return check object from a CheckConfig object.
from_json(conf[, version_unmatch])Deserialize check instance from JSON string.
metadata([with_doc_link])Return check metadata.
name()Name of class in split camel case.
params([show_defaults])Return parameters to show when printing the check.
remove_condition(index)Remove given condition by index.
run(model[, feature_importance, ...])Run check.
run_logic(context)Run check.
to_json([indent])Serialize check instance to JSON string.
- add_condition(name: str, condition_func: Callable[[Any], Union[ConditionResult, bool]], **params)[source]#
 Add new condition function to the check.
- Parameters
 - name: str
 Name of the condition. It should describe the condition's action and parameters.
- condition_func: Callable[[Any], Union[ConditionResult, bool]]
 Function that receives the value of the check and returns a ConditionResult or a boolean.
- params: dict
 Additional parameters to pass when calling the condition function.
- conditions_decision(result: CheckResult) List[ConditionResult][source]#
 Run conditions on given result.
- config(include_version: bool = True) CheckConfig[source]#
 Return check configuration (conditions’ configuration not yet supported).
- Returns
 - CheckConfig
 Includes the check's class name, params, and module name.
- classmethod from_config(conf: CheckConfig, version_unmatch: Optional[Union[Literal['raise'], Literal['warn']]] = 'warn') Self[source]#
 Return check object from a CheckConfig object.
- Parameters
 - conf: CheckConfig
 The check configuration to build the check from.
- Returns
 - BaseCheck
 the check class object from given config
- from_json(conf: str, version_unmatch: Optional[Union[Literal['raise'], Literal['warn']]] = 'warn') Self[source]#
 Deserialize check instance from JSON string.
- metadata(with_doc_link: bool = False) CheckMetadata[source]#
 Return check metadata.
- Parameters
 - with_doc_link: bool, default: False
 Whether to include the doc link in the summary.
- Returns
 - Dict[str, Any]
 
- params(show_defaults: bool = False) Dict[source]#
 Return parameters to show when printing the check.
- remove_condition(index: int)[source]#
 Remove given condition by index.
- Parameters
 - index: int
 Index of the condition to remove.
- run(model: BasicModel, feature_importance: Optional[Series] = None, feature_importance_force_permutation: bool = False, feature_importance_timeout: int = 120, with_display: bool = True, y_pred_train: Optional[ndarray] = None, y_pred_test: Optional[ndarray] = None, y_proba_train: Optional[ndarray] = None, y_proba_test: Optional[ndarray] = None) CheckResult[source]#
 Run check.
- Parameters
 - model: BasicModel
 A scikit-learn-compatible fitted estimator instance
- feature_importance: pd.Series, default: None
 Manually supplied feature importance.
- feature_importance_force_permutation: bool, default: False
 Force calculation of permutation feature importance.
- feature_importance_timeout: int, default: 120
 Timeout in seconds for the permutation feature importance calculation.
- y_pred_train: Optional[np.ndarray] , default: None
 Array of the model prediction over the train dataset.
- y_pred_test: Optional[np.ndarray] , default: None
 Array of the model prediction over the test dataset.
- y_proba_train: Optional[np.ndarray] , default: None
 Array of the model prediction probabilities over the train dataset.
- y_proba_test: Optional[np.ndarray] , default: None
 Array of the model prediction probabilities over the test dataset.
- model_classes: Optional[List] , default: None
 For classification: list of classes known to the model
- abstract run_logic(context) CheckResult[source]#
 Run check.
- class ModelComparisonContext[source]#
 Contain processed input for model comparison checks.
- Attributes
 modelsReturn the models’ dict.
Methods
finalize_check_result(check_result, check)Run final processing on a check result which includes validation and conditions processing.
- __init__(train_datasets: Union[Dataset, List[Dataset]], test_datasets: Union[Dataset, List[Dataset]], models: Union[List[Any], Mapping[str, Any]])[source]#
 Preprocess the parameters.
- finalize_check_result(check_result, check)[source]#
 Run final processing on a check result which includes validation and conditions processing.
- property models: Dict#
 Return the models’ dict.
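Since `models` may be given as a list or as a mapping while the `models` property returns a dict, some normalization has to happen during preprocessing. The sketch below shows one plausible way to do it. Both helpers are hypothetical, and the naming scheme for unnamed models and the rule for repeating a single dataset are assumptions, not the library's documented behaviour.

```python
from typing import Any, Dict, List, Mapping, Union


def normalize_models(models: Union[List[Any], Mapping[str, Any]]) -> Dict[str, Any]:
    """Return a name -> model dict (hypothetical helper, not library code)."""
    if isinstance(models, Mapping):
        return dict(models)
    named: Dict[str, Any] = {}
    for position, model in enumerate(models, start=1):
        # Assumed naming scheme: class name plus position, so keys stay unique.
        named[f"{type(model).__name__}_{position}"] = model
    return named


def broadcast_datasets(datasets: Any, count: int) -> List[Any]:
    """Repeat a single dataset once per model, or validate a list's length."""
    if isinstance(datasets, list):
        if len(datasets) != count:
            raise ValueError("number of datasets must equal number of models")
        return datasets
    return [datasets] * count


class ModelA:
    """Placeholder model class used only for this example."""


class ModelB:
    """Placeholder model class used only for this example."""


models = normalize_models([ModelA(), ModelB()])
train_sets = broadcast_datasets("shared-train", len(models))
```

After normalization every model has a name and a matching train dataset, so downstream check logic can iterate over the pairs without caring which input form the caller used.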
- class ModelComparisonCheck[source]#
 Parent class for checks that compare two or more models.
Methods
add_condition(name, condition_func, **params)Add new condition function to the check.
Remove all conditions from this check instance.
conditions_decision(result)Run conditions on given result.
config([include_version])Return check configuration (conditions' configuration not yet supported).
from_config(conf[, version_unmatch])Return check object from a CheckConfig object.
from_json(conf[, version_unmatch])Deserialize check instance from JSON string.
metadata([with_doc_link])Return check metadata.
name()Name of class in split camel case.
params([show_defaults])Return parameters to show when printing the check.
remove_condition(index)Remove given condition by index.
run(train_datasets, test_datasets, models)Initialize context and pass to check logic.
run_logic(multi_context)Implement here logic of check.
to_json([indent])Serialize check instance to JSON string.
- add_condition(name: str, condition_func: Callable[[Any], Union[ConditionResult, bool]], **params)[source]#
 Add new condition function to the check.
- Parameters
 - name: str
 Name of the condition. It should describe the condition's action and parameters.
- condition_func: Callable[[Any], Union[ConditionResult, bool]]
 Function that receives the value of the check and returns a ConditionResult or a boolean.
- params: dict
 Additional parameters to pass when calling the condition function.
- conditions_decision(result: CheckResult) List[ConditionResult][source]#
 Run conditions on given result.
- config(include_version: bool = True) CheckConfig[source]#
 Return check configuration (conditions’ configuration not yet supported).
- Returns
 - CheckConfig
 Includes the check's class name, params, and module name.
- classmethod from_config(conf: CheckConfig, version_unmatch: Optional[Union[Literal['raise'], Literal['warn']]] = 'warn') Self[source]#
 Return check object from a CheckConfig object.
- Parameters
 - conf: CheckConfig
 The check configuration to build the check from.
- Returns
 - BaseCheck
 the check class object from given config
- from_json(conf: str, version_unmatch: Optional[Union[Literal['raise'], Literal['warn']]] = 'warn') Self[source]#
 Deserialize check instance from JSON string.
- metadata(with_doc_link: bool = False) CheckMetadata[source]#
 Return check metadata.
- Parameters
 - with_doc_link: bool, default: False
 Whether to include the doc link in the summary.
- Returns
 - Dict[str, Any]
 
- params(show_defaults: bool = False) Dict[source]#
 Return parameters to show when printing the check.
- remove_condition(index: int)[source]#
 Remove given condition by index.
- Parameters
 - index: int
 Index of the condition to remove.
- run(train_datasets: Union[Dataset, List[Dataset]], test_datasets: Union[Dataset, List[Dataset]], models: Union[List[BasicModel], Mapping[str, BasicModel]]) CheckResult[source]#
 Initialize context and pass to check logic.
- Parameters
 - train_datasets: Union[Dataset, List[Dataset]]
 train datasets
- test_datasets: Union[Dataset, List[Dataset]]
 test datasets
- models: Union[List[BasicModel], Mapping[str, BasicModel]]
 list or map of models
- abstract run_logic(multi_context: ModelComparisonContext) CheckResult[source]#
 Implement here logic of check.
- class ModelComparisonSuite[source]#
 Suite to run checks of type CompareModelsBaseCheck.
Methods
add(check)Add a check or a suite to current suite.
config()Return suite configuration (checks' conditions' configuration not yet supported).
from_config(conf[, version_unmatch])Return suite object from a CheckConfig object.
from_json(conf[, version_unmatch])Deserialize suite instance from JSON string.
remove(index)Remove a check by given index.
run(train_datasets, test_datasets, models)Run all checks.
Return tuple of supported check types of this suite.
to_json([indent])Serialize suite instance to JSON string.
- add(check: Union[BaseCheck, BaseSuite])[source]#
 Add a check or a suite to current suite.
- Parameters
 - check: BaseCheck
 A check or suite to add.
- config() SuiteConfig[source]#
 Return suite configuration (checks’ conditions’ configuration not yet supported).
- Returns
 - SuiteConfig
 Includes the suite name and the list of check configs.
- classmethod from_config(conf: SuiteConfig, version_unmatch: Optional[Union[Literal['raise'], Literal['warn']]] = 'warn') Self[source]#
 Return suite object from a SuiteConfig object.
- Parameters
 - conf: SuiteConfig
 The SuiteConfig object.
- Returns
 - BaseSuite
 the suite class object from given config
- from_json(conf: str, version_unmatch: Optional[Union[Literal['raise'], Literal['warn']]] = 'warn') Self[source]#
 Deserialize suite instance from JSON string.
- remove(index: int)[source]#
 Remove a check by given index.
- Parameters
 - index: int
 Index of the check to remove.
- run(train_datasets: Union[Dataset, List[Dataset]], test_datasets: Union[Dataset, List[Dataset]], models: Union[List[Any], Mapping[str, Any]]) SuiteResult[source]#
 Run all checks.
- Parameters
 - train_datasets: Union[Dataset, Container[Dataset]]
 Dataset or datasets representing the data the estimators were fitted on.
- test_datasets: Union[Dataset, Container[Dataset]]
 Dataset or datasets representing the data the estimators predict on.
- models: Union[Container[Any], Mapping[str, Any]]
 Two or more scikit-learn-compatible fitted estimator instances.
- Returns
 - SuiteResult
 All results by all initialized checks
- Raises
 - ValueError
 if check_datasets_policy is not of allowed types
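To make the suite methods concrete, the sketch below models a suite as an ordered collection of checks that all receive the same inputs. It is a simplified stand-in: `ToySuite` and the two toy checks are hypothetical, checks are plain callables rather than check objects, results are returned as a list of tuples instead of a SuiteResult, and the rule that adding a suite copies over its individual checks is an assumption.

```python
from typing import Any, Callable, List, Tuple, Union

Check = Callable[..., Any]  # in this sketch a check is any callable


class ToySuite:
    """Simplified stand-in; not the deepchecks implementation."""

    def __init__(self, name: str, *checks: Union[Check, "ToySuite"]) -> None:
        self.name = name
        self.checks: List[Check] = []
        for check in checks:
            self.add(check)

    def add(self, check: Union[Check, "ToySuite"]) -> "ToySuite":
        if isinstance(check, ToySuite):
            # Assumption: adding a suite copies over its checks one by one.
            for inner_check in check.checks:
                self.add(inner_check)
        else:
            self.checks.append(check)
        return self

    def remove(self, index: int) -> "ToySuite":
        del self.checks[index]
        return self

    def run(self, train_datasets: Any, test_datasets: Any, models: Any) -> List[Tuple[str, Any]]:
        # Every check receives the same inputs; results keep insertion order.
        return [
            (check.__name__, check(train_datasets, test_datasets, models))
            for check in self.checks
        ]


def count_models(train_datasets: Any, test_datasets: Any, models: Any) -> int:
    """Toy check: number of models under comparison."""
    return len(models)


def model_names(train_datasets: Any, test_datasets: Any, models: Any) -> List[str]:
    """Toy check: sorted model names."""
    return sorted(models)


inner = ToySuite("inner", model_names)
suite = ToySuite("comparison", count_models).add(inner)
results = suite.run("train", "test", {"b": object(), "a": object()})
```

Returning `self` from `add` and `remove` is a convenience of this sketch that allows chaining; the reference above does not state a return value for those methods.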