deepchecks.tabular
Package for tabular functionality.
Modules
- Module importing all tabular checks.
- Module containing all prebuilt suites.
- Module for working with pre-built datasets.
- Module containing the integrations of the deepchecks.tabular package with external packages.
- Module containing metrics utils.
- Package for tabular utilities routines.
Classes
- class Dataset
Dataset wraps pandas DataFrame together with ML related metadata.
The Dataset class contains additional data and methods intended for easily accessing metadata relevant to training or validating an ML model.
- Parameters
- dfAny
- An object that can be cast to a pandas DataFrame
containing data relevant to training or validating an ML model.
- labelt.Union[Hashable, pd.Series, pd.DataFrame, np.ndarray] , default: None
Label column, provided either as a string with the name of an existing column in the DataFrame or as a label object containing the label data (pandas Series/DataFrame or a numpy array) that will be concatenated to the data in the DataFrame. In case of label data, the following logic is applied to set the label name:
Series: takes the series name or ‘target’ if name is empty
DataFrame: expect single column in the dataframe and use its name
numpy: use ‘target’
- featurest.Optional[t.Sequence[Hashable]] , default: None
List of names for the feature columns in the DataFrame.
- cat_featurest.Optional[t.Sequence[Hashable]] , default: None
List of names for the categorical features in the DataFrame. To disable categorical feature inference, pass cat_features=[].
- index_namet.Optional[Hashable] , default: None
Name of the index column in the dataframe. If set_index_from_dataframe_index is True and index_name is not None, index will be created from the dataframe index level with the given name. If index levels have no names, an int must be used to select the appropriate level by order.
- set_index_from_dataframe_indexbool , default: False
If set to true, index will be created from the dataframe index instead of dataframe columns (default). If index_name is None, first level of the index will be used in case of a multilevel index.
- datetime_namet.Optional[Hashable] , default: None
Name of the datetime column in the dataframe. If set_datetime_from_dataframe_index is True and datetime_name is not None, date will be created from the dataframe index level with the given name. If index levels have no names, an int must be used to select the appropriate level by order.
- set_datetime_from_dataframe_indexbool , default: False
If set to true, date will be created from the dataframe index instead of dataframe columns (default). If datetime_name is None, first level of the index will be used in case of a multilevel index.
- convert_datetimebool , default: True
If set to true, date will be converted to datetime using pandas.to_datetime.
- datetime_argst.Optional[t.Dict] , default: None
Arguments passed to pandas.to_datetime when converting the datetime column (see https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html for details).
- max_categorical_ratiofloat , default: 0.01
The max ratio of unique values in a column in order for it to be inferred as a categorical feature.
- max_categoriesint , default: None
The maximum number of categories in a column in order for it to be inferred as a categorical feature. If None, uses the is_categorical default inference mechanism.
- label_typestr , default: None
Used to determine the task type. If None, inferred when running a check based on label column and model. Possible values are: ‘multiclass’, ‘binary’ and ‘regression’.
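The max_categorical_ratio and max_categories parameters above both describe thresholds on the number of distinct values in a column. The following is a minimal sketch of how such a ratio-based rule could work on a single column. The helper name looks_categorical and the way the two thresholds are combined are assumptions made for illustration; the library's own inference (including the default mechanism mentioned for max_categories=None) may apply additional rules.

```python
from typing import Hashable, Optional, Sequence


def looks_categorical(
    values: Sequence[Hashable],
    max_categorical_ratio: float = 0.01,
    max_categories: Optional[int] = None,
) -> bool:
    """Hypothetical helper: ratio-based categorical inference for one column."""
    if not values:
        return False
    n_unique = len(set(values))
    # Assumption: when max_categories is given, treat it as a hard cap
    # on the number of distinct values.
    if max_categories is not None and n_unique > max_categories:
        return False
    # The ratio of distinct values to rows must not exceed the threshold.
    return n_unique / len(values) <= max_categorical_ratio


colours = ["red", "green", "blue"] * 200  # 3 distinct values in 600 rows
ids = list(range(600))                    # every value is distinct

print(looks_categorical(colours))                    # ratio is 3 / 600
print(looks_categorical(ids))                        # ratio is 600 / 600
print(looks_categorical(colours, max_categories=2))  # 3 distinct values exceed the cap
```

When automatic inference is not wanted at all, the text above suggests passing cat_features explicitly (or cat_features=[] to disable it).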
- Attributes
cat_features
Return list of categorical feature names.
classes_in_label_col
Return the classes from label column in sorted list.
columns_info
Return the role and logical type of each column.
data
Return the data of dataset.
datetime_col
Return datetime column if exists.
datetime_name
If datetime column exists, return its name.
features
Return list of feature names.
features_columns
Return DataFrame containing only the features defined in the dataset, if features are empty raise error.
index_col
Return index column.
index_name
If index column exists, return its name.
label_col
Return Series of the label defined in the dataset, if label is not defined raise error.
label_name
If label column exists, return its name.
label_type
Return the label type.
n_samples
Return number of samples in dataframe.
numerical_features
Return list of numerical feature names.
Methods
assert_datetime
()Check if datetime is defined and if not raise error.
assert_features
()Check if features are defined (not empty) and if not raise error.
assert_index
()Check if index is defined and if not raise error.
cast_to_dataset
(obj)Verify Dataset or transform to Dataset.
copy
(new_data)Create a copy of this Dataset with new data.
datasets_share_categorical_features
(*datasets)Verify that all provided datasets share same categorical features.
datasets_share_date
(*datasets)Verify that all provided datasets share same date column.
datasets_share_features
(*datasets)Verify that all provided datasets share same features.
datasets_share_index
(*datasets)Verify that all provided datasets share same index column.
datasets_share_label
(*datasets)Verify that all provided datasets share same label column.
drop_na_labels
()Create a copy of the dataset object without samples with missing labels.
from_numpy
(*args[, columns, label_name])Create Dataset instance from numpy arrays.
get_datetime_column_from_index
(datetime_name)Retrieve the datetime info from the index if _set_datetime_from_dataframe_index is True.
has_label
()Return True if label column exists.
is_categorical
(col_name)Check if a column is considered a category column in the dataset object.
is_sampled
(n_samples)Return True if the dataset number of samples will decrease when sampled with n_samples samples.
len_when_sampled
(n_samples)Return the number of samples in the sampled dataframe if this dataset is sampled with n_samples samples.
sample
([n_samples, replace, random_state])Create a copy of the dataset object, with the internal dataframe being a sample of the original dataframe.
select
([columns, ignore_columns, keep_label])Filter dataset columns by given params.
train_test_split
([train_size, test_size, ...])Split dataset into random train and test datasets.
- __init__(df: Any, label: Optional[Union[Hashable, Series, DataFrame, ndarray]] = None, features: Optional[Sequence[Hashable]] = None, cat_features: Optional[Sequence[Hashable]] = None, index_name: Optional[Hashable] = None, set_index_from_dataframe_index: bool = False, datetime_name: Optional[Hashable] = None, set_datetime_from_dataframe_index: bool = False, convert_datetime: bool = True, datetime_args: Optional[Dict] = None, max_categorical_ratio: float = 0.01, max_categories: Optional[int] = None, label_type: Optional[str] = None, dataset_name: Optional[str] = None, label_classes=None)[source]#
- assert_datetime()[source]#
Check if datetime is defined and if not raise error.
- Raises
- DeepchecksNotSupportedError
- assert_features()[source]#
Check if features are defined (not empty) and if not raise error.
- Raises
- DeepchecksNotSupportedError
- assert_index()[source]#
Check if index is defined and if not raise error.
- Raises
- DeepchecksNotSupportedError
- classmethod cast_to_dataset(obj: Any) Dataset [source]#
Verify Dataset or transform to Dataset.
Function verifies that provided value is a non-empty instance of Dataset, otherwise raises an exception, but if the ‘cast’ flag is set to True it will also try to transform provided value to the Dataset instance.
- Parameters
- obj
value to verify
- Raises
- DeepchecksValueError
if the provided value is not a Dataset instance; if the provided value cannot be transformed into Dataset instance;
- property cat_features: List[Hashable]#
Return list of categorical feature names.
- Returns
- t.List[Hashable]
List of categorical feature names.
- property classes_in_label_col: Tuple[str, ...]#
Return the classes from the label column in a sorted list. If no label column is defined, return an empty list.
- Returns
- t.Tuple[str, …]
Sorted classes
- property columns_info: Dict[Hashable, str]#
Return the role and logical type of each column.
- Returns
- t.Dict[Hashable, str]
Dictionary mapping each column to its role
- copy(new_data: DataFrame) TDataset [source]#
Create a copy of this Dataset with new data.
- Parameters
- new_data (DataFrame): new data from which new dataset will be created
- Returns
- Dataset
new dataset instance
- datasets_share_categorical_features(*datasets)
Verify that all provided datasets share same categorical features.
- Parameters
- datasetsList[Dataset]
list of datasets to validate
- Returns
- bool
True if all datasets share same categorical features, otherwise False
- Raises
- AssertionError
‘datasets’ parameter is not a list; ‘datasets’ contains less than one dataset;
- datasets_share_date(*datasets)
Verify that all provided datasets share same date column.
- Parameters
- datasetsList[Dataset]
list of datasets to validate
- Returns
- bool
True if all datasets share same date column, otherwise False
- Raises
- AssertionError
‘datasets’ parameter is not a list; ‘datasets’ contains less than one dataset;
- datasets_share_features(*datasets)
Verify that all provided datasets share same features.
- Parameters
- datasetsList[Dataset]
list of datasets to validate
- Returns
- bool
True if all datasets share same features, otherwise False
- Raises
- AssertionError
‘datasets’ parameter is not a list; ‘datasets’ contains less than one dataset;
- datasets_share_index(*datasets)
Verify that all provided datasets share same index column.
- Parameters
- datasetsList[Dataset]
list of datasets to validate
- Returns
- bool
True if all datasets share same index column, otherwise False
- Raises
- AssertionError
‘datasets’ parameter is not a list; ‘datasets’ contains less than one dataset;
- datasets_share_label(*datasets)
Verify that all provided datasets share same label column.
- Parameters
- datasetsList[Dataset]
list of datasets to validate
- Returns
- bool
True if all datasets share same label column, otherwise False
- Raises
- AssertionError
‘datasets’ parameter is not a list; ‘datasets’ contains less than one dataset;
- property datetime_col: Optional[Series]#
Return datetime column if exists.
- Returns
- t.Optional[pd.Series]
Series of the datetime column
- property datetime_name: Optional[Hashable]#
If datetime column exists, return its name.
- Returns
- t.Optional[Hashable]
datetime name
- drop_na_labels() TDataset [source]#
Create a copy of the dataset object without samples with missing labels.
- property features: List[Hashable]#
Return list of feature names.
- Returns
- t.List[Hashable]
List of feature names.
- property features_columns: DataFrame#
Return DataFrame containing only the features defined in the dataset, if features are empty raise error.
- Returns
- pd.DataFrame
- classmethod from_numpy(*args: ndarray, columns: Optional[Sequence[Hashable]] = None, label_name: Optional[Hashable] = None, **kwargs) TDataset [source]#
Create Dataset instance from numpy arrays.
- Parameters
- *args: np.ndarray
Numpy array of data columns, and second optional numpy array of labels.
- columnst.Sequence[Hashable] , default: None
Names for the columns. If none are provided, the columns are automatically named 1 to n, where n is the number of columns.
- label_namet.Hashable , default: None
labels column name. If none is provided, the name ‘target’ will be used.
- **kwargsDict
additional arguments that will be passed to the main Dataset constructor.
- Returns
- Dataset
instance of the Dataset
- Raises
- DeepchecksValueError
If zero or more than two numpy arrays are received; if columns (args[0]) is not a two-dimensional numpy array; if labels (args[1]) is not a one-dimensional numpy array; if the features array or the labels array is empty.
Examples
>>> import numpy
>>> from deepchecks.tabular import Dataset
>>> features = numpy.array([[0.25, 0.3, 0.3],
...                         [0.14, 0.75, 0.3],
...                         [0.23, 0.39, 0.1]])
>>> labels = numpy.array([0.1, 0.1, 0.7])
>>> dataset = Dataset.from_numpy(features, labels)
Creating dataset only from features array.
>>> dataset = Dataset.from_numpy(features)
Passing additional arguments to the main Dataset constructor.
>>> dataset = Dataset.from_numpy(features, labels, max_categorical_ratio=0.5)
Specifying features and label columns names.
>>> dataset = Dataset.from_numpy(
...     features, labels,
...     columns=['sensor-1', 'sensor-2', 'sensor-3'],
...     label_name='labels'
... )
- get_datetime_column_from_index(datetime_name)[source]#
Retrieve the datetime info from the index if _set_datetime_from_dataframe_index is True.
- has_label() bool [source]#
Return True if label column exists.
- Returns
- bool
True if label column exists.
- property index_col: Optional[Series]#
Return index column. Index can be a named column or DataFrame index.
- Returns
- t.Optional[pd.Series]
If index column exists, returns a pandas Series of the index column.
- property index_name: Optional[Hashable]#
If index column exists, return its name.
- Returns
- t.Optional[Hashable]
index name
- is_categorical(col_name: Hashable) bool [source]#
Check if a column is considered a category column in the dataset object.
- Parameters
- col_nameHashable
The name of the column in the dataframe
- Returns
- bool
True if the column is considered categorical, otherwise False
- is_sampled(n_samples: int)[source]#
Return True if the dataset number of samples will decrease when sampled with n_samples samples.
- property label_col: Series#
Return Series of the label defined in the dataset, if label is not defined raise error.
- Returns
- pd.Series
- property label_name: Optional[Hashable]#
If label column exists, return its name. Otherwise, throw an exception.
- Returns
- t.Optional[Hashable]
Label name
- property label_type: Optional[TaskType]#
Return the label type.
- Returns
- t.Optional[TaskType]
Label type
- len_when_sampled(n_samples: int)[source]#
Return the number of samples in the sampled dataframe if this dataset is sampled with n_samples samples.
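The relationship between is_sampled and len_when_sampled can be sketched with plain integers. This is an illustration under two assumptions that the text above does not state: sampling is done without replacement, and n_samples=None means "do not sample". The function names mirror the methods, but they are free functions written for this sketch, not the library's code.

```python
from typing import Optional


def len_when_sampled(n_rows: int, n_samples: Optional[int]) -> int:
    """Size of the sample drawn from a dataset with n_rows rows."""
    if n_samples is None:
        # Assumption: None means the full dataset is kept.
        return n_rows
    # Without replacement, a sample cannot be larger than the dataset.
    return min(n_rows, n_samples)


def is_sampled(n_rows: int, n_samples: Optional[int]) -> bool:
    """True when sampling would actually reduce the number of rows."""
    return len_when_sampled(n_rows, n_samples) < n_rows


print(len_when_sampled(1000, 100), is_sampled(1000, 100))
print(len_when_sampled(50, 100), is_sampled(50, 100))
```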
- property n_samples: int#
Return number of samples in dataframe.
- Returns
- int
Number of samples in dataframe
- property numerical_features: List[Hashable]#
Return list of numerical feature names.
- Returns
- t.List[Hashable]
List of numerical feature names.
- sample(n_samples: Optional[int] = None, replace: bool = False, random_state: Optional[int] = None) TDataset [source]#
Create a copy of the dataset object, with the internal dataframe being a sample of the original dataframe.
- Parameters
- n_samplest.Optional[int]
Number of samples to draw.
- replacebool, default: False
Whether to sample with replacement.
- random_statet.Optional[int] , default None
Random state.
- Returns
- Dataset
instance of the Dataset with sampled internal dataframe.
- select(columns: Optional[Union[Hashable, List[Hashable]]] = None, ignore_columns: Optional[Union[Hashable, List[Hashable]]] = None, keep_label: bool = False) TDataset [source]#
Filter dataset columns by given params.
- Parameters
- columnsUnion[Hashable, List[Hashable], None]
Column names to keep.
- ignore_columnsUnion[Hashable, List[Hashable], None]
Column names to drop.
- Returns
- TDataset
horizontally filtered dataset
- Raises
- DeepchecksValueError
If one of the given columns does not exist.
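The column filtering that select describes can be sketched on a plain list of column names. The helper select_columns below is hypothetical and ignores keep_label; rejecting a call that passes both columns and ignore_columns is an assumption of this sketch, and ValueError stands in for DeepchecksValueError.

```python
from typing import Hashable, List, Optional, Union

ColumnSpec = Optional[Union[Hashable, List[Hashable]]]


def _as_list(spec: ColumnSpec) -> Optional[List[Hashable]]:
    """Accept a single column name or a list of names."""
    if spec is None:
        return None
    return list(spec) if isinstance(spec, list) else [spec]


def select_columns(
    all_columns: List[Hashable],
    columns: ColumnSpec = None,
    ignore_columns: ColumnSpec = None,
) -> List[Hashable]:
    """Hypothetical helper: names that survive the filter, in original order."""
    keep = _as_list(columns)
    drop = _as_list(ignore_columns)
    if keep is not None and drop is not None:
        # Assumption of this sketch: the two filters are mutually exclusive.
        raise ValueError("pass either columns or ignore_columns, not both")
    requested = keep if keep is not None else (drop or [])
    missing = [name for name in requested if name not in all_columns]
    if missing:
        raise ValueError(f"columns not found: {missing}")
    if keep is not None:
        return [name for name in all_columns if name in keep]
    if drop is not None:
        return [name for name in all_columns if name not in drop]
    return list(all_columns)


names = ["age", "income", "city"]
print(select_columns(names, columns=["age", "city"]))
print(select_columns(names, ignore_columns="income"))
```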
- train_test_split(train_size: Optional[Union[int, float]] = None, test_size: Union[int, float] = 0.25, random_state: int = 42, shuffle: bool = True, stratify: Union[List, Series, ndarray, bool] = False) Tuple[TDataset, TDataset] [source]#
Split dataset into random train and test datasets.
- Parameters
- train_sizet.Union[int, float, None] , default: None
If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the train split. If int, represents the absolute number of train samples. If None, the value is automatically set to the complement of the test size.
- test_sizet.Union[int, float] , default: 0.25
If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the test split. If int, represents the absolute number of test samples.
- random_stateint , default: 42
The random state to use for shuffling.
- shufflebool , default: True
Whether to shuffle the data before splitting.
- stratifyt.Union[t.List, pd.Series, np.ndarray, bool] , default: False
If True, data is split in a stratified fashion, using the class labels. If array-like, data is split in a stratified fashion, using this as class labels.
- Returns
- Dataset
Dataset containing train split data.
- Dataset
Dataset containing test split data.
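The float, int and None cases of train_size and test_size described above can be made concrete with a small size calculation. This is a sketch, not the library's code: the helper resolve_split_sizes is hypothetical, and the rounding (test size rounded up, train size rounded down) follows the scikit-learn convention, which is an assumption here.

```python
import math
from typing import Optional, Tuple, Union

Size = Union[int, float]


def _to_count(size: Size, n_samples: int, round_up: bool) -> int:
    """Turn a proportion (float) or an absolute number (int) into a row count."""
    if isinstance(size, float):
        if not 0.0 < size < 1.0:
            raise ValueError("float sizes must be between 0.0 and 1.0")
        scaled = size * n_samples
        return math.ceil(scaled) if round_up else math.floor(scaled)
    return size


def resolve_split_sizes(
    n_samples: int,
    train_size: Optional[Size] = None,
    test_size: Size = 0.25,
) -> Tuple[int, int]:
    """Return (n_train, n_test) for a dataset with n_samples rows."""
    n_test = _to_count(test_size, n_samples, round_up=True)
    if train_size is None:
        # None means "the complement of the test size".
        n_train = n_samples - n_test
    else:
        n_train = _to_count(train_size, n_samples, round_up=False)
    if n_train + n_test > n_samples:
        raise ValueError("train and test sizes exceed the number of samples")
    return n_train, n_test


print(resolve_split_sizes(100))                                # defaults
print(resolve_split_sizes(100, train_size=0.5, test_size=20))  # mixed float and int
print(resolve_split_sizes(10, test_size=0.25))                 # 2.5 rounds up to 3
```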
- class Context
Contains all the data + properties the user has passed to a check/suite, and validates it seamlessly.
- Parameters
- train: Union[Dataset, pd.DataFrame, None] , default: None
Dataset or DataFrame object, representing data an estimator was fitted on
- test: Union[Dataset, pd.DataFrame, None] , default: None
Dataset or DataFrame object, representing data an estimator predicts on
- model: Optional[BasicModel] , default: None
A scikit-learn-compatible fitted estimator instance
- feature_importance: pd.Series , default: None
Manually provided feature importance.
- feature_importance_force_permutationbool , default: False
Force calculation of permutation feature importance.
- feature_importance_timeoutint , default: 120
Timeout in seconds for the permutation feature importance calculation.
- y_pred_train: Optional[np.ndarray] , default: None
Array of the model prediction over the train dataset.
- y_pred_test: Optional[np.ndarray] , default: None
Array of the model prediction over the test dataset.
- y_proba_train: Optional[np.ndarray] , default: None
Array of the model prediction probabilities over the train dataset.
- y_proba_test: Optional[np.ndarray] , default: None
Array of the model prediction probabilities over the test dataset.
- model_classes: Optional[List] , default: None
For classification: list of classes known to the model
- Attributes
feature_importance
Return feature importance, or None if not possible.
feature_importance_timeout
Return feature importance timeout.
feature_importance_type
Return feature importance type if feature importance is available, else None.
model
Return & validate model if model exists, otherwise raise error.
model_classes
Return ordered list of possible label classes for classification tasks or None for regression.
model_name
Return model name.
observed_classes
Return the observed classes in both train and test.
task_type
Return task type based on calculated classes argument.
test
Return test if exists, otherwise raise error.
train
Return train if exists, otherwise raise error.
with_display
Return the with_display flag.
Methods
Assert the task_type is classification.
Assert the task type is regression.
assert_task_type
(*expected_types)Assert task_type matching given types.
finalize_check_result
(check_result, check[, ...])Run final processing on a check result which includes validation, conditions processing and sampling footnote.
get_data_by_kind
(kind)Return the relevant Dataset by given kind.
get_scorers
([scorers, use_avg_defaults])Return initialized & validated scorers if provided or default scorers otherwise.
get_single_scorer
([scorer, use_avg_defaults])Return initialized & validated scorer if provided or a default scorer otherwise.
Return whether there is test dataset defined.
- __init__(train: Optional[Union[Dataset, DataFrame]] = None, test: Optional[Union[Dataset, DataFrame]] = None, model: Optional[BasicModel] = None, feature_importance: Optional[Series] = None, feature_importance_force_permutation: bool = False, feature_importance_timeout: int = 120, with_display: bool = True, y_pred_train: Optional[ndarray] = None, y_pred_test: Optional[ndarray] = None, y_proba_train: Optional[ndarray] = None, y_proba_test: Optional[ndarray] = None, model_classes: Optional[List] = None)[source]#
- assert_task_type(*expected_types)[source]#
Assert task_type matching given types.
If task_type is defined, validate it and raise error if needed, else returns True. If task_type is not defined, return False.
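One reading of the description above is: return False when no task type is known, return True when the known task type is among the expected ones, and raise otherwise. The sketch below follows that reading. The TaskType enum and the UnsupportedTaskError exception are stand-ins defined for this example (the member values are the label_type strings listed for Dataset), and the function is a free function rather than the Context method.

```python
from enum import Enum
from typing import Optional


class TaskType(Enum):
    """Stand-in enum for this sketch, using the documented label_type values."""
    REGRESSION = "regression"
    BINARY = "binary"
    MULTICLASS = "multiclass"


class UnsupportedTaskError(Exception):
    """Hypothetical error type used only in this sketch."""


def assert_task_type(task_type: Optional[TaskType], *expected_types: TaskType) -> bool:
    if task_type is None:
        # Task type is not defined: nothing to validate.
        return False
    if task_type not in expected_types:
        allowed = ", ".join(t.value for t in expected_types)
        raise UnsupportedTaskError(
            f"task type {task_type.value!r} is not one of: {allowed}"
        )
    return True


print(assert_task_type(TaskType.BINARY, TaskType.BINARY, TaskType.MULTICLASS))
print(assert_task_type(None, TaskType.REGRESSION))
```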
- property feature_importance_type: Optional[str]#
Return feature importance type if feature importance is available, else None.
- finalize_check_result(check_result, check, dataset_kind: Optional[DatasetKind] = None)[source]#
Run final processing on a check result which includes validation, conditions processing and sampling footnote.
- get_data_by_kind(kind: DatasetKind)[source]#
Return the relevant Dataset by given kind.
- get_scorers(scorers: Optional[Union[Mapping[str, Union[str, Callable]], List[str]]] = None, use_avg_defaults=True) List[DeepcheckScorer] [source]#
Return initialized & validated scorers if provided or default scorers otherwise.
- Parameters
- scorersUnion[List[str], Dict[str, Union[str, Callable]]], default: None
List of scorers to use. If None, use default scorers. Scorers can be supplied as a list of scorer names or as a dictionary of names and functions.
- use_avg_defaultsbool, default True
If no scorers were provided, for classification, determines whether to use default scorers that return an averaged metric, or default scorers that return a metric per class.
- Returns
- List[DeepcheckScorer]
A list of initialized & validated scorers.
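The two accepted shapes of the scorers argument (a list of scorer names, or a dictionary of names and functions) can be brought into one common form before use. The sketch below shows that normalization only; normalize_scorers, DEFAULT_SCORERS and error_rate are hypothetical names, the scorer names are illustrative, and the (estimator, X, y) signature of the custom function follows the scikit-learn scorer convention as an assumption.

```python
from collections.abc import Mapping
from typing import Callable, Dict, List, Optional, Union

ScorerSpec = Union[str, Callable]

# Hypothetical defaults, used only in this sketch.
DEFAULT_SCORERS: Dict[str, ScorerSpec] = {"accuracy": "accuracy"}


def normalize_scorers(
    scorers: Optional[Union[Mapping, List[str]]] = None,
) -> Dict[str, ScorerSpec]:
    """Return a name -> scorer mapping for either accepted input shape."""
    if scorers is None:
        return dict(DEFAULT_SCORERS)
    if isinstance(scorers, Mapping):
        return dict(scorers)
    # A list of names: each name doubles as the display name.
    return {name: name for name in scorers}


def error_rate(model, features, labels) -> float:
    """Hypothetical custom scorer: share of wrong predictions."""
    predictions = model.predict(features)
    wrong = sum(1 for p, y in zip(predictions, labels) if p != y)
    return wrong / len(labels)


print(normalize_scorers())
print(normalize_scorers(["accuracy", "f1_macro"]))
print(sorted(normalize_scorers({"Error rate": error_rate})))
```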
- get_single_scorer(scorer: Optional[Mapping[str, Union[str, Callable]]] = None, use_avg_defaults=True) DeepcheckScorer [source]#
Return initialized & validated scorer if provided or a default scorer otherwise.
- Parameters
- scorerUnion[List[str], Dict[str, Union[str, Callable]]], default: None
List of scorers to use. If None, use default scorers. Scorers can be supplied as a list of scorer names or as a dictionary of names and functions.
- use_avg_defaultsbool, default True
If no scorers were provided, for classification, determines whether to use default scorers that return an averaged metric, or default scorers that return a metric per class.
- Returns
- DeepcheckScorer
An initialized & validated scorer.
- property model: BasicModel#
Return & validate model if model exists, otherwise raise error.
- property model_classes: List#
Return ordered list of possible label classes for classification tasks or None for regression.
- property model_name#
Return model name.
- property observed_classes: List#
Return the observed classes in both train and test. None for regression.
- property test#
Return test if exists, otherwise raise error.
- property train#
Return train if exists, otherwise raise error.
- class Suite
Tabular suite to run checks of types: TrainTestCheck, SingleDatasetCheck, ModelOnlyCheck.
Methods
add
(check)Add a check or a suite to current suite.
config
()Return suite configuration (checks' conditions' configuration not yet supported).
from_config
(conf[, version_unmatch])Return suite object from a SuiteConfig object.
from_json
(conf[, version_unmatch])Deserialize suite instance from JSON string.
remove
(index)Remove a check by given index.
run
([train_dataset, test_dataset, model, ...])Run all checks.
Return tuple of supported check types of this suite.
to_json
([indent])Serialize suite instance to JSON string.
- add(check: Union[BaseCheck, BaseSuite])[source]#
Add a check or a suite to current suite.
- Parameters
- checkBaseCheck
A check or suite to add.
- config() SuiteConfig [source]#
Return suite configuration (checks’ conditions’ configuration not yet supported).
- Returns
- SuiteConfig
includes the suite name, and list of check configs.
- classmethod from_config(conf: SuiteConfig, version_unmatch: Optional[Union[Literal['raise'], Literal['warn']]] = 'warn') Self [source]#
Return suite object from a SuiteConfig object.
- Parameters
- confSuiteConfig
the SuiteConfig object
- Returns
- BaseSuite
the suite class object from given config
- from_json(conf: str, version_unmatch: Optional[Union[Literal['raise'], Literal['warn']]] = 'warn') Self [source]#
Deserialize suite instance from JSON string.
- remove(index: int)[source]#
Remove a check by given index.
- Parameters
- indexint
Index of check to remove.
- run(train_dataset: Optional[Union[Dataset, DataFrame]] = None, test_dataset: Optional[Union[Dataset, DataFrame]] = None, model: Optional[BasicModel] = None, feature_importance: Optional[Series] = None, feature_importance_force_permutation: bool = False, feature_importance_timeout: int = 120, with_display: bool = True, y_pred_train: Optional[ndarray] = None, y_pred_test: Optional[ndarray] = None, y_proba_train: Optional[ndarray] = None, y_proba_test: Optional[ndarray] = None, run_single_dataset: Optional[str] = None, model_classes: Optional[List] = None) SuiteResult [source]#
Run all checks.
- Parameters
- train_dataset: Optional[Union[Dataset, pd.DataFrame]] , default None
object, representing data an estimator was fitted on
- test_datasetOptional[Union[Dataset, pd.DataFrame]] , default None
object, representing data an estimator predicts on
- modelOptional[BasicModel] , default None
A scikit-learn-compatible fitted estimator instance
- run_single_dataset: Optional[str], default None
‘Train’, ‘Test’ , or None to run on both train and test.
- feature_importance: pd.Series , default: None
Manually provided feature importance.
- feature_importance_force_permutationbool , default: False
Force calculation of permutation feature importance.
- feature_importance_timeoutint , default: 120
Timeout in seconds for the permutation feature importance calculation.
- y_pred_train: Optional[np.ndarray] , default: None
Array of the model prediction over the train dataset.
- y_pred_test: Optional[np.ndarray] , default: None
Array of the model prediction over the test dataset.
- y_proba_train: Optional[np.ndarray] , default: None
Array of the model prediction probabilities over the train dataset.
- y_proba_test: Optional[np.ndarray] , default: None
Array of the model prediction probabilities over the test dataset.
- model_classes: Optional[List] , default: None
For classification: list of classes known to the model
- Returns
- SuiteResult
All results by all initialized checks
- class SingleDatasetCheck
Parent class for checks that only use one dataset.
Methods
add_condition
(name, condition_func, **params)Add new condition function to the check.
Remove all conditions from this check instance.
conditions_decision
(result)Run conditions on given result.
config
([include_version, include_defaults])Return check configuration (conditions' configuration not yet supported).
alias of
Context
from_config
(conf[, version_unmatch])Return check object from a CheckConfig object.
from_json
(conf[, version_unmatch])Deserialize check instance from JSON string.
metadata
([with_doc_link])Return check metadata.
name
()Name of class in split camel case.
params
([show_defaults])Return parameters to show when printing the check.
remove_condition
(index)Remove given condition by index.
run
(dataset[, model, feature_importance, ...])Run check.
run_logic
(context, dataset_kind)Run check.
to_json
([indent, include_version, ...])Serialize check instance to JSON string.
- add_condition(name: str, condition_func: Callable[[Any], Union[ConditionResult, bool]], **params)[source]#
Add new condition function to the check.
- Parameters
- namestr
Name of the condition. It should explain the condition's action and parameters.
- condition_funcCallable[[Any], Union[List[ConditionResult], bool]]
Function which gets the value of the check and returns object of List[ConditionResult] or boolean.
- paramsdict
Additional parameters to pass when calling the condition function.
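The pattern behind add_condition, remove_condition and conditions_decision can be illustrated with a small stand-in class. MiniCheck below is hypothetical and much simpler than a real check: it stores (name, function, params) triples and evaluates them against a raw value, whereas the documented conditions_decision takes a CheckResult and returns ConditionResult objects. It is meant only to show how the extra **params reach the condition function.

```python
from typing import Any, Callable, Dict, List, Tuple


class MiniCheck:
    """Hypothetical stand-in for a check that stores named conditions."""

    def __init__(self) -> None:
        self._conditions: List[Tuple[str, Callable[..., bool], Dict[str, Any]]] = []

    def add_condition(self, name: str, condition_func: Callable[..., bool], **params: Any) -> "MiniCheck":
        # Keep the params so they can be passed on each evaluation.
        self._conditions.append((name, condition_func, params))
        return self

    def remove_condition(self, index: int) -> None:
        del self._conditions[index]

    def conditions_decision(self, value: Any) -> List[Tuple[str, bool]]:
        # Call every condition with the check value plus its stored params.
        return [
            (name, bool(func(value, **params)))
            for name, func, params in self._conditions
        ]


def ratio_not_greater_than(value: float, max_ratio: float) -> bool:
    """Condition function: receives the check value, returns a boolean."""
    return value <= max_ratio


check = MiniCheck()
check.add_condition(
    "Null ratio is not greater than 5%", ratio_not_greater_than, max_ratio=0.05
)
print(check.conditions_decision(0.02))
print(check.conditions_decision(0.20))
```

Naming the condition after its action and threshold, as the name parameter description recommends, keeps the printed decisions readable without looking up the function.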
- conditions_decision(result: CheckResult) List[ConditionResult] [source]#
Run conditions on given result.
- config(include_version: bool = True, include_defaults: bool = True) CheckConfig [source]#
Return check configuration (conditions’ configuration not yet supported).
- Returns
- CheckConfig
includes the check's class name, params, and module name.
- classmethod from_config(conf: CheckConfig, version_unmatch: Optional[Union[Literal['raise'], Literal['warn']]] = 'warn') Self [source]#
Return check object from a CheckConfig object.
- Parameters
- confDict[Any, Any]
- Returns
- BaseCheck
the check class object from given config
- classmethod from_json(conf: str, version_unmatch: Optional[Union[Literal['raise'], Literal['warn']]] = 'warn') Self [source]#
Deserialize check instance from JSON string.
- metadata(with_doc_link: bool = False) CheckMetadata [source]#
Return check metadata.
- Parameters
- with_doc_linkbool, default False
whether to include doc link in summary or not
- Returns
- Dict[str, Any]
- params(show_defaults: bool = False) Dict [source]#
Return parameters to show when printing the check.
- remove_condition(index: int)[source]#
Remove given condition by index.
- Parameters
- indexint
index of condition to remove
- run(dataset: Union[Dataset, DataFrame], model: Optional[BasicModel] = None, feature_importance: Optional[Series] = None, feature_importance_force_permutation: bool = False, feature_importance_timeout: int = 120, with_display: bool = True, y_pred: Optional[ndarray] = None, y_proba: Optional[ndarray] = None, y_pred_train: Optional[ndarray] = None, y_pred_test: Optional[ndarray] = None, y_proba_train: Optional[ndarray] = None, y_proba_test: Optional[ndarray] = None, model_classes: Optional[List] = None) CheckResult [source]#
Run check.
- Parameters
- dataset: Union[Dataset, pd.DataFrame]
Dataset or DataFrame object, representing data an estimator was fitted on
- model: Optional[BasicModel], default: None
A scikit-learn-compatible fitted estimator instance
- feature_importance: pd.Series , default: None
Manually provided feature importance.
- feature_importance_force_permutationbool , default: False
Force calculation of permutation feature importance.
- feature_importance_timeoutint , default: 120
Timeout in seconds for the permutation feature importance calculation.
- y_pred_train: Optional[np.ndarray] , default: None
Array of the model prediction over the train dataset.
- y_pred_test: Optional[np.ndarray] , default: None
Array of the model prediction over the test dataset.
- y_proba_train: Optional[np.ndarray] , default: None
Array of the model prediction probabilities over the train dataset.
- y_proba_test: Optional[np.ndarray] , default: None
Array of the model prediction probabilities over the test dataset.
- model_classes: Optional[List] , default: None
For classification: list of classes known to the model
- abstract run_logic(context, dataset_kind) CheckResult [source]#
Run check.
- class TrainTestCheck
Parent class for checks that compare two datasets.
Checks of this type receive both the train dataset and the test dataset used to train and test a model.
Methods
add_condition
(name, condition_func, **params)Add new condition function to the check.
Remove all conditions from this check instance.
conditions_decision
(result)Run conditions on given result.
config
([include_version, include_defaults])Return check configuration (conditions' configuration not yet supported).
alias of
Context
from_config
(conf[, version_unmatch])Return check object from a CheckConfig object.
from_json
(conf[, version_unmatch])Deserialize check instance from JSON string.
metadata
([with_doc_link])Return check metadata.
name
()Name of class in split camel case.
params
([show_defaults])Return parameters to show when printing the check.
remove_condition
(index)Remove given condition by index.
run
(train_dataset, test_dataset[, model, ...])Run check.
run_logic
(context)Run check.
to_json
([indent, include_version, ...])Serialize check instance to JSON string.
- add_condition(name: str, condition_func: Callable[[Any], Union[ConditionResult, bool]], **params)[source]#
Add new condition function to the check.
- Parameters
- namestr
Name of the condition. It should explain the condition's action and parameters.
- condition_funcCallable[[Any], Union[List[ConditionResult], bool]]
Function which gets the value of the check and returns object of List[ConditionResult] or boolean.
- paramsdict
Additional parameters to pass when calling the condition function.
- conditions_decision(result: CheckResult) List[ConditionResult] [source]#
Run conditions on given result.
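The condition methods above follow a simple register-then-evaluate pattern. The following is a minimal standalone sketch of that pattern, not deepchecks code: `ToyCheck` and `ToyConditionResult` are hypothetical stand-ins, and the toy `conditions_decision` takes the raw check value directly rather than a `CheckResult`.

```python
from typing import Any, Callable, Dict, List, Tuple, Union


class ToyConditionResult:
    """Hypothetical stand-in for ConditionResult: a pass flag plus free-text details."""

    def __init__(self, is_pass: bool, details: str = ""):
        self.is_pass = is_pass
        self.details = details
        self.name = ""  # filled in when the condition is evaluated


ConditionFunc = Callable[..., Union[ToyConditionResult, bool]]


class ToyCheck:
    """Hypothetical check that only stores and evaluates conditions."""

    def __init__(self) -> None:
        self._conditions: List[Tuple[str, ConditionFunc, Dict[str, Any]]] = []

    def add_condition(self, name: str, condition_func: ConditionFunc, **params: Any) -> "ToyCheck":
        # Keep the extra params so they can be passed back on every evaluation.
        self._conditions.append((name, condition_func, params))
        return self

    def remove_condition(self, index: int) -> None:
        if not 0 <= index < len(self._conditions):
            raise IndexError(f"no condition at index {index}")
        del self._conditions[index]

    def conditions_decision(self, value: Any) -> List[ToyConditionResult]:
        results = []
        for name, func, params in self._conditions:
            outcome = func(value, **params)
            if isinstance(outcome, bool):
                # A bare boolean is wrapped so callers always get result objects.
                outcome = ToyConditionResult(is_pass=outcome)
            outcome.name = name
            results.append(outcome)
        return results


check = ToyCheck()
check.add_condition("value is below threshold", lambda v, threshold: v < threshold, threshold=0.5)
check.add_condition("value is non-negative", lambda v: ToyConditionResult(v >= 0, f"got {v}"))
results = check.conditions_decision(0.7)
```

In this sketch the keyword arguments given to `add_condition` are forwarded to the condition function on each evaluation, which mirrors the documented role of `**params`.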
- config(include_version: bool = True, include_defaults: bool = True) CheckConfig [source]#
Return check configuration (conditions’ configuration not yet supported).
- Returns
- CheckConfig
Includes the check's class name, params, and module name.
- classmethod from_config(conf: CheckConfig, version_unmatch: Optional[Union[Literal['raise'], Literal['warn']]] = 'warn') Self [source]#
Return check object from a CheckConfig object.
- Parameters
- conf: Dict[Any, Any]
- Returns
- BaseCheck
the check class object from given config
- classmethod from_json(conf: str, version_unmatch: Optional[Union[Literal['raise'], Literal['warn']]] = 'warn') Self [source]#
Deserialize check instance from JSON string.
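The config and JSON methods describe a round trip: a check is reduced to its class name, params, and module name, and later rebuilt from that data. Below is a rough standalone sketch of such a round trip using only the standard library. It is not the deepchecks implementation: `ToyCheck`, `TOY_VERSION`, the dictionary keys, and the exact handling of `version_unmatch` (including ignoring a mismatch when it is `None`) are assumptions made for illustration.

```python
import json
import warnings
from typing import Any, Dict, Optional

TOY_VERSION = "1.0.0"  # hypothetical library version used by this sketch


class ToyCheck:
    """Hypothetical check with a single parameter."""

    def __init__(self, threshold: float = 0.5):
        self.threshold = threshold

    def config(self, include_version: bool = True) -> Dict[str, Any]:
        conf = {
            "module_name": type(self).__module__,
            "class_name": type(self).__name__,
            "params": {"threshold": self.threshold},
        }
        if include_version:
            conf["version"] = TOY_VERSION
        return conf

    @classmethod
    def from_config(cls, conf: Dict[str, Any], version_unmatch: Optional[str] = "warn") -> "ToyCheck":
        version = conf.get("version")
        if version is not None and version != TOY_VERSION:
            message = f"config version {version} differs from library version {TOY_VERSION}"
            if version_unmatch == "raise":
                raise ValueError(message)
            if version_unmatch == "warn":
                warnings.warn(message)
            # Assumption: any other value (e.g. None) ignores the mismatch.
        return cls(**conf["params"])

    def to_json(self, indent: int = 2) -> str:
        return json.dumps(self.config(), indent=indent)

    @classmethod
    def from_json(cls, conf: str, version_unmatch: Optional[str] = "warn") -> "ToyCheck":
        return cls.from_config(json.loads(conf), version_unmatch=version_unmatch)


restored = ToyCheck.from_json(ToyCheck(threshold=0.3).to_json())
```

Note that, as the documentation states, conditions are not part of the configuration, so the sketch serializes parameters only.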
- metadata(with_doc_link: bool = False) CheckMetadata [source]#
Return check metadata.
- Parameters
- with_doc_link: bool , default: False
Whether to include the doc link in the summary.
- Returns
- Dict[str, Any]
- params(show_defaults: bool = False) Dict [source]#
Return parameters to show when printing the check.
- remove_condition(index: int)[source]#
Remove given condition by index.
- Parameters
- index: int
Index of the condition to remove.
- run(train_dataset: Union[Dataset, DataFrame], test_dataset: Union[Dataset, DataFrame], model: Optional[BasicModel] = None, feature_importance: Optional[Series] = None, feature_importance_force_permutation: bool = False, feature_importance_timeout: int = 120, with_display: bool = True, y_pred_train: Optional[ndarray] = None, y_pred_test: Optional[ndarray] = None, y_proba_train: Optional[ndarray] = None, y_proba_test: Optional[ndarray] = None, model_classes: Optional[List] = None) CheckResult [source]#
Run check.
- Parameters
- train_dataset: Union[Dataset, pd.DataFrame]
Dataset or DataFrame object, representing data an estimator was fitted on
- test_dataset: Union[Dataset, pd.DataFrame]
Dataset or DataFrame object, representing data an estimator predicts on
- model: Optional[BasicModel], default: None
A scikit-learn-compatible fitted estimator instance
- feature_importance: pd.Series , default: None
Manually provided feature importance.
- feature_importance_force_permutation: bool , default: False
Force calculation of permutation feature importance.
- feature_importance_timeout: int , default: 120
Timeout in seconds for the permutation feature importance calculation.
- y_pred_train: Optional[np.ndarray] , default: None
Array of the model prediction over the train dataset.
- y_pred_test: Optional[np.ndarray] , default: None
Array of the model prediction over the test dataset.
- y_proba_train: Optional[np.ndarray] , default: None
Array of the model prediction probabilities over the train dataset.
- y_proba_test: Optional[np.ndarray] , default: None
Array of the model prediction probabilities over the test dataset.
- model_classes: Optional[List] , default: None
For classification: list of classes known to the model
- abstract run_logic(context) CheckResult [source]#
Run check.
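The split between a concrete `run` and an abstract `run_logic` suggests a template-method design: `run` prepares a context from the user's inputs and the subclass supplies only the check logic. The sketch below illustrates that idea in isolation, under the assumption that this is roughly how the two methods relate. `ToyContext`, `ToyTrainTestCheck`, and `MeanShift` are hypothetical names, and plain lists of floats stand in for datasets.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class ToyContext:
    """Hypothetical container for validated inputs."""
    train: List[float]
    test: List[float]
    model: Optional[Any] = None


class ToyTrainTestCheck(ABC):
    """Hypothetical parent class: run() builds the context, subclasses implement run_logic()."""

    def run(self, train_dataset, test_dataset, model=None):
        context = ToyContext(train=list(train_dataset), test=list(test_dataset), model=model)
        return self.run_logic(context)

    @abstractmethod
    def run_logic(self, context: ToyContext):
        """Compute the check result from the prepared context."""


class MeanShift(ToyTrainTestCheck):
    """Toy check: difference between the test mean and the train mean."""

    def run_logic(self, context: ToyContext) -> float:
        train_mean = sum(context.train) / len(context.train)
        test_mean = sum(context.test) / len(context.test)
        return test_mean - train_mean


shift = MeanShift().run([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
```

Because `run_logic` is abstract, the parent class in this sketch cannot be instantiated on its own; only subclasses that implement it can be run.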
- class ModelOnlyCheck[source]#
Parent class for checks that only use a model and no datasets.
Methods
add_condition(name, condition_func, **params): Add new condition function to the check.
Remove all conditions from this check instance.
conditions_decision(result): Run conditions on given result.
config([include_version, include_defaults]): Return check configuration (conditions' configuration not yet supported).
alias of Context
from_config(conf[, version_unmatch]): Return check object from a CheckConfig object.
from_json(conf[, version_unmatch]): Deserialize check instance from JSON string.
metadata([with_doc_link]): Return check metadata.
name(): Name of class in split camel case.
params([show_defaults]): Return parameters to show when printing the check.
remove_condition(index): Remove given condition by index.
run(model[, feature_importance, ...]): Run check.
run_logic(context): Run check.
to_json([indent, include_version, ...]): Serialize check instance to JSON string.
- add_condition(name: str, condition_func: Callable[[Any], Union[ConditionResult, bool]], **params)[source]#
Add new condition function to the check.
- Parameters
- name: str
Name of the condition. It should explain the condition's action and parameters.
- condition_func: Callable[[Any], Union[ConditionResult, bool]]
Function that receives the value of the check and returns a ConditionResult or a boolean.
- params: dict
Additional parameters to pass when calling the condition function.
- conditions_decision(result: CheckResult) List[ConditionResult] [source]#
Run conditions on given result.
- config(include_version: bool = True, include_defaults: bool = True) CheckConfig [source]#
Return check configuration (conditions’ configuration not yet supported).
- Returns
- CheckConfig
Includes the check's class name, params, and module name.
- classmethod from_config(conf: CheckConfig, version_unmatch: Optional[Union[Literal['raise'], Literal['warn']]] = 'warn') Self [source]#
Return check object from a CheckConfig object.
- Parameters
- conf: Dict[Any, Any]
- Returns
- BaseCheck
the check class object from given config
- classmethod from_json(conf: str, version_unmatch: Optional[Union[Literal['raise'], Literal['warn']]] = 'warn') Self [source]#
Deserialize check instance from JSON string.
- metadata(with_doc_link: bool = False) CheckMetadata [source]#
Return check metadata.
- Parameters
- with_doc_link: bool , default: False
Whether to include the doc link in the summary.
- Returns
- Dict[str, Any]
- params(show_defaults: bool = False) Dict [source]#
Return parameters to show when printing the check.
- remove_condition(index: int)[source]#
Remove given condition by index.
- Parameters
- index: int
Index of the condition to remove.
- run(model: BasicModel, feature_importance: Optional[Series] = None, feature_importance_force_permutation: bool = False, feature_importance_timeout: int = 120, with_display: bool = True, y_pred_train: Optional[ndarray] = None, y_pred_test: Optional[ndarray] = None, y_proba_train: Optional[ndarray] = None, y_proba_test: Optional[ndarray] = None) CheckResult [source]#
Run check.
- Parameters
- model: BasicModel
A scikit-learn-compatible fitted estimator instance
- feature_importance: pd.Series , default: None
Manually provided feature importance.
- feature_importance_force_permutation: bool , default: False
Force calculation of permutation feature importance.
- feature_importance_timeout: int , default: 120
Timeout in seconds for the permutation feature importance calculation.
- y_pred_train: Optional[np.ndarray] , default: None
Array of the model prediction over the train dataset.
- y_pred_test: Optional[np.ndarray] , default: None
Array of the model prediction over the test dataset.
- y_proba_train: Optional[np.ndarray] , default: None
Array of the model prediction probabilities over the train dataset.
- y_proba_test: Optional[np.ndarray] , default: None
Array of the model prediction probabilities over the test dataset.
- model_classes: Optional[List] , default: None
For classification: list of classes known to the model
- abstract run_logic(context) CheckResult [source]#
Run check.
- class ModelComparisonContext[source]#
Contain processed input for model comparison checks.
- Attributes
models
Return the models’ dict.
Methods
finalize_check_result(check_result, check): Run final processing on a check result which includes validation and conditions processing.
- __init__(train_datasets: Union[Dataset, List[Dataset]], test_datasets: Union[Dataset, List[Dataset]], models: Union[List[Any], Mapping[str, Any]])[source]#
Preprocess the parameters.
- finalize_check_result(check_result, check)[source]#
Run final processing on a check result which includes validation and conditions processing.
- property models: Dict#
Return the models’ dict.
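Since the constructor accepts models as either a list or a mapping while the `models` property returns a dict, some normalization has to happen during preprocessing. The sketch below shows one plausible way to do it and is not taken from the deepchecks source. The helper names, the `ClassName_index` naming scheme for unnamed models, and the reuse of a single dataset for every model are all assumptions.

```python
from typing import Any, Dict, List, Mapping, Sequence, Union


def normalize_models(models: Union[Sequence[Any], Mapping[str, Any]]) -> Dict[str, Any]:
    """Return a name -> model dict; unnamed models get a generated name (assumed scheme)."""
    if isinstance(models, Mapping):
        named = dict(models)
    else:
        named = {f"{type(m).__name__}_{i}": m for i, m in enumerate(models, start=1)}
    if len(named) < 2:
        raise ValueError("model comparison needs at least two models")
    return named


def broadcast_datasets(datasets: Any, n_models: int) -> List[Any]:
    """Pair datasets with models: a single dataset is reused for every model (assumption)."""
    if isinstance(datasets, list):
        if len(datasets) != n_models:
            raise ValueError("number of datasets must match number of models")
        return datasets
    return [datasets] * n_models


class LinearStub:
    """Placeholder for a fitted estimator."""


class TreeStub:
    """Placeholder for a fitted estimator."""


models = normalize_models([LinearStub(), TreeStub()])
train_sets = broadcast_datasets("shared-train", len(models))
```

Passing a mapping instead, for example `{"baseline": LinearStub(), "candidate": TreeStub()}`, keeps the caller's names unchanged in this sketch.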
- class ModelComparisonCheck[source]#
Parent class for checks that compare two or more models.
Methods
add_condition(name, condition_func, **params): Add new condition function to the check.
Remove all conditions from this check instance.
conditions_decision(result): Run conditions on given result.
config([include_version, include_defaults]): Return check configuration (conditions' configuration not yet supported).
from_config(conf[, version_unmatch]): Return check object from a CheckConfig object.
from_json(conf[, version_unmatch]): Deserialize check instance from JSON string.
metadata([with_doc_link]): Return check metadata.
name(): Name of class in split camel case.
params([show_defaults]): Return parameters to show when printing the check.
remove_condition(index): Remove given condition by index.
run(train_datasets, test_datasets, models): Initialize context and pass to check logic.
run_logic(multi_context): Implement the check logic here.
to_json([indent, include_version, ...]): Serialize check instance to JSON string.
- add_condition(name: str, condition_func: Callable[[Any], Union[ConditionResult, bool]], **params)[source]#
Add new condition function to the check.
- Parameters
- name: str
Name of the condition. It should explain the condition's action and parameters.
- condition_func: Callable[[Any], Union[ConditionResult, bool]]
Function that receives the value of the check and returns a ConditionResult or a boolean.
- params: dict
Additional parameters to pass when calling the condition function.
- conditions_decision(result: CheckResult) List[ConditionResult] [source]#
Run conditions on given result.
- config(include_version: bool = True, include_defaults: bool = True) CheckConfig [source]#
Return check configuration (conditions’ configuration not yet supported).
- Returns
- CheckConfig
Includes the check's class name, params, and module name.
- classmethod from_config(conf: CheckConfig, version_unmatch: Optional[Union[Literal['raise'], Literal['warn']]] = 'warn') Self [source]#
Return check object from a CheckConfig object.
- Parameters
- conf: Dict[Any, Any]
- Returns
- BaseCheck
the check class object from given config
- classmethod from_json(conf: str, version_unmatch: Optional[Union[Literal['raise'], Literal['warn']]] = 'warn') Self [source]#
Deserialize check instance from JSON string.
- metadata(with_doc_link: bool = False) CheckMetadata [source]#
Return check metadata.
- Parameters
- with_doc_link: bool , default: False
Whether to include the doc link in the summary.
- Returns
- Dict[str, Any]
- params(show_defaults: bool = False) Dict [source]#
Return parameters to show when printing the check.
- remove_condition(index: int)[source]#
Remove given condition by index.
- Parameters
- index: int
Index of the condition to remove.
- run(train_datasets: Union[Dataset, List[Dataset]], test_datasets: Union[Dataset, List[Dataset]], models: Union[List[BasicModel], Mapping[str, BasicModel]]) CheckResult [source]#
Initialize context and pass to check logic.
- Parameters
- train_datasets: Union[Dataset, List[Dataset]]
train datasets
- test_datasets: Union[Dataset, List[Dataset]]
test datasets
- models: Union[List[BasicModel], Mapping[str, BasicModel]]
list or map of models
- abstract run_logic(multi_context: ModelComparisonContext) CheckResult [source]#
Implement the check logic here.
- class ModelComparisonSuite[source]#
Suite to run checks of types: CompareModelsBaseCheck.
Methods
add(check): Add a check or a suite to current suite.
config(): Return suite configuration (checks' conditions' configuration not yet supported).
from_config(conf[, version_unmatch]): Return suite object from a SuiteConfig object.
from_json(conf[, version_unmatch]): Deserialize suite instance from JSON string.
remove(index): Remove a check by given index.
run(train_datasets, test_datasets, models): Run all checks.
Return tuple of supported check types of this suite.
to_json([indent]): Serialize suite instance to JSON string.
- add(check: Union[BaseCheck, BaseSuite])[source]#
Add a check or a suite to current suite.
- Parameters
- check: Union[BaseCheck, BaseSuite]
A check or suite to add.
- config() SuiteConfig [source]#
Return suite configuration (checks’ conditions’ configuration not yet supported).
- Returns
- SuiteConfig
Includes the suite name and a list of check configs.
- classmethod from_config(conf: SuiteConfig, version_unmatch: Optional[Union[Literal['raise'], Literal['warn']]] = 'warn') Self [source]#
Return suite object from a SuiteConfig object.
- Parameters
- conf: SuiteConfig
The SuiteConfig object.
- Returns
- BaseSuite
the suite class object from given config
- from_json(conf: str, version_unmatch: Optional[Union[Literal['raise'], Literal['warn']]] = 'warn') Self [source]#
Deserialize suite instance from JSON string.
- remove(index: int)[source]#
Remove a check by given index.
- Parameters
- index: int
Index of check to remove.
- run(train_datasets: Union[Dataset, List[Dataset]], test_datasets: Union[Dataset, List[Dataset]], models: Union[List[Any], Mapping[str, Any]]) SuiteResult [source]#
Run all checks.
- Parameters
- train_datasets: Union[Dataset, Container[Dataset]]
representing data an estimator was fitted on
- test_datasets: Union[Dataset, Container[Dataset]]
representing data an estimator predicts on
- models: Union[Container[Any], Mapping[str, Any]]
Two or more scikit-learn-compatible fitted estimator instances
- Returns
- SuiteResult
All results by all initialized checks
- Raises
- ValueError
if check_datasets_policy is not of allowed types