deepchecks.tabular#

Package for tabular functionality.

Modules

checks

Module importing all tabular checks.

suites

Module containing all prebuilt suites.

datasets

Module for working with pre-built datasets.

Classes

class Dataset[source]#

Dataset wraps pandas DataFrame together with ML related metadata.

The Dataset class contains additional data and methods intended for easily accessing metadata relevant to the training or validation of an ML model.

Parameters
df: Any

An object that can be cast to a pandas DataFrame, containing data relevant to the training or validation of an ML model.

label: t.Union[Hashable, pd.Series, pd.DataFrame, np.ndarray] , default: None

Label column, provided either as a string with the name of an existing column in the DataFrame, or as a label object (pandas Series/DataFrame or numpy array) whose data will be concatenated to the data in the DataFrame. For a label object, the label name is set as follows:
- Series: the series name, or 'target' if the name is empty
- DataFrame: a single column is expected, and its name is used
- numpy array: 'target'
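The label-name rules can be sketched in plain pandas/numpy (a simplified illustration, not the actual deepchecks implementation; infer_label_name is a hypothetical helper):

```python
import numpy as np
import pandas as pd

def infer_label_name(label):
    """Hypothetical helper mirroring the label-name rules described above."""
    if isinstance(label, pd.Series):
        # Series: take the series name, or 'target' if the name is empty
        return label.name if label.name else 'target'
    if isinstance(label, pd.DataFrame):
        # DataFrame: expect a single column and use its name
        if label.shape[1] != 1:
            raise ValueError('label DataFrame must have exactly one column')
        return label.columns[0]
    if isinstance(label, np.ndarray):
        # numpy array: always 'target'
        return 'target'
    # Otherwise a Hashable naming an existing DataFrame column
    return label

print(infer_label_name(pd.Series([0, 1], name='y')))  # y
print(infer_label_name(np.array([0, 1])))             # target
```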

features: t.Optional[t.Sequence[Hashable]] , default: None

List of names for the feature columns in the DataFrame.

cat_features: t.Optional[t.Sequence[Hashable]] , default: None

List of names for the categorical features in the DataFrame. To disable categorical feature inference, pass cat_features=[].

index_name: t.Optional[Hashable] , default: None

Name of the index column in the dataframe. If set_index_from_dataframe_index is True and index_name is not None, index will be created from the dataframe index level with the given name. If index levels have no names, an int must be used to select the appropriate level by order.

set_index_from_dataframe_index: bool , default: False

If set to true, index will be created from the dataframe index instead of dataframe columns (default). If index_name is None, first level of the index will be used in case of a multilevel index.

datetime_name: t.Optional[Hashable] , default: None

Name of the datetime column in the dataframe. If set_datetime_from_dataframe_index is True and datetime_name is not None, date will be created from the dataframe index level with the given name. If index levels have no names, an int must be used to select the appropriate level by order.

set_datetime_from_dataframe_index: bool , default: False

If set to true, date will be created from the dataframe index instead of dataframe columns (default). If datetime_name is None, first level of the index will be used in case of a multilevel index.
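To illustrate how an index level would be picked out of a multilevel dataframe index (the column and level names here are invented):

```python
import pandas as pd

df = pd.DataFrame(
    {'feature': [1, 2, 3]},
    index=pd.MultiIndex.from_tuples(
        [('a', 'd1'), ('b', 'd2'), ('c', 'd3')],
        names=['sample_id', 'date'],
    ),
)

# With set_index_from_dataframe_index=True and index_name='sample_id',
# the index would come from this named level:
ids = df.index.get_level_values('sample_id')

# When the index levels have no names, an int selects the level by order:
second_level = df.index.get_level_values(1)
```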

convert_datetime: bool , default: True

If set to true, date will be converted to datetime using pandas.to_datetime.

datetime_args: t.Optional[t.Dict] , default: None

Keyword arguments forwarded to pandas.to_datetime for conversion of the datetime column (see https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html for details).
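For example, datetime_args could request day-first parsing; the arguments are simply forwarded to pandas.to_datetime:

```python
import pandas as pd

# Equivalent of passing datetime_args={'dayfirst': True} to Dataset:
ts = pd.to_datetime('01/02/2021', dayfirst=True)
print(ts)  # 2021-02-01 00:00:00
```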

max_categorical_ratio: float , default: 0.01

The max ratio of unique values in a column in order for it to be inferred as a categorical feature.

max_categories: int , default: None

The maximum number of categories in a column in order for it to be inferred as a categorical feature. If None, the is_categorical default inference mechanism is used.
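A rough sketch of how these two thresholds interact (simplified; the real is_categorical logic in deepchecks is more involved):

```python
import pandas as pd

def looks_categorical(column, max_categorical_ratio=0.01, max_categories=30):
    """Simplified sketch: a column is treated as categorical when both the
    unique-value ratio and the unique-value count are under the thresholds."""
    n_unique = column.nunique()
    return (n_unique / len(column) <= max_categorical_ratio
            and n_unique <= max_categories)

print(looks_categorical(pd.Series(['a', 'b'] * 500)))  # True (2/1000 unique)
print(looks_categorical(pd.Series(range(1000))))       # False (all unique)
```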

label_type: str , default: None

Used to assume the target model type if it is not found on the model. Valid values: 'classification_label', 'regression_label'. If None, the label type is inferred from the label using the is_categorical logic.

__init__(df: Any, label: Optional[Union[Hashable, Series, DataFrame, ndarray]] = None, features: Optional[Sequence[Hashable]] = None, cat_features: Optional[Sequence[Hashable]] = None, index_name: Optional[Hashable] = None, set_index_from_dataframe_index: bool = False, datetime_name: Optional[Hashable] = None, set_datetime_from_dataframe_index: bool = False, convert_datetime: bool = True, datetime_args: Optional[Dict] = None, max_categorical_ratio: float = 0.01, max_categories: Optional[int] = None, label_type: Optional[str] = None)[source]#
__new__(*args, **kwargs)#
class Context[source]#

Contains all the data + properties the user has passed to a check/suite, and validates it seamlessly.

Parameters
train: Union[Dataset, pd.DataFrame] , default: None

Dataset or DataFrame object, representing data an estimator was fitted on

test: Union[Dataset, pd.DataFrame] , default: None

Dataset or DataFrame object, representing data an estimator predicts on

model: BasicModel , default: None

A scikit-learn-compatible fitted estimator instance

model_name: str , default: ‘’

The name of the model

features_importance: pd.Series , default: None

Manually provided feature importance values.

feature_importance_force_permutation: bool , default: False

Force calculation of permutation feature importance.

feature_importance_timeout: int , default: 120

Timeout in seconds for the permutation feature importance calculation.

scorers: Mapping[str, Union[str, Callable]] , default: None

Dict mapping scorer names to an sklearn scorer name or a callable.

scorers_per_class: Mapping[str, Union[str, Callable]] , default: None

Dict of scorers for classification without averaging over the classes. See the scikit-learn docs: https://scikit-learn.org/stable/modules/model_evaluation.html#from-binary-to-multiclass-and-multilabel

y_pred_train: np.ndarray , default: None

Array of the model prediction over the train dataset.

y_pred_test: np.ndarray , default: None

Array of the model prediction over the test dataset.

y_proba_train: np.ndarray , default: None

Array of the model prediction probabilities over the train dataset.

y_proba_test: np.ndarray , default: None

Array of the model prediction probabilities over the test dataset.

__init__(train: Optional[Union[Dataset, DataFrame]] = None, test: Optional[Union[Dataset, DataFrame]] = None, model: Optional[BasicModel] = None, model_name: str = '', features_importance: Optional[Series] = None, feature_importance_force_permutation: bool = False, feature_importance_timeout: int = 120, scorers: Optional[Mapping[str, Union[str, Callable]]] = None, scorers_per_class: Optional[Mapping[str, Union[str, Callable]]] = None, y_pred_train: Optional[ndarray] = None, y_pred_test: Optional[ndarray] = None, y_proba_train: Optional[ndarray] = None, y_proba_test: Optional[ndarray] = None)[source]#
__new__(*args, **kwargs)#
class Suite[source]#

Tabular suite to run checks of types: TrainTestCheck, SingleDatasetCheck, ModelOnlyCheck.

__init__(name: str, *checks: Union[BaseCheck, BaseSuite])[source]#
__new__(*args, **kwargs)#
class SingleDatasetCheck[source]#

Parent class for checks that only use one dataset.

__init__(**kwargs)[source]#
__new__(*args, **kwargs)#
class TrainTestCheck[source]#

Parent class for checks that compare two datasets.

The check receives both the train dataset and the test dataset used for model training and evaluation.

__init__(**kwargs)[source]#
__new__(*args, **kwargs)#
class ModelOnlyCheck[source]#

Parent class for checks that only use a model and no datasets.

__init__(**kwargs)[source]#
__new__(*args, **kwargs)#
class ModelComparisonContext[source]#

Contains processed input for model comparison checks.

__init__(train_datasets: Union[Dataset, List[Dataset]], test_datasets: Union[Dataset, List[Dataset]], models: Union[List[Any], Mapping[str, Any]])[source]#

Preprocess the parameters.

__new__(*args, **kwargs)#
class ModelComparisonCheck[source]#

Parent class for checks that compare two or more models.

__init__(**kwargs)[source]#
__new__(*args, **kwargs)#
class ModelComparisonSuite[source]#

Suite to run checks of types: CompareModelsBaseCheck.

__init__(name: str, *checks: Union[BaseCheck, BaseSuite])[source]#
__new__(*args, **kwargs)#