deepchecks.tabular
Package for tabular functionality.
Modules
- Module importing all tabular checks.
- Module containing all prebuilt suites.
- Module for working with pre-built datasets.
Classes
- class Dataset
Dataset wraps a pandas DataFrame together with ML-related metadata.
The Dataset class holds additional data and methods for easy access to metadata relevant to training or validating an ML model.
- Parameters
- df: Any
An object that can be cast to a pandas DataFrame, containing data relevant to training or validating an ML model.
- label: t.Union[Hashable, pd.Series, pd.DataFrame, np.ndarray] , default: None
Label column, given either as the name of an existing column in the DataFrame or as a label object holding the label data (a pandas Series/DataFrame or a numpy array) that will be concatenated to the data in the DataFrame. For label data, the label name is set as follows:
  - Series: the series name, or 'target' if the name is empty
  - DataFrame: a single column is expected, and its name is used
  - numpy array: 'target'
- features: t.Optional[t.Sequence[Hashable]] , default: None
List of names for the feature columns in the DataFrame.
- cat_features: t.Optional[t.Sequence[Hashable]] , default: None
List of names for the categorical features in the DataFrame. To disable categorical feature inference, pass cat_features=[].
- index_name: t.Optional[Hashable] , default: None
Name of the index column in the dataframe. If set_index_from_dataframe_index is True and index_name is not None, the index is created from the dataframe index level with the given name. If the index levels have no names, an int must be used to select the appropriate level by position.
- set_index_from_dataframe_index: bool , default: False
If True, the index is created from the dataframe index instead of from the dataframe columns (the default). If index_name is None, the first level of the index is used in the case of a multilevel index.
- datetime_name: t.Optional[Hashable] , default: None
Name of the datetime column in the dataframe. If set_datetime_from_dataframe_index is True and datetime_name is not None, the datetime is created from the dataframe index level with the given name. If the index levels have no names, an int must be used to select the appropriate level by position.
- set_datetime_from_dataframe_index: bool , default: False
If True, the datetime is created from the dataframe index instead of from the dataframe columns (the default). If datetime_name is None, the first level of the index is used in the case of a multilevel index.
- convert_datetime: bool , default: True
If True, the datetime column is converted to datetime using pandas.to_datetime.
- datetime_args: t.Optional[t.Dict] , default: None
Arguments passed to pandas.to_datetime when converting the datetime column (see https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html for details).
- max_categorical_ratio: float , default: 0.01
The maximum ratio of unique values in a column for it to be inferred as a categorical feature.
- max_categories: int , default: None
The maximum number of categories in a column for it to be inferred as a categorical feature. If None, the default is_categorical inference mechanism is used.
- label_type: str , default: None
Used to assume the target model type if it is not found on the model. Allowed values: 'classification_label', 'regression_label'. If None, the label type is inferred from the label using the is_categorical logic.
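The label-naming rules described for the label parameter can be sketched as a small function. This is an illustration only: resolve_label_name is a hypothetical helper, not part of deepchecks, and treating "name is empty" as "the Series name is None" is an assumption.

```python
import numpy as np
import pandas as pd


def resolve_label_name(label, default_name="target"):
    """Return the column name a label object would receive.

    Hypothetical helper mirroring the rules in the `label` parameter
    description; it is not deepchecks' own implementation.
    """
    if isinstance(label, pd.Series):
        # Series: use its name, falling back to the default when unnamed
        # (assumption: "empty" means the name is None).
        return label.name if label.name is not None else default_name
    if isinstance(label, pd.DataFrame):
        # DataFrame: exactly one column is expected, and its name is used.
        if label.shape[1] != 1:
            raise ValueError("label DataFrame must contain exactly one column")
        return label.columns[0]
    if isinstance(label, np.ndarray):
        # NumPy arrays carry no name, so the default is used.
        return default_name
    # Anything else is treated as the name of an existing column.
    return label


print(resolve_label_name(pd.Series([0, 1], name="churn")))
print(resolve_label_name(pd.Series([0, 1])))
print(resolve_label_name(pd.DataFrame({"price": [1.0, 2.0]})))
print(resolve_label_name(np.array([0, 1])))
print(resolve_label_name("y"))
```

Passing a column name leaves the DataFrame untouched, while passing label data means a new column is added under the resolved name, so it is worth checking that the name does not collide with an existing feature column.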
- __init__(df: Any, label: Optional[Union[Hashable, Series, DataFrame, ndarray]] = None, features: Optional[Sequence[Hashable]] = None, cat_features: Optional[Sequence[Hashable]] = None, index_name: Optional[Hashable] = None, set_index_from_dataframe_index: bool = False, datetime_name: Optional[Hashable] = None, set_datetime_from_dataframe_index: bool = False, convert_datetime: bool = True, datetime_args: Optional[Dict] = None, max_categorical_ratio: float = 0.01, max_categories: Optional[int] = None, label_type: Optional[str] = None)
- __new__(*args, **kwargs)
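The max_categorical_ratio and max_categories parameters describe thresholds for inferring categorical features. The sketch below shows one simplified reading of such thresholds in plain Python. The function looks_categorical is hypothetical, and the way the two limits combine here (ratio first, then an optional cap) is an assumption; the library's is_categorical mechanism may apply additional or different rules.

```python
from typing import Hashable, Optional, Sequence


def looks_categorical(
    values: Sequence[Hashable],
    max_categorical_ratio: float = 0.01,
    max_categories: Optional[int] = None,
) -> bool:
    """Simplified stand-in for categorical inference (not deepchecks' code)."""
    non_missing = [v for v in values if v is not None]
    if not non_missing:
        return False
    n_unique = len(set(non_missing))
    # Ratio rule: few distinct values relative to the number of rows.
    if n_unique / len(non_missing) > max_categorical_ratio:
        return False
    # Optional absolute cap on the number of distinct values.
    if max_categories is not None and n_unique > max_categories:
        return False
    return True


colours = ["red", "green", "blue"] * 200  # 3 unique values in 600 rows
row_ids = list(range(600))                # 600 unique values in 600 rows

print(looks_categorical(colours))                    # ratio is 0.005
print(looks_categorical(row_ids))                    # ratio is 1.0
print(looks_categorical(colours, max_categories=2))  # 3 categories exceed the cap
```

When inference of this kind gives the wrong answer for a column, the cat_features parameter lets the caller list the categorical columns explicitly instead.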
- class Context
Contains all the data and properties the user has passed to a check or suite, and validates them.
- Parameters
- train: Union[Dataset, pd.DataFrame] , default: None
Dataset or DataFrame object, representing data an estimator was fitted on
- test: Union[Dataset, pd.DataFrame] , default: None
Dataset or DataFrame object, representing data an estimator predicts on
- model: BasicModel , default: None
A scikit-learn-compatible fitted estimator instance
- model_name: str , default: ‘’
The name of the model
- features_importance: pd.Series , default: None
Feature importance values supplied manually.
- feature_importance_force_permutation: bool , default: False
Force calculation of permutation feature importance.
- feature_importance_timeout: int , default: 120
Timeout, in seconds, for the permutation feature importance calculation.
- scorers: Mapping[str, Union[str, Callable]] , default: None
Dict mapping scorer names to a scikit-learn scorer name or a scorer function.
- scorers_per_class: Mapping[str, Union[str, Callable]] , default: None
Dict of scorers for classification without averaging over the classes. See the scikit-learn docs: https://scikit-learn.org/stable/modules/model_evaluation.html#from-binary-to-multiclass-and-multilabel
- y_pred_train: np.ndarray , default: None
Array of the model prediction over the train dataset.
- y_pred_test: np.ndarray , default: None
Array of the model prediction over the test dataset.
- y_proba_train: np.ndarray , default: None
Array of the model prediction probabilities over the train dataset.
- y_proba_test: np.ndarray , default: None
Array of the model prediction probabilities over the test dataset.
- __init__(train: Optional[Union[Dataset, DataFrame]] = None, test: Optional[Union[Dataset, DataFrame]] = None, model: Optional[BasicModel] = None, model_name: str = '', features_importance: Optional[Series] = None, feature_importance_force_permutation: bool = False, feature_importance_timeout: int = 120, scorers: Optional[Mapping[str, Union[str, Callable]]] = None, scorers_per_class: Optional[Mapping[str, Union[str, Callable]]] = None, y_pred_train: Optional[ndarray] = None, y_pred_test: Optional[ndarray] = None, y_proba_train: Optional[ndarray] = None, y_proba_test: Optional[ndarray] = None)
- __new__(*args, **kwargs)
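The scorers parameter is typed as Mapping[str, Union[str, Callable]]. A minimal sketch of what such a mapping could look like follows. It assumes that callables follow the scikit-learn scorer convention of scorer(estimator, X, y) returning a number where higher is better; MajorityModel and exact_match_rate are hypothetical names used only for illustration, and plain lists stand in for real feature data.

```python
from typing import Callable, Mapping, Sequence, Union


class MajorityModel:
    """Toy stand-in for a fitted estimator; always predicts one class."""

    def __init__(self, constant: int) -> None:
        self.constant = constant

    def predict(self, X: Sequence[Sequence[float]]) -> list:
        return [self.constant for _ in X]


def exact_match_rate(model, X, y) -> float:
    """Custom scorer using the (estimator, X, y) calling convention."""
    predictions = model.predict(X)
    correct = sum(1 for p, t in zip(predictions, y) if p == t)
    return correct / len(y)


# Keys are display names; values are either a scikit-learn scorer name
# (a string) or a callable scorer.
scorers: Mapping[str, Union[str, Callable]] = {
    "Accuracy": "accuracy",
    "F1 macro": "f1_macro",
    "Exact match": exact_match_rate,
}

X = [[0.1], [0.4], [0.8], [0.9]]
y = [0, 0, 1, 0]
print(scorers["Exact match"](MajorityModel(0), X, y))
```

String values defer to scorers that scikit-learn already knows by name, while a callable gives full control over how a metric is computed.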
- class Suite
Tabular suite to run checks of types: TrainTestCheck, SingleDatasetCheck, ModelOnlyCheck.
- __new__(*args, **kwargs)
- class SingleDatasetCheck
Parent class for checks that only use one dataset.
- __new__(*args, **kwargs)
- class TrainTestCheck
Parent class for checks that compare two datasets.
The check receives a train dataset and a test dataset, as used for training and testing a model.
- __new__(*args, **kwargs)
- class ModelOnlyCheck
Parent class for checks that only use a model and no datasets.
- __new__(*args, **kwargs)
- class ModelComparisonContext
Contains processed input for model comparison checks.
- __init__(train_datasets: Union[Dataset, List[Dataset]], test_datasets: Union[Dataset, List[Dataset]], models: Union[List[Any], Mapping[str, Any]])
Preprocess the parameters.
- __new__(*args, **kwargs)