data_integrity
- data_integrity(columns: Optional[Union[Hashable, List[Hashable]]] = None, ignore_columns: Optional[Union[Hashable, List[Hashable]]] = None, n_top_columns: Optional[int] = None, n_samples: Optional[int] = None, random_state: int = 42, n_to_show: int = 5, **kwargs) -> Suite
- Suite for detecting integrity issues within a single dataset.
 - Parameters
- columns: Union[Hashable, List[Hashable]], default: None
- The columns to be checked. If None, all columns will be checked except the ones in ignore_columns.
- ignore_columns: Union[Hashable, List[Hashable]], default: None
- The columns to be ignored. If None, no columns will be ignored.
- n_top_columns: int, optional
- Number of columns to show, ordered by feature importance (date, index, and label come first). Check dependent.
- n_samples: int, default: None
- Number of samples to use for checks that sample data. If None, each check uses its own default n_samples.
- random_state: int, default: 42
- Random seed for all checks.
- n_to_show: int, default: 5
- Number of top results to show. Check dependent.
- **kwargs: dict
- Additional arguments to pass to the checks.
 
- Returns
- Suite
- A suite for detecting integrity issues within a single dataset. 
 
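- Examples
>>> import pandas as pd
>>> from deepchecks.tabular.suites import data_integrity
>>> df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6], 'c': ['x', 'y', 'x']})  # illustrative data only
>>> suite = data_integrity(columns=['a', 'b', 'c'], n_samples=1_000_000)
>>> result = suite.run(df)
>>> result.show()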
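The interaction between columns and ignore_columns described above can be sketched in plain Python. This is only an illustration of the documented rule, not the library's implementation: select_columns and _as_list are hypothetical helpers, and raising an error when both arguments are given is an assumption of this sketch rather than something stated in this reference.

```python
from typing import Hashable, List, Optional, Sequence, Union

ColumnArg = Optional[Union[Hashable, List[Hashable]]]


def _as_list(value: ColumnArg) -> Optional[List[Hashable]]:
    """Normalize a single column name or a list of names to a list."""
    if value is None:
        return None
    if isinstance(value, list):
        return value
    return [value]


def select_columns(
    all_columns: Sequence[Hashable],
    columns: ColumnArg = None,
    ignore_columns: ColumnArg = None,
) -> List[Hashable]:
    """Hypothetical helper mirroring the documented columns/ignore_columns rule."""
    wanted = _as_list(columns)
    ignored = _as_list(ignore_columns)
    if wanted is not None and ignored is not None:
        # Assumption of this sketch: passing both arguments is treated as an error.
        raise ValueError("pass either columns or ignore_columns, not both")
    if wanted is not None:
        # Only the requested columns are checked.
        return [c for c in all_columns if c in wanted]
    if ignored is not None:
        # All columns are checked except the ignored ones.
        return [c for c in all_columns if c not in ignored]
    # Neither argument given: every column is checked.
    return list(all_columns)


available = ["a", "b", "c", "d"]
print(select_columns(available))                      # all four columns
print(select_columns(available, columns=["a", "c"]))  # only a and c
print(select_columns(available, ignore_columns="d"))  # everything except d
```

Both arguments accept either a single column name or a list of names, which is why the sketch normalizes them before filtering.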
- run(self, train_dataset: Optional[Union[Dataset, DataFrame]] = None, test_dataset: Optional[Union[Dataset, DataFrame]] = None, model: Optional[BasicModel] = None, feature_importance: Optional[Series] = None, feature_importance_force_permutation: bool = False, feature_importance_timeout: int = 120, with_display: bool = True, y_pred_train: Optional[ndarray] = None, y_pred_test: Optional[ndarray] = None, y_proba_train: Optional[ndarray] = None, y_proba_test: Optional[ndarray] = None, run_single_dataset: Optional[str] = None, model_classes: Optional[List] = None) -> SuiteResult
- Run all checks.
 - Parameters
- train_dataset: Optional[Union[Dataset, pd.DataFrame]], default: None
- Object representing the data the estimator was fitted on.
- test_dataset: Optional[Union[Dataset, pd.DataFrame]], default: None
- Object representing the data the estimator predicts on.
- model: Optional[BasicModel], default: None
- A scikit-learn-compatible fitted estimator instance.
- run_single_dataset: Optional[str], default: None
- 'Train', 'Test', or None to run on both train and test.
- feature_importance: pd.Series, default: None
- Manually provided feature importance.
- feature_importance_force_permutation: bool, default: False
- Force calculation of permutation feature importance.
- feature_importance_timeout: int, default: 120
- Timeout in seconds for the permutation feature importance calculation.
- y_pred_train: Optional[np.ndarray], default: None
- Array of the model predictions over the train dataset.
- y_pred_test: Optional[np.ndarray], default: None
- Array of the model predictions over the test dataset.
- y_proba_train: Optional[np.ndarray], default: None
- Array of the model prediction probabilities over the train dataset.
- y_proba_test: Optional[np.ndarray], default: None
- Array of the model prediction probabilities over the test dataset.
- model_classes: Optional[List], default: None
- For classification: list of classes known to the model.
 
- Returns
- SuiteResult
- The results of all initialized checks.