data_integrity
This module contains all data integrity checks.
Classes
| Return the role and logical type of each column. | |
| Search for various types of null values, including string representations of null. | |
| Detect different variants of string categories (e.g. | |
| Detect columns which contain a mix of numerical and string values. | |
| Check if there are columns which have only a single unique value in all rows. | |
| Search in column[s] for values that contain only special characters. | |
| Detect strings with length that is much longer/shorter than the identified "normal" string lengths. | |
| Checks for duplicate samples in the dataset. | |
| Find samples which have the exact same features' values but different labels. | |
| Check if a dataset is imbalanced by looking at the target variable distribution. | |
| Detects outliers in a dataset using the LoOP algorithm. | |
| Return the PPS (Predictive Power Score) of all features in relation to the label. | |
| Checks for pairwise correlation between the features. | |
| Check if identifiers (Index/Date) can be used to predict the label. | |
| Percent of 'Null' values in each column. | |
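
To make a few of the descriptions above concrete, the following is a minimal sketch in plain Python of the logic behind four of the checks: single-value columns, duplicate samples, conflicting labels, and percent of nulls. It does not use this module's classes or API. The toy `ROWS` dataset, the `label` column name, and all helper function names are hypothetical and exist only for illustration; the real checks may define and report these quantities differently.

```python
from collections import Counter, defaultdict

# Hypothetical toy dataset: one dict per sample, "label" is the target column.
ROWS = [
    {"color": "red", "size": 1, "country": "US", "label": 0},
    {"color": "red", "size": 1, "country": "US", "label": 1},
    {"color": "blue", "size": None, "country": "US", "label": 0},
    {"color": "blue", "size": None, "country": "US", "label": 0},
]


def single_value_columns(rows):
    """Return columns that hold one unique value across all rows."""
    return [c for c in rows[0] if len({r[c] for r in rows}) == 1]


def duplicate_ratio(rows):
    """Return the fraction of rows that repeat an earlier, identical row."""
    columns = list(rows[0])
    counts = Counter(tuple(r[c] for c in columns) for r in rows)
    return (len(rows) - len(counts)) / len(rows)


def conflicting_labels(rows, label="label"):
    """Return feature tuples that appear with more than one label value."""
    features = [c for c in rows[0] if c != label]
    labels_by_features = defaultdict(set)
    for r in rows:
        labels_by_features[tuple(r[c] for c in features)].add(r[label])
    return [k for k, v in labels_by_features.items() if len(v) > 1]


def percent_of_nulls(rows):
    """Return the fraction of None values per column (None only, by assumption)."""
    return {c: sum(r[c] is None for r in rows) / len(rows) for c in rows[0]}


if __name__ == "__main__":
    print(single_value_columns(ROWS))
    print(duplicate_ratio(ROWS))
    print(conflicting_labels(ROWS))
    print(percent_of_nulls(ROWS))
```

This sketch treats only `None` as null, whereas the null-related checks listed above also cover string representations of null.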