data_integrity
This module contains all data integrity checks.
Classes
| Description |
| --- |
| Return the role and logical type of each column. |
| Search for various types of null values, including string representations of null. |
| Detect different variants of string categories. |
| Detect columns which contain a mix of numerical and string values. |
| Check if there are columns which have only a single unique value in all rows. |
| Search in column[s] for values that contain only special characters. |
| Detect strings with length that is much longer/shorter than the identified "normal" string lengths. |
| Check for duplicate samples in the dataset. |
| Find samples which have the exact same feature values but different labels. |
| Check if a dataset is imbalanced by looking at the target variable distribution. |
| Detect outliers in a dataset using the LoOP algorithm. |
| Return the PPS (Predictive Power Score) of all features in relation to the label. |
| Check for pairwise correlation between the features. |
| Check if identifiers (Index/Date) can be used to predict the label. |
| Percent of 'Null' values in each column. |
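To make a few of the descriptions above concrete, the following is a minimal sketch in plain Python of what five of these checks look for: null percentage, single-value columns, mixed numerical/string columns, duplicate samples, and conflicting labels. It does not use or reproduce this module's classes. The function names, the `rows` toy dataset, and the `COLUMNS`/`FEATURES` constants are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical toy dataset: one dict per sample, "label" is the target.
COLUMNS = ["color", "size", "country", "label"]
FEATURES = ["color", "size", "country"]
rows = [
    {"color": "red", "size": 10, "country": "US", "label": 0},
    {"color": "red", "size": 10, "country": "US", "label": 1},
    {"color": None, "size": "big", "country": "US", "label": 0},
    {"color": "blue", "size": 7, "country": "US", "label": 0},
    {"color": "blue", "size": 7, "country": "US", "label": 0},
]


def percent_of_nulls(rows, columns):
    """Fraction of None values per column (only None is treated as null here)."""
    return {c: sum(r[c] is None for r in rows) / len(rows) for c in columns}


def single_value_columns(rows, columns):
    """Columns whose values are identical in every row."""
    return [c for c in columns if len({r[c] for r in rows}) == 1]


def mixed_type_columns(rows, columns):
    """Columns containing both numeric and string values (None is ignored)."""
    mixed = []
    for c in columns:
        values = [r[c] for r in rows if r[c] is not None]
        has_num = any(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
        )
        has_str = any(isinstance(v, str) for v in values)
        if has_num and has_str:
            mixed.append(c)
    return mixed


def _group_indices(rows, columns):
    """Map each distinct tuple of column values to the row indices holding it."""
    groups = defaultdict(list)
    for i, r in enumerate(rows):
        groups[tuple(r[c] for c in columns)].append(i)
    return groups


def duplicate_samples(rows, columns):
    """Groups of row indices that are identical across all given columns."""
    return [idx for idx in _group_indices(rows, columns).values() if len(idx) > 1]


def conflicting_labels(rows, feature_columns, label_column):
    """Groups of row indices with equal features but more than one label."""
    return [
        idx
        for idx in _group_indices(rows, feature_columns).values()
        if len({rows[i][label_column] for i in idx}) > 1
    ]


if __name__ == "__main__":
    print(percent_of_nulls(rows, COLUMNS))
    print(single_value_columns(rows, COLUMNS))
    print(mixed_type_columns(rows, COLUMNS))
    print(duplicate_samples(rows, COLUMNS))
    print(conflicting_labels(rows, FEATURES, "label"))
```

In this sketch, duplicate detection compares every column including the label, while conflict detection compares only the feature columns. This is why rows 0 and 1 count as a label conflict and not as duplicates. The actual checks in this module may apply different null definitions, thresholds, and grouping rules.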