data_integrity
Module contains all data integrity checks.
Classes
Return the role and logical type of each column.
Search for various types of null values, including string representations of null.
Detect different variants of string categories.
Detect columns which contain a mix of numerical and string values.
Check if there are columns which have only a single unique value in all rows.
Search in column[s] for values that contain only special characters.
Detect strings with length that is much longer/shorter than the identified "normal" string lengths.
Checks for duplicate samples in the dataset.
Find samples which have the exact same features' values but different labels.
Check if a dataset is imbalanced by looking at the target variable distribution.
Detects outliers in a dataset using the LoOP algorithm.
Return the PPS (Predictive Power Score) of all features in relation to the label.
Checks for pairwise correlation between the features.
Check if identifiers (Index/Date) can be used to predict the label.
Percent of 'Null' values in each column.
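
The one-line descriptions above are terse, so a small illustration may help. The following is a minimal sketch, in plain Python, of the idea behind four of the listed checks: single-value columns, percent of nulls, duplicate samples, and conflicting labels. It is not this module's implementation or API. The function names, the `rows` sample data, and the set of values treated as null are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical toy dataset: one dict per sample, "label" is the target column.
rows = [
    {"color": "red", "size": 10, "country": "US", "label": 0},
    {"color": "red", "size": 10, "country": "US", "label": 1},     # same features as row 0, other label
    {"color": "blue", "size": None, "country": "US", "label": 0},
    {"color": "blue", "size": None, "country": "US", "label": 0},  # exact copy of row 2
]


def single_value_columns(rows):
    """Columns that hold one unique value across all rows."""
    return [c for c in rows[0] if len({r[c] for r in rows}) == 1]


def null_percent(rows, null_values=(None, "", "null", "NaN")):
    """Percent of values per column that count as null.

    The null_values tuple is an assumption; it includes a few string
    representations of null alongside None.
    """
    return {
        c: 100.0 * sum(r[c] in null_values for r in rows) / len(rows)
        for c in rows[0]
    }


def duplicate_row_indices(rows):
    """Indices of rows that repeat an earlier row exactly."""
    seen, dupes = set(), []
    for i, r in enumerate(rows):
        key = tuple(sorted(r.items()))  # order-independent row fingerprint
        if key in seen:
            dupes.append(i)
        else:
            seen.add(key)
    return dupes


def conflicting_label_groups(rows, label="label"):
    """Feature combinations that appear with more than one label."""
    labels_by_features = defaultdict(set)
    for r in rows:
        features = tuple(sorted((k, v) for k, v in r.items() if k != label))
        labels_by_features[features].add(r[label])
    return [dict(f) for f, labels in labels_by_features.items() if len(labels) > 1]


print(single_value_columns(rows))
print(null_percent(rows))
print(duplicate_row_indices(rows))
print(conflicting_label_groups(rows))
```

On this toy data, `country` is the only single-value column, half of the `size` values are null, the last row duplicates the one before it, and the two `red` rows share features but carry different labels.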