data_integrity

This module contains all data integrity checks.

Classes

`ColumnsInfo`	Return the role and logical type of each column.
`MixedNulls`	Search for various types of null values, including string representations of null.
`StringMismatch`	Detect different variants of string categories.
`MixedDataTypes`	Detect columns which contain a mix of numerical and string values.
`IsSingleValue`	Check if there are columns which have only a single unique value in all rows.
`SpecialCharacters`	Search in column[s] for values that contain only special characters.
`StringLengthOutOfBounds`	Detect strings with length that is much longer/shorter than the identified "normal" string lengths.
`DataDuplicates`	Checks for duplicate samples in the dataset.
`ConflictingLabels`	Find samples which have the exact same features' values but different labels.
`ClassImbalance`	Check if a dataset is imbalanced by looking at the target variable distribution.
`OutlierSampleDetection`	Detects outliers in a dataset using the LoOP algorithm.
`FeatureLabelCorrelation`	Return the PPS (Predictive Power Score) of all features in relation to the label.
`FeatureFeatureCorrelation`	Checks for pairwise correlation between the features.
`IdentifierLabelCorrelation`	Check if identifiers (Index/Date) can be used to predict the label.
`PercentOfNulls`	Percent of 'Null' values in each column.
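
To make a few of the descriptions above concrete, here is a minimal pure-Python sketch of the ideas behind `PercentOfNulls`, `IsSingleValue`, and `ConflictingLabels`. It is an illustration only, not this module's implementation or API: the toy `rows` data and the helper names `null_fraction`, `has_single_value`, and `find_conflicting_labels` are hypothetical, and only `None` is treated as null here.

```python
from collections import defaultdict

# Hypothetical toy dataset: each row is a dict; "label" is the target column.
rows = [
    {"color": "red", "size": 1, "label": "a"},
    {"color": "red", "size": 1, "label": "b"},
    {"color": "blue", "size": None, "label": "a"},
    {"color": "red", "size": 1, "label": "a"},
]


def null_fraction(rows, column):
    """Fraction of rows whose value in `column` is None."""
    return sum(r[column] is None for r in rows) / len(rows)


def has_single_value(rows, column):
    """True if `column` holds exactly one distinct value across all rows."""
    return len({r[column] for r in rows}) == 1


def find_conflicting_labels(rows, label="label"):
    """Group rows by their feature values and keep groups with more than one label."""
    groups = defaultdict(set)
    for r in rows:
        # Build a hashable key from the feature columns only, in a stable order.
        key = tuple((k, r[k]) for k in sorted(r) if k != label)
        groups[key].add(r[label])
    return {key: labels for key, labels in groups.items() if len(labels) > 1}


conflicts = find_conflicting_labels(rows)
```

In this toy data, the three rows with `color="red"` and `size=1` share identical features but carry both labels, so they form the single conflicting group. The library's own checks operate on its dataset objects and handle many more cases (for example, string representations of null).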
