integrity

This module contains all data integrity checks.

Classes

MixedNulls

Search for various types of null values, including string representations of null.
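As a rough illustration of the idea (a plain-Python sketch, not the library's implementation), the hypothetical helper `count_null_variants` tallies values that look like nulls. The set of null-like strings is an assumption made for this example.

```python
import math
from collections import Counter

# Assumed set of strings treated as null representations (illustrative only).
NULL_STRINGS = {"", "null", "none", "nan", "n/a", "na"}

def count_null_variants(values):
    """Count each distinct null-like value found in a column (hypothetical helper)."""
    counts = Counter()
    for v in values:
        if v is None:
            counts["None"] += 1
        elif isinstance(v, float) and math.isnan(v):
            counts["nan"] += 1
        elif isinstance(v, str) and v.strip().lower() in NULL_STRINGS:
            # Keep the original spelling so different variants stay distinguishable.
            counts[repr(v)] += 1
    return dict(counts)

column = ["a", None, "NULL", "n/a", float("nan"), "b", "null"]
variants = count_null_variants(column)
```

A column with more than one key in `variants` mixes several null representations.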

StringMismatch

Detect different variants of string categories.

MixedDataTypes

Detect columns which contain a mix of numerical and string values.
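One way to picture this check, assuming that strings parseable as floats count as numeric (an assumption of this sketch, as is the helper name `type_mix`):

```python
def type_mix(values):
    """Return the share of numeric and string values in a column (hypothetical helper)."""
    numeric = 0
    strings = 0
    for v in values:
        if isinstance(v, bool):
            continue  # booleans are ignored in this sketch
        if isinstance(v, (int, float)):
            numeric += 1
        elif isinstance(v, str):
            try:
                float(v)
                numeric += 1  # a string such as "3" counts as numeric here
            except ValueError:
                strings += 1
    total = numeric + strings
    if total == 0:
        return {"numbers": 0.0, "strings": 0.0}
    return {"numbers": numeric / total, "strings": strings / total}

mix = type_mix([1, 2.5, "3", "foo", "bar"])
# The column is mixed when both kinds of value are present.
is_mixed = 0 < mix["strings"] < 1
```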

IsSingleValue

Check if there are columns which have only a single unique value in all rows.
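A minimal sketch of the same test on a table stored as a dict of column name to list of values. Both the layout and the helper name `single_value_columns` are assumptions of this example.

```python
def single_value_columns(table):
    """Return names of columns whose rows all share one unique value."""
    return [name for name, values in table.items() if len(set(values)) == 1]

table = {
    "country": ["US", "US", "US"],
    "age": [31, 45, 31],
    "flag": [True, True, True],
}
constant = single_value_columns(table)
```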

SpecialCharacters

Search column[s] for values that contain only special characters.
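For illustration only, the sketch below treats ASCII punctuation and whitespace as "special". That definition and the helper `only_special` are assumptions, not the library's own rule.

```python
import string

# Assumed definition of "special" characters for this sketch.
SPECIAL = set(string.punctuation + string.whitespace)

def only_special(value):
    """True for non-empty strings made up entirely of special characters."""
    return (
        isinstance(value, str)
        and len(value) > 0
        and all(ch in SPECIAL for ch in value)
    )

column = ["ok", "?!", "--", "a-b", "", 42, "@#$ "]
flagged = [v for v in column if only_special(v)]
```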

StringLengthOutOfBounds

Detect strings whose length is much longer or shorter than the identified "normal" string lengths.
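How "normal" lengths are identified is not stated here. The sketch below swaps in a simple interquartile-range rule as a stand-in; the rule, the `factor` parameter, and the helper `length_outliers` are all assumptions.

```python
import statistics

def length_outliers(strings, factor=1.5):
    """Flag strings whose length falls outside IQR-based bounds (illustrative rule)."""
    lengths = [len(s) for s in strings]
    q1, _, q3 = statistics.quantiles(lengths, n=4)
    iqr = q3 - q1
    lower, upper = q1 - factor * iqr, q3 + factor * iqr
    return [s for s in strings if not lower <= len(s) <= upper]

column = ["abc", "abcd", "wxyz", "hello", "world", "apple", "banana", "orange", "x" * 40]
outliers = length_outliers(column)
```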

StringMismatchComparison

Detect different variants of string categories between the same categorical column in two datasets.
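A hedged sketch of the comparison: strings are reduced to a "base form" (lowercased, non-alphanumerics removed), and variants that appear only in the second dataset are reported. The normalization rule and the helper names are assumptions of this example.

```python
import re
from collections import defaultdict

def base_form(s):
    """Assumed normalization: lowercase and drop non-alphanumeric characters."""
    return re.sub(r"[^a-z0-9]", "", s.lower())

def variants_by_base(values):
    """Group the distinct spellings in a column by their base form."""
    groups = defaultdict(set)
    for v in values:
        groups[base_form(v)].add(v)
    return groups

def new_variants(reference, test):
    """For base forms present in both datasets, list variants seen only in test."""
    ref, tst = variants_by_base(reference), variants_by_base(test)
    return {
        base: sorted(tst[base] - ref[base])
        for base in tst
        if base in ref and tst[base] - ref[base]
    }

reference = ["Deep Learning", "NLP", "deep learning"]
test = ["deep-learning", "NLP", "Vision"]
mismatches = new_variants(reference, test)
```

Here "Vision" is not reported, because it is a new category rather than a new spelling of an existing one.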

DominantFrequencyChange

Check if dominant values have increased significantly between test and reference data.
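The criterion for "significantly" is not given in this summary. As a simplified stand-in, the sketch below only compares the share of the most frequent test value against its share in the reference data; the helper `dominant_share_change` is hypothetical.

```python
from collections import Counter

def dominant_share_change(reference, test):
    """Return the dominant test value with its share in reference and in test."""
    value, test_count = Counter(test).most_common(1)[0]
    test_share = test_count / len(test)
    ref_share = Counter(reference)[value] / len(reference)
    return value, ref_share, test_share

reference = ["a", "a", "b", "c", "d", "b", "c", "d"]
test = ["a", "a", "a", "a", "a", "a", "b", "c"]
value, ref_share, test_share = dominant_share_change(reference, test)
```

A caller would then apply its own threshold or statistical test to the two shares.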

DataDuplicates

Check for duplicate samples in the dataset.
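A small sketch of duplicate detection on rows represented as tuples. The representation and the helper `duplicate_rows` are assumptions of this example.

```python
from collections import Counter

def duplicate_rows(rows):
    """Map each row that occurs more than once to its number of occurrences."""
    counts = Counter(tuple(r) for r in rows)
    return {row: n for row, n in counts.items() if n > 1}

rows = [(1, "a"), (2, "b"), (1, "a"), (3, "c"), (1, "a")]
dups = duplicate_rows(rows)
# Share of rows that are redundant copies of an earlier row.
ratio = sum(n - 1 for n in dups.values()) / len(rows)
```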

CategoryMismatchTrainTest

Find new categories in the test set.
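Conceptually this is a set difference between the categories seen in the two datasets. The helper `new_categories` below is a hypothetical illustration of that idea.

```python
def new_categories(train_values, test_values):
    """Return categories that appear in test but never in train."""
    return sorted(set(test_values) - set(train_values))

train = ["red", "green", "blue", "green"]
test = ["red", "purple", "blue", "teal"]
unseen = new_categories(train, test)
```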

NewLabelTrainTest

Find new labels in test.

ConflictingLabels

Find samples which have exactly the same feature values but different labels.
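The idea can be sketched by grouping samples on their feature tuple and collecting the labels seen per group. The helper `conflicting_groups` and the data layout are assumptions of this example.

```python
from collections import defaultdict

def conflicting_groups(features, labels):
    """Map each feature tuple to its labels when more than one label occurs."""
    seen = defaultdict(set)
    for row, label in zip(features, labels):
        seen[tuple(row)].add(label)
    return {row: sorted(lbls) for row, lbls in seen.items() if len(lbls) > 1}

features = [(1, "a"), (2, "b"), (1, "a"), (2, "b")]
labels = ["yes", "no", "no", "no"]
conflicts = conflicting_groups(features, labels)
```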

OutlierSampleDetection

Detect outliers in a dataset using the LoOP (Local Outlier Probabilities) algorithm.