data_integrity

Module importing all NLP data integrity checks.

Classes

PropertyLabelCorrelation

Return the PPS (Predictive Power Score) of all properties in relation to the label.

TextPropertyOutliers

Find outliers with respect to the given properties.

TextDuplicates

Checks for duplicate samples in the dataset.
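As a rough illustration of what a duplicate check looks for, the sketch below groups samples by normalized text using only the standard library. It is not the library's implementation: the `find_duplicates` helper and its normalization (lowercasing and collapsing whitespace) are assumptions made for this example.

```python
from collections import defaultdict


def find_duplicates(samples):
    """Group sample indices by normalized text; keep groups with 2+ members."""
    groups = defaultdict(list)
    for index, text in enumerate(samples):
        # Assumed normalization: lowercase and collapse runs of whitespace.
        key = " ".join(text.lower().split())
        groups[key].append(index)
    return {key: indices for key, indices in groups.items() if len(indices) > 1}


samples = ["Great product!", "great   product!", "Terrible service."]
duplicates = find_duplicates(samples)
print(duplicates)
```

Whether two samples count as duplicates depends entirely on the normalization chosen, so a stricter or looser key function would change the result.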

ConflictingLabels

Find identical samples which have different labels.
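The idea can be sketched in a few lines of plain Python: collect the set of labels seen for each distinct text and report the texts with more than one label. The `find_conflicting_labels` helper is hypothetical and uses exact string equality, which may differ from how the library decides that two samples are identical.

```python
from collections import defaultdict


def find_conflicting_labels(samples, labels):
    """Return each text that appears with more than one distinct label."""
    labels_by_text = defaultdict(set)
    for text, label in zip(samples, labels):
        labels_by_text[text].add(label)
    # Keep only texts whose label set has at least two members.
    return {
        text: sorted(found)
        for text, found in labels_by_text.items()
        if len(found) > 1
    }


samples = ["good", "bad", "good"]
labels = ["positive", "negative", "negative"]
conflicts = find_conflicting_labels(samples, labels)
print(conflicts)
```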

SpecialCharacters

Find samples that contain special characters, and report the most common special characters in the dataset.
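A minimal sketch of this kind of scan is shown below. It assumes that a "special" character is anything that is not alphanumeric, not whitespace, and not ASCII punctuation; that definition and the helper names are choices made for this example, not the library's own rule.

```python
import string
from collections import Counter

# Assumption: ASCII punctuation is treated as ordinary, not special.
ALLOWED_PUNCTUATION = set(string.punctuation)


def special_characters(text):
    """Return the characters in text that fall outside the assumed ordinary set."""
    return [
        ch for ch in text
        if not (ch.isalnum() or ch.isspace() or ch in ALLOWED_PUNCTUATION)
    ]


def summarize_special_characters(samples):
    """Return indices of flagged samples and special characters by frequency."""
    flagged = []
    counts = Counter()
    for index, text in enumerate(samples):
        found = special_characters(text)
        if found:
            flagged.append(index)
            counts.update(found)
    return flagged, counts.most_common()


samples = ["plain text.", "price: 5€ → 6€", "café"]
flagged, most_common = summarize_special_characters(samples)
print(flagged, most_common)
```

Accented letters such as "é" pass `str.isalnum()`, so under this definition they are not flagged.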

UnknownTokens

Find samples that contain tokens unsupported by your tokenizer.

UnderAnnotatedMetaDataSegments

Search for under-annotated data segments, segmenting by metadata.

UnderAnnotatedPropertySegments

Search for under-annotated data segments, segmenting by text properties.

FrequentSubstrings

Checks for frequent substrings in the dataset.
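One simple way to approximate a frequent-substring search is to count word n-grams by the number of samples that contain them. The sketch below does this with the standard library; the `frequent_ngrams` helper, the n-gram length, and the `min_fraction` threshold are all assumptions for illustration and do not describe the algorithm the library uses.

```python
from collections import Counter


def frequent_ngrams(samples, n=3, min_fraction=0.5):
    """Return word n-grams that occur in at least min_fraction of the samples."""
    counts = Counter()
    for text in samples:
        words = text.lower().split()
        # Use a set so each n-gram is counted at most once per sample.
        ngrams = {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}
        counts.update(ngrams)
    threshold = min_fraction * len(samples)
    return {ngram: count for ngram, count in counts.items() if count >= threshold}


samples = [
    "Sent from my phone: see you soon",
    "Sent from my phone: running late",
    "Lunch at noon?",
]
frequent = frequent_ngrams(samples)
print(frequent)
```

Substrings that recur across many samples, such as signatures or templates, tend to surface this way.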