data_integrity
Module importing all nlp checks.
Classes
Return the PPS (Predictive Power Score) of all properties in relation to the label.
Find outliers with respect to the given properties.
Check for duplicate samples in the dataset.
Find identical samples that have different labels.
Find samples that contain special characters, and report the most common special characters in the dataset.
Find samples that contain tokens unsupported by your tokenizer.
Search for under-annotated data segments.
Search for under-annotated data segments.
Check for frequent substrings in the dataset.
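
To make two of the descriptions above concrete, here is a minimal plain-Python sketch of what "duplicate samples" and "identical samples with different labels" can mean in practice. It is an illustration only, not this module's implementation: the helper names `normalize`, `find_duplicates`, and `find_conflicting_labels` are hypothetical, and the normalization rule (lowercasing and collapsing whitespace) is an assumption chosen for the example.

```python
from collections import defaultdict


def normalize(text):
    """Assumed normalization: lowercase and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def group_by_text(texts):
    """Map each normalized text to the indices of the samples that share it."""
    groups = defaultdict(list)
    for index, text in enumerate(texts):
        groups[normalize(text)].append(index)
    return groups


def find_duplicates(texts):
    """Return index groups of samples whose normalized text is identical."""
    return [idx for idx in group_by_text(texts).values() if len(idx) > 1]


def find_conflicting_labels(texts, labels):
    """Return index groups of identical samples that carry more than one label."""
    return [
        idx
        for idx in group_by_text(texts).values()
        if len({labels[i] for i in idx}) > 1
    ]


texts = ["Hello world", "hello  world", "bye", "Bye", "ok"]
labels = ["a", "a", "x", "y", "z"]

duplicates = find_duplicates(texts)
conflicts = find_conflicting_labels(texts, labels)
```

With this sample data, the first two texts and the third and fourth texts each form a duplicate group, while only the latter pair has conflicting labels. A real check would typically also report ratios and let you tune the normalization.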