train_test_validation

Module containing the train-test validation checks.

Classes

CategoryMismatchTrainTest

Find new categories in the test set.

NewCategoryTrainTest

Find new categories in the test set.

DatasetsSizeComparison

Verify the test dataset size by comparing it to the train dataset size.

DateTrainTestLeakageDuplicates

Check if test dates are present in train data.

DateTrainTestLeakageOverlap

Check for test data that is dated earlier than the latest date in the train data.

IdentifierLabelCorrelation

Check if identifiers (Index/Date) can be used to predict the label.

IndexTrainTestLeakage

Check if test indexes are present in train data.

NewLabelTrainTest

Find new labels in the test set.

FeatureLabelCorrelationChange

Return the Predictive Power Score of all features, in order to estimate each feature's ability to predict the label.

StringMismatchComparison

Detect different variants of string categories between the same categorical column in two datasets.

TrainTestFeatureDrift

The TrainTestFeatureDrift check is deprecated and will be removed in version 0.14.

FeatureDrift

Calculate drift between train dataset and test dataset per feature, using statistical measures.

TrainTestLabelDrift

The TrainTestLabelDrift check is deprecated and will be removed in version 0.14.

LabelDrift

Calculate label drift between train dataset and test dataset, using statistical measures.

TrainTestSamplesMix

Detect samples in the test data that also appear in the training data.

MultivariateDrift

Calculate drift between the entire train and test datasets using a model trained to distinguish between them.

WholeDatasetDrift

Calculate drift between the entire train and test datasets using a model trained to distinguish between them.
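
Example

The summaries above are terse, so here is a minimal plain-Python sketch of the ideas behind four of the checks. It does not use or reproduce the library's implementation. All function names (`new_categories`, `overlapping_test_dates`, `shared_samples`, `ks_statistic`) are hypothetical, and the Kolmogorov-Smirnov statistic is only one common choice of "statistical measure" for drift; this page does not state which measures the drift checks actually use.

```python
from bisect import bisect_right
from datetime import date


def new_categories(train_values, test_values):
    """Categories seen in test but never in train (idea behind NewCategoryTrainTest)."""
    return set(test_values) - set(train_values)


def overlapping_test_dates(train_dates, test_dates):
    """Test dates strictly earlier than the latest train date
    (idea behind DateTrainTestLeakageOverlap)."""
    latest_train = max(train_dates)
    return [d for d in test_dates if d < latest_train]


def shared_samples(train_rows, test_rows):
    """Test rows that also occur in train (idea behind TrainTestSamplesMix).
    Rows are assumed to be hashable tuples."""
    train_set = set(train_rows)
    return [row for row in test_rows if row in train_set]


def ks_statistic(train_values, test_values):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    the empirical CDFs of the two samples (0 = identical, 1 = disjoint)."""
    train_sorted = sorted(train_values)
    test_sorted = sorted(test_values)
    points = sorted(set(train_sorted) | set(test_sorted))

    def ecdf(sorted_vals, x):
        # Fraction of values less than or equal to x.
        return bisect_right(sorted_vals, x) / len(sorted_vals)

    return max(abs(ecdf(train_sorted, x) - ecdf(test_sorted, x)) for x in points)


if __name__ == "__main__":
    print(new_categories(["a", "b"], ["b", "c"]))
    print(overlapping_test_dates(
        [date(2024, 1, 1), date(2024, 1, 10)],
        [date(2024, 1, 5), date(2024, 1, 12)],
    ))
    print(shared_samples([(1, "x"), (2, "y")], [(2, "y"), (3, "z")]))
    print(ks_statistic([1, 2], [3, 4]))
```

The real checks operate on dataset objects and produce richer results (displays, conditions); the sketch only shows the comparison each description refers to.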