methodology

This module contains checks for methodological flaws in the model-building process.

Classes

BoostingOverfit

Check for overfit caused by using too many iterations in a gradient boosted model.
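
The idea behind this check can be sketched by tracking a test-set score after each boosting iteration and measuring how far the final score has fallen from the best one. This is a minimal sketch, assuming a list of per-iteration test scores where higher is better. The helper names `best_iteration` and `boosting_overfit_drop` are hypothetical and are not part of the deepchecks API.

```python
from typing import Sequence

# Hypothetical helpers illustrating the idea; not the deepchecks implementation.


def best_iteration(test_scores: Sequence[float]) -> int:
    """Return the 1-based iteration count with the highest test score."""
    if not test_scores:
        raise ValueError("test_scores must not be empty")
    return max(range(len(test_scores)), key=lambda i: test_scores[i]) + 1


def boosting_overfit_drop(test_scores: Sequence[float]) -> float:
    """Relative drop of the final test score from the best test score.

    test_scores[i] is the test-set score (higher is better) after i + 1
    boosting iterations. A clearly positive value suggests that the
    iterations after the best one overfit the training data.
    """
    if not test_scores:
        raise ValueError("test_scores must not be empty")
    best = max(test_scores)
    last = test_scores[-1]
    if best == 0:
        return 0.0
    return (best - last) / abs(best)


# Made-up per-iteration accuracy on a held-out test set.
scores = [0.70, 0.78, 0.82, 0.81, 0.79]
print(best_iteration(scores))                   # 3
print(round(boosting_overfit_drop(scores), 3))  # 0.037
```

Here the test score peaks at iteration 3, so the last two iterations only cost test accuracy.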

UnusedFeatures

Detect features that are nearly unused by the model.
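
One simple way to decide that a feature is "nearly unused" is to compare its importance to the total importance across all features. This is a minimal sketch, assuming importances are already available as non-negative numbers per feature. The function `nearly_unused_features` and its default `threshold` of 1% are hypothetical choices, not the check's actual parameters.

```python
from typing import Dict, List

# Hypothetical helper; the 0.01 threshold is an arbitrary illustrative default.


def nearly_unused_features(importances: Dict[str, float], threshold: float = 0.01) -> List[str]:
    """Return the features whose share of the total importance is below threshold."""
    total = sum(importances.values())
    if total <= 0:
        # No feature carries any importance, so every feature is effectively unused.
        return sorted(importances)
    return sorted(name for name, value in importances.items() if value / total < threshold)


importances = {"age": 0.5, "income": 0.45, "zip_code": 0.004, "row_id": 0.001}
print(nearly_unused_features(importances))  # ['row_id', 'zip_code']
```

Normalizing by the total makes the threshold independent of the importance scale a given model reports.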

SingleFeatureContribution

Return the PPS (Predictive Power Score) of all features in relation to the label.
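
The core of a predictive power score is a normalized improvement over a naive baseline: (model score - baseline score) / (perfect score - baseline score). The full PPS, as popularized by the `ppscore` package, is typically computed with a cross-validated decision tree. The sketch below swaps that for a much simpler in-sample lookup model and plain accuracy, assuming a categorical feature and a categorical label. `simplified_pps` is a hypothetical helper, not deepchecks' implementation.

```python
from collections import Counter, defaultdict
from typing import Hashable, Sequence

# Simplified, hypothetical PPS: lookup model + accuracy, scored in-sample.


def _accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of positions where the prediction equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def simplified_pps(feature: Sequence[Hashable], label: Sequence[Hashable]) -> float:
    """Simplified, in-sample predictive power of one categorical feature.

    model:    predict the most frequent label seen for each feature value
    baseline: always predict the overall most frequent label
    score:    (model - baseline) / (1 - baseline), clipped at 0
    """
    if not label or len(feature) != len(label):
        raise ValueError("feature and label must be non-empty and of equal length")
    # Naive baseline: accuracy of always predicting the majority label.
    majority = Counter(label).most_common(1)[0][0]
    baseline = _accuracy(label, [majority] * len(label))
    if baseline >= 1.0:
        return 0.0  # a constant label leaves nothing to predict
    # One-feature "model": majority label per distinct feature value.
    per_value = defaultdict(Counter)
    for x, y in zip(feature, label):
        per_value[x][y] += 1
    lookup = {x: counts.most_common(1)[0][0] for x, counts in per_value.items()}
    model = _accuracy(label, [lookup[x] for x in feature])
    return max(0.0, (model - baseline) / (1.0 - baseline))


print(simplified_pps(["a", "a", "b", "b"], [0, 0, 1, 1]))  # 1.0
print(simplified_pps(["a", "a", "a", "a"], [0, 1, 0, 1]))  # 0.0
```

Because this sketch scores in-sample, a feature with a unique value per row (such as an ID) always gets a perfect score. Held-out evaluation, as in cross-validation, is what keeps such features from looking predictive.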

SingleFeatureContributionTrainTest

Return the Predictive Power Score (PPS) of all features in order to estimate each feature's ability to predict the label.

IndexTrainTestLeakage

Check if test indexes are present in train data.
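
At its core, this kind of check is a set-membership test between the two indexes. This is a minimal sketch, assuming the indexes are given as iterables of hashable values. The function name `index_leakage_ratio` and the choice to report a fraction of the test set are hypothetical.

```python
from typing import Hashable, Iterable

# Hypothetical helper; reports the share of test index values found in train.


def index_leakage_ratio(train_index: Iterable[Hashable], test_index: Iterable[Hashable]) -> float:
    """Fraction of test index values that also appear in the train index."""
    train_set = set(train_index)  # set gives O(1) average membership tests
    test_values = list(test_index)
    if not test_values:
        return 0.0
    leaked = sum(1 for idx in test_values if idx in train_set)
    return leaked / len(test_values)


print(index_leakage_ratio(range(100), range(95, 105)))  # 0.5
```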

TrainTestSamplesMix

Detect samples in the test data that also appear in the training data.
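
Unlike an index comparison, detecting mixed samples means comparing row contents. This is a minimal sketch, assuming each sample is a sequence of hashable values and that the whole row (features and label) is compared. Whether the label belongs in the comparison is a design choice. `samples_mix_ratio` is a hypothetical helper.

```python
from typing import Hashable, Sequence

# Hypothetical helper; compares full rows, ignoring any index.


def samples_mix_ratio(
    train_rows: Sequence[Sequence[Hashable]],
    test_rows: Sequence[Sequence[Hashable]],
) -> float:
    """Fraction of test rows that exactly match some train row."""
    if not test_rows:
        return 0.0
    train_set = {tuple(row) for row in train_rows}  # tuples are hashable
    duplicates = sum(1 for row in test_rows if tuple(row) in train_set)
    return duplicates / len(test_rows)


train = [(1, "a", 0), (2, "b", 1), (3, "c", 0)]
test = [[2, "b", 1], [4, "d", 1]]
print(samples_mix_ratio(train, test))  # 0.5
```

Because rows are compared by value, a sample copied into both splits under different indexes is still detected.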

DateTrainTestLeakageDuplicates

Check if test dates are present in train data.

DateTrainTestLeakageOverlap

Check for test data that is dated earlier than the latest date in the train data.
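
The comparison this check describes reduces to counting test dates that fall before the maximum train date. This is a minimal sketch, assuming plain `datetime.date` values and a strict "earlier than" comparison, so a test date equal to the latest train date is not counted. `early_test_date_ratio` is a hypothetical helper.

```python
from datetime import date, timedelta
from typing import Sequence

# Hypothetical helper; uses a strict "<" comparison against the latest train date.


def early_test_date_ratio(train_dates: Sequence[date], test_dates: Sequence[date]) -> float:
    """Fraction of test dates strictly earlier than the latest train date."""
    if not train_dates or not test_dates:
        return 0.0
    latest_train = max(train_dates)
    early = sum(1 for d in test_dates if d < latest_train)
    return early / len(test_dates)


train_dates = [date(2023, 1, 1) + timedelta(days=i) for i in range(10)]  # Jan 1..Jan 10
test_dates = [date(2023, 1, 5), date(2023, 1, 10), date(2023, 1, 11), date(2023, 1, 12)]
print(early_test_date_ratio(train_dates, test_dates))  # 0.25
```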

IdentifierLeakage

Check if identifiers (Index/Date) can be used to predict the label.

ModelInferenceTime

Measure the model's average inference time (in seconds) per sample.
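
Average per-sample inference time can be estimated by timing batch predictions with a monotonic high-resolution clock and dividing by the number of samples processed. This is a minimal sketch, assuming the model exposes a batch prediction callable. `average_inference_time`, its `repeats` parameter, and `toy_predict` are hypothetical.

```python
import time
from typing import Any, Callable, Sequence

# Hypothetical helper; times repeated batch predictions with time.perf_counter.


def average_inference_time(
    predict: Callable[[Sequence[Any]], Any],
    samples: Sequence[Any],
    repeats: int = 3,
) -> float:
    """Average wall-clock seconds per sample for a batch predict function."""
    if not samples:
        raise ValueError("samples must not be empty")
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    start = time.perf_counter()  # monotonic, high-resolution clock
    for _ in range(repeats):
        predict(samples)
    elapsed = time.perf_counter() - start
    return elapsed / (repeats * len(samples))


def toy_predict(batch):
    # Stand-in for a real model: sums each sample's feature values.
    return [sum(row) for row in batch]


data = [[1.0, 2.0, 3.0]] * 1000
print(f"{average_inference_time(toy_predict, data):.2e} s/sample")
```

Repeating the measurement smooths out one-off noise such as cache warm-up. The result still depends on hardware and batch size, so it is best compared across models on the same machine.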

DatasetsSizeComparison

Verify the test dataset size by comparing it to the train dataset size.
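
A size comparison like this can be expressed as a ratio plus simple thresholds. This is a minimal sketch, assuming both an absolute minimum test size and a minimum test/train ratio matter. The helpers `compare_dataset_sizes` and `is_test_size_sufficient`, and the defaults `min_test_size=100` and `min_ratio=0.05`, are hypothetical, not the check's real conditions.

```python
from typing import Dict

# Hypothetical helpers; thresholds are arbitrary illustrative defaults.


def compare_dataset_sizes(train_size: int, test_size: int) -> Dict[str, float]:
    """Report both sizes and the test/train size ratio."""
    if train_size < 0 or test_size < 0:
        raise ValueError("sizes must be non-negative")
    ratio = test_size / train_size if train_size else float("inf")
    return {"train_size": train_size, "test_size": test_size, "test_train_ratio": ratio}


def is_test_size_sufficient(
    train_size: int, test_size: int, min_test_size: int = 100, min_ratio: float = 0.05
) -> bool:
    """True if the test set is large enough both absolutely and relative to train."""
    sizes = compare_dataset_sizes(train_size, test_size)
    return test_size >= min_test_size and sizes["test_train_ratio"] >= min_ratio


print(compare_dataset_sizes(10_000, 2_000))
# {'train_size': 10000, 'test_size': 2000, 'test_train_ratio': 0.2}
print(is_test_size_sufficient(10_000, 2_000))  # True
print(is_test_size_sufficient(10_000, 50))     # False
```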