New Category

import pandas as pd

from deepchecks.tabular import Dataset
from deepchecks.tabular.checks.integrity import CategoryMismatchTrainTest
# Train and test share a single category ("me"); every other test value is unseen in train
train_data = {"col1": ["somebody", "once", "told", "me"] * 10}
test_data = {"col1": ["the", "world", "is", "gonna", "role", "me", "I", "I"] * 10}
train = Dataset(pd.DataFrame(data=train_data), cat_features=["col1"])
test = Dataset(pd.DataFrame(data=test_data), cat_features=["col1"])
CategoryMismatchTrainTest().run(train, test)

Category Mismatch Train Test

Find new categories in the test set.

Additional Outputs
Column  Number of new categories  Percent of new categories in sample  New categories examples
col1    6                         87.5%                                ['I', 'gonna', 'is', 'role', 'the']
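The figures in this table can be reproduced by hand with pandas alone. The sketch below assumes, from the numbers shown rather than from the deepchecks source, that a category is "new" when it appears in the test column but never in the train column, that the percentage is the share of test rows holding a new category, and that the examples column lists at most five new categories in sorted order.

```python
import pandas as pd

# Same data as the example above
train_col = pd.Series(["somebody", "once", "told", "me"] * 10)
test_col = pd.Series(["the", "world", "is", "gonna", "role", "me", "I", "I"] * 10)

# Categories present in test but never seen in train
new_categories = set(test_col.unique()) - set(train_col.unique())

# Assumption: the percentage is taken over test rows, not over distinct categories
new_row_share = test_col.isin(new_categories).mean()

print(len(new_categories))          # 6
print(f"{new_row_share:.1%}")       # 87.5%
print(sorted(new_categories)[:5])   # ['I', 'gonna', 'is', 'role', 'the']
```

The row-based reading matters here: 7 of every 8 test rows hold a new value (87.5%), whereas 6 of the 7 distinct test categories are new (about 85.7%), and the table matches the former.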


# Two categorical columns: col1 gains "d" in test, col2 gains "1" and "2"
train_data = {"col1": ["a", "b", "a", "c"] * 10, "col2": ["a", "b", "b", "q"] * 10}
test_data = {"col1": ["a", "b", "d"] * 10, "col2": ["a", "2", "1"] * 10}
train = Dataset(pd.DataFrame(data=train_data), cat_features=["col1", "col2"])
test = Dataset(pd.DataFrame(data=test_data), cat_features=["col1", "col2"])
CategoryMismatchTrainTest().run(train, test)

Category Mismatch Train Test

Find new categories in the test set.

Additional Outputs
Column  Number of new categories  Percent of new categories in sample  New categories examples
col1    1                         33.33%                               ['d']
col2    2                         66.67%                               ['1', '2']
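With several categorical columns the check reports one row per column. The per-column bookkeeping can be sketched with a hypothetical helper, `new_category_summary`, which is not a deepchecks function. It relies on the same inferred assumptions (new means absent from train, percentage over test rows, at most five sorted examples) and only aims to reproduce the table above, not deepchecks' internals.

```python
from typing import Sequence

import pandas as pd


def new_category_summary(
    train_df: pd.DataFrame,
    test_df: pd.DataFrame,
    columns: Sequence[str],
    max_examples: int = 5,
) -> pd.DataFrame:
    """Hypothetical helper (not a deepchecks API) that rebuilds the table above."""
    records = []
    for col in columns:
        # Categories seen in test but never in train (missing values ignored)
        new_cats = set(test_df[col].dropna().unique()) - set(train_df[col].dropna().unique())
        # Assumed metric: share of test rows whose value is a new category
        share = test_df[col].isin(new_cats).mean()
        records.append({
            "Number of new categories": len(new_cats),
            # At most two decimals, trailing zeros dropped, e.g. "87.5%" or "33.33%"
            "Percent of new categories in sample": f"{round(share * 100, 2):g}%",
            # Assumed: a sorted sample of at most `max_examples` categories
            "New categories examples": sorted(new_cats, key=str)[:max_examples],
        })
    return pd.DataFrame(records, index=pd.Index(list(columns), name="Column"))


# Same data as the two-column example above
train_df = pd.DataFrame({"col1": ["a", "b", "a", "c"] * 10, "col2": ["a", "b", "b", "q"] * 10})
test_df = pd.DataFrame({"col1": ["a", "b", "d"] * 10, "col2": ["a", "2", "1"] * 10})
summary = new_category_summary(train_df, test_df, ["col1", "col2"])
print(summary)
```

Building rows from a list of dicts keeps the examples column as Python lists, so each cell can be compared or inspected directly.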


