This tutorial demonstrates how deepchecks can be used to output junit from tests performed on data or model. This can then be read in by junit parsers within CI/CD pipelines. This allows a more generic, but less adaptable, integration compared to the pytest integration where a developer needs to wrap every test manually. We will use the iris dataset from scikit-learn, and check whether certain columns contain drift between the training and the test sets.

General Structure#

JUnit comprises of 3 sections:

  1. The test suites sections: This is an optional section, but used with the Deepchecks Junit Serializer since multiple suites are used.

  2. The test suite section: This is used to group tests by their domain such as model, data, or evaluation test. A catch all ‘checks’ section is also added for custom checks added.

  3. The test case section: This is the atomic unit of the payload and contains the details about a deepchecks.core.CheckResult. This can either be a pass, failure, or skip.

It will output the following formatted string by default, but an XML can also be extracted:

<testsuites errors="0" failures="9" name="Full Suite" tests="54" time="45">
    <testsuite errors="0" failures="3" name="train_test_validation" tests="12" time="1" timestamp="2022-11-22T05:49:01">
        <testcase classname="deepchecks.tabular.checks.train_test_validation.feature_label_correlation_change.FeatureLabelCorrelationChange" name="Feature Label Correlation Change" time="0">
            <system-out>Passed for 4 relevant columns, Found 2 out of 4 features in train dataset with PPS above threshold: {'petal width (cm)': '0.91', 'petal length (cm)': '0.86'}</system-out>

Failed test cases will return in the following manner:

<testcase classname="deepchecks.tabular.checks.train_test_validation.date_train_test_leakage_duplicates.DateTrainTestLeakageDuplicates" name="Date Train Test Leakage Duplicates" time="0">
    <failure message="Dataset does not contain a datetime" type="failure">Check if test dates are present in train data.</failure>

Failed test cases can also be coerced to skipped to allow the recording of the failed test, but to not put a break into a CI/CD system. This is especially helpful when first starting out with deepchecks in CI/CD pipelines.

from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split

from deepchecks import Dataset
from deepchecks.tabular.suites import full_suite
from deepchecks.core.serialization.suite_result.junit import SuiteResultSerializer as JunitSerializer

Executing a Test Suite#

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from deepchecks.tabular import Dataset
from sklearn.datasets import load_iris

label_col = 'target'

X, y = load_iris(return_X_y=True, as_frame=True)
X[label_col] = y
df_train, df_test = train_test_split(X, stratify=X[label_col], random_state=0)

# Train Model
rf_clf = RandomForestClassifier(random_state=0, max_depth=2)
rf_clf.fit(df_train.drop(label_col, axis=1), df_train[label_col])
ds_train = Dataset(df_train, label=label_col, cat_features=[])
ds_test = Dataset(df_test, label=label_col, cat_features=[])

suite = full_suite()

results = suite.run(train_dataset=ds_train, test_dataset=ds_test, model=rf_clf)

output = JunitSerializer(results).serialize()