JUnit#

This tutorial demonstrates how deepchecks can output JUnit XML from tests performed on data or a model. The output can then be read by JUnit parsers within CI/CD pipelines. This provides a more generic, but less adaptable, integration than the pytest integration, where a developer needs to wrap every test manually. We will use the iris dataset from scikit-learn and check whether certain columns drift between the training and test sets.

General Structure#

A JUnit report comprises three sections:

1. The test suites section. This section is optional, but it is used by the deepchecks JUnit serializer since multiple suites are produced.
2. The test suite section. This groups tests by their domain, such as model, data, or evaluation tests. A catch-all 'checks' section is also added for any custom checks.
3. The test case section. This is the atomic unit of the payload and contains the details of a deepchecks.core.CheckResult. A test case can be a pass, a failure, or a skip.

By default the serializer outputs the following formatted string, but an XML document can also be extracted:

<testsuites errors="0" failures="9" name="Full Suite" tests="54" time="45">
<testsuite errors="0" failures="3" name="train_test_validation" tests="12" time="1" timestamp="2022-11-22T05:49:01">
<testcase classname="deepchecks.tabular.checks.train_test_validation.feature_label_correlation_change.FeatureLabelCorrelationChange" name="Feature Label Correlation Change" time="0">

<system-out>Passed for 4 relevant columns, Found 2 out of 4 features in train dataset with PPS above threshold: {'petal width (cm)': '0.91', 'petal length (cm)': '0.86'}</system-out>

</testcase> …

</testsuite>

</testsuites>

Failed test cases are reported in the following manner:

<testcase classname="deepchecks.tabular.checks.train_test_validation.date_train_test_leakage_duplicates.DateTrainTestLeakageDuplicates" name="Date Train Test Leakage Duplicates" time="0">

<failure message="Dataset does not contain a datetime" type="failure">Check if test dates are present in train data.</failure>

</testcase>

Failed test cases can also be coerced to skipped, so that the failure is still recorded without breaking the CI/CD pipeline. This is especially helpful when first introducing deepchecks into CI/CD pipelines. A post-processing sketch is shown below.
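The exact serializer option for this coercion may vary between deepchecks versions, so it is not asserted here. As a version-agnostic sketch, the generated JUnit XML can be post-processed with the Python standard library, rewriting each failure element as a skipped element and zeroing the failure counters; the failures_to_skipped function below is purely illustrative and not part of the deepchecks API.

import xml.etree.ElementTree as ET

def failures_to_skipped(junit_xml: str) -> str:
    """Rewrite <failure> elements as <skipped> so CI records them without failing the build."""
    root = ET.fromstring(junit_xml)
    for testcase in root.iter('testcase'):
        failure = testcase.find('failure')
        if failure is not None:
            # Carry the original failure message over to the skipped element.
            skipped = ET.SubElement(testcase, 'skipped')
            skipped.set('message', failure.get('message', ''))
            testcase.remove(failure)
    # Zero the failure counters so JUnit parsers do not mark the run as failed.
    for suite in root.iter('testsuite'):
        suite.set('failures', '0')
    if root.tag == 'testsuites':
        root.set('failures', '0')
    return ET.tostring(root, encoding='unicode')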

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

from deepchecks.tabular import Dataset
from deepchecks.tabular.suites import full_suite
from deepchecks.core.serialization.suite_result.junit import SuiteResultSerializer as JunitSerializer

Executing a Test Suite#

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from deepchecks.tabular import Dataset
from sklearn.datasets import load_iris

label_col = 'target'

# Load the iris data and split into train and test sets
X, y = load_iris(return_X_y=True, as_frame=True)
X[label_col] = y
df_train, df_test = train_test_split(X, stratify=X[label_col], random_state=0)


# Train Model
rf_clf = RandomForestClassifier(random_state=0, max_depth=2)
rf_clf.fit(df_train.drop(label_col, axis=1), df_train[label_col])

# Wrap the train and test DataFrames in deepchecks Dataset objects
ds_train = Dataset(df_train, label=label_col, cat_features=[])
ds_test = Dataset(df_test, label=label_col, cat_features=[])

suite = full_suite()

results = suite.run(train_dataset=ds_train, test_dataset=ds_test, model=rf_clf)

output = JunitSerializer(results).serialize()
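Assuming the default output is the JUnit XML string shown above, it can be loaded as an XML tree for inspection and written to a file that the CI system's JUnit parser is pointed at. The junit.xml file name below is arbitrary; match it to whatever path your CI configuration expects.

import xml.etree.ElementTree as ET

# The serialized string is JUnit XML, so it can also be inspected as an XML tree.
report = ET.fromstring(output)
print(report.get('tests'), report.get('failures'))

# Write the report to disk so a CI/CD JUnit parser can pick it up.
with open('junit.xml', 'w', encoding='utf-8') as f:
    f.write(output)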