Pytest#
This tutorial demonstrates how deepchecks can be used inside unit tests performed on data or model, with the pytest
framework.
We will use the diabetes dataset from scikit-learn, and check whether certain columns contain drift
between the training and the test sets.
import pytest
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from deepchecks import Dataset
from deepchecks.tabular.checks import FeatureDrift
from deepchecks.tabular.suites import data_integrity
Defining Pytest Fixtures#
pytest fixtures provide a defined, reliable and consistent context for the tests. This could include environment (for
example a database configured with known parameters) or content (such as a dataset).
In this tutorial we will define a fixture that load the diabetes dataset from scikit-learn.
@pytest.fixture(scope='session')
def diabetes_df():
    diabetes = load_diabetes(return_X_y=False, as_frame=True).frame
    return diabetes
Implementing the Test#
Now, we will implement a test that will check if some columns in the dataset have drifted between the train and test datasets. the test sets.
def test_diabetes_drift(diabetes_df):
    train_df, test_df = train_test_split(diabetes_df, test_size=0.33, random_state=42)
    train = Dataset(train_df, label='target', cat_features=['sex'])
    test = Dataset(test_df, label='target', cat_features=['sex'])
    check = FeatureDrift(columns=['age', 'sex', 'bmi'])
    check.add_condition_drift_score_not_greater_than(max_allowed_psi_score=0.2,
                                                     max_allowed_earth_movers_score=0.1)
    result = check.run(train, test)
    assert result.passed_conditions()
Please note the passed_conditions() method of the deepchecks.core.CheckResult object. This method will return True if all the
conditions are met, and False otherwise.
It’s possible to evaluate the result of a suite of checks, and to get the overall result of the test, by using the
deepchecks.core.SuiteResult.passed() method.
def test_diabetes_integrity(diabetes_df):
    ds = Dataset(diabetes_df, label='target', cat_features=['sex'])
    suite = data_integrity()
    result = suite.run(ds)
    assert result.passed(fail_if_warning=True, fail_if_check_not_run=False)