Configure Check Conditions

The following guide includes different options for configuring a check’s condition(s):

Add Condition
Remove / Edit a Condition
Add a Custom Condition
Set Custom Condition Category

Add Condition

In order to add a condition to an existing check, we can use any of the pre-defined conditions for that check. The naming convention for the methods that add the condition is add_condition_....

If you want to create and add your custom condition logic for parsing the check’s result value, see Add a Custom Condition.

Add a condition to a new check

from deepchecks.tabular.checks import DatasetsSizeComparison

check = DatasetsSizeComparison().add_condition_test_size_not_smaller_than(1000)
check

Out:

DatasetsSizeComparison
    Conditions:
            0: Test dataset size is not smaller than 1000

Conditions are used mainly in the context of a Suite, where their results are displayed in the Conditions Summary table. For an example of running conditions in a suite, see Add a Custom Condition. To evaluate the conditions outside of a suite, you can run:

import pandas as pd

from deepchecks.tabular import Dataset

# Dummy data
train_dataset = Dataset(pd.DataFrame(data={'x': [1,2,3,4,5,6,7,8,9]}))
test_dataset = Dataset(pd.DataFrame(data={'x': [1,2,3]}))

condition_results = check.conditions_decision(check.run(train_dataset, test_dataset))
condition_results

Out:

[{'details': 'Test dataset size is 3', 'category': <ConditionCategory.FAIL: 'FAIL'>, 'name': 'Test dataset size is not smaller than 1000'}]

Add a condition to a check in a suite

If we want to add a condition to a check within an existing suite, we should first find the check’s ID within the suite and then run the relevant add_condition_ method on that check’s instance. See the next section for how to find the ID and access the check.

The condition will then be appended to the list of conditions on that check (or be the first one if no conditions are defined), and each condition will be evaluated separately when running the suite.

Remove / Edit a Condition

Deepchecks provides different kinds of default suites, which come with pre-defined conditions. You may want to remove a condition that you don’t need, or change a condition’s parameters. Since condition functions are immutable, changing a parameter means removing the condition and adding it again with the new value.

To remove a condition, start by printing the Suite and identifying the Check’s ID and the Condition’s ID:

from deepchecks.tabular.suites import train_test_leakage

suite = train_test_leakage()
suite

Out:

Train Test Leakage Suite: [
    0: DateTrainTestLeakageDuplicates
            Conditions:
                    0: Date leakage ratio is not greater than 0%
    1: DateTrainTestLeakageOverlap
            Conditions:
                    0: Date leakage ratio is not greater than 0%
    2: SingleFeatureContributionTrainTest(ppscore_params={})
            Conditions:
                    0: Train-Test features' Predictive Power Score difference is not greater than 0.2
                    1: Train features' Predictive Power Score is not greater than 0.7
    3: TrainTestSamplesMix
            Conditions:
                    0: Percentage of test data samples that appear in train data not greater than 10%
    4: IdentifierLeakage(ppscore_params={})
            Conditions:
                    0: Identifier columns PPS is not greater than 0
    5: IndexTrainTestLeakage
            Conditions:
                    0: Ratio of leaking indices is not greater than 0%
]

Once we have found the IDs, we can remove the Condition:

# Access check by id
check = suite[2]
# Remove condition by id
check.remove_condition(0)

suite

Out:

Train Test Leakage Suite: [
    0: DateTrainTestLeakageDuplicates
            Conditions:
                    0: Date leakage ratio is not greater than 0%
    1: DateTrainTestLeakageOverlap
            Conditions:
                    0: Date leakage ratio is not greater than 0%
    2: SingleFeatureContributionTrainTest(ppscore_params={})
            Conditions:
                    1: Train features' Predictive Power Score is not greater than 0.7
    3: TrainTestSamplesMix
            Conditions:
                    0: Percentage of test data samples that appear in train data not greater than 10%
    4: IdentifierLeakage(ppscore_params={})
            Conditions:
                    0: Identifier columns PPS is not greater than 0
    5: IndexTrainTestLeakage
            Conditions:
                    0: Ratio of leaking indices is not greater than 0%
]

Now, if we want, we can also re-add the Condition with a different parameter, using the built-in methods on the check.

# Re-add the condition with new parameter
check.add_condition_feature_pps_difference_not_greater_than(0.01)

suite

Out:

Train Test Leakage Suite: [
    0: DateTrainTestLeakageDuplicates
            Conditions:
                    0: Date leakage ratio is not greater than 0%
    1: DateTrainTestLeakageOverlap
            Conditions:
                    0: Date leakage ratio is not greater than 0%
    2: SingleFeatureContributionTrainTest(ppscore_params={})
            Conditions:
                    1: Train features' Predictive Power Score is not greater than 0.7
                    2: Train-Test features' Predictive Power Score difference is not greater than 0.01
    3: TrainTestSamplesMix
            Conditions:
                    0: Percentage of test data samples that appear in train data not greater than 10%
    4: IdentifierLeakage(ppscore_params={})
            Conditions:
                    0: Identifier columns PPS is not greater than 0
    5: IndexTrainTestLeakage
            Conditions:
                    0: Ratio of leaking indices is not greater than 0%
]
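
Note that in the outputs above the remaining condition keeps ID 1 after condition 0 is removed, and the re-added condition receives ID 2 rather than reusing 0. The toy model below sketches one way such behaviour can arise: an insertion-ordered dictionary keyed by a counter that only grows. The class ToyCheck and its methods are hypothetical and are not deepchecks’ implementation; they only mirror the behaviour visible in the printed suite.

```python
class ToyCheck:
    """Hypothetical stand-in for a check that stores conditions by ID."""

    def __init__(self):
        self._conditions = {}   # ID -> condition name, kept in insertion order
        self._next_id = 0       # Only ever increases, so IDs are never reused

    def add_condition(self, name: str) -> "ToyCheck":
        self._conditions[self._next_id] = name
        self._next_id += 1
        return self             # Returning self allows chained calls

    def remove_condition(self, condition_id: int) -> None:
        if condition_id not in self._conditions:
            raise ValueError(f'No condition with ID {condition_id}')
        del self._conditions[condition_id]

    def condition_ids(self) -> list:
        return list(self._conditions)


toy_check = (ToyCheck()
             .add_condition('PPS difference is not greater than 0.2')
             .add_condition('Train PPS is not greater than 0.7'))
toy_check.remove_condition(0)
toy_check.add_condition('PPS difference is not greater than 0.01')
print(toy_check.condition_ids())  # [1, 2]
```

In practice this means a condition’s ID is best read from the printed suite each time, rather than assumed from its position in the list.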

Add a Custom Condition

In order to write conditions we first have to know what value a given check produces.

Let’s look at the check DatasetsSizeComparison and see its return value in order to write a condition for it.

import pandas as pd

from deepchecks.tabular import Dataset
from deepchecks.tabular.checks import DatasetsSizeComparison

# We'll use dummy data for the purpose of this demonstration
train_dataset = Dataset(pd.DataFrame(data={'x': [1,2,3,4,5,6,7,8,9]}))
test_dataset = Dataset(pd.DataFrame(data={'x': [1,2,3]}))

result = DatasetsSizeComparison().run(train_dataset, test_dataset)
result.value

Out:

{'Train': 9, 'Test': 3}

Now we know what the return value looks like. Let’s add a new condition that validates that the ratio between the train and test dataset sizes is inside a given range. To create the condition, we use the add_condition method of the check, which accepts a condition name and a function. This function receives the value of the CheckResult that we saw above and should return either a boolean or a ConditionResult containing a category and optional extra info that will be displayed in the Conditions Summary table.

Note: When implementing a condition in a custom check, you may want to add a method add_condition_x() to allow any consumer of your check to apply the condition (possibly with given parameters). For examples, look at the source code of the implemented checks.
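
As a minimal sketch of the simplest form of such a function, the condition below returns a plain boolean instead of a ConditionResult. The name ratio_in_range and its default thresholds are illustrative assumptions, not part of deepchecks; the input dictionary has the same shape as the result.value shown above.

```python
# Hypothetical boolean condition for DatasetsSizeComparison.
# It only relies on the result value having 'Train' and 'Test' sizes.
def ratio_in_range(value: dict, low: float = 0.4, high: float = 0.6) -> bool:
    """Return True when the test/train size ratio lies within [low, high]."""
    ratio = value['Test'] / value['Train']
    return low <= ratio <= high


# The value produced by the dummy datasets above
print(ratio_in_range({'Train': 9, 'Test': 3}))   # False: 3 / 9 is about 0.33
print(ratio_in_range({'Train': 10, 'Test': 5}))  # True: 5 / 10 is 0.5
```

A function like this can be passed to add_condition together with a condition name. The version that follows returns a ConditionResult instead, so that the computed ratio can be reported as extra info.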

from deepchecks.core import ConditionCategory, ConditionResult

# Our parameters for the condition
low_threshold = 0.4
high_threshold = 0.6

# Create the condition function
def custom_condition(value: dict, low=low_threshold, high=high_threshold):
    ratio = value['Test'] / value['Train']
    if low <= ratio <= high:
        return ConditionResult(ConditionCategory.PASS)
    else:
        # Note: if you don't care about the extra info, you can return a boolean directly
        return ConditionResult(ConditionCategory.FAIL, f'Test-Train ratio is {ratio:.2}')

# Create the condition name
condition_name = f'Test-Train ratio is between {low_threshold} to {high_threshold}'

# Create check instance with the condition
check = DatasetsSizeComparison().add_condition(condition_name, custom_condition)

Now we will use a Suite to demonstrate the condition in action, since the suite runs the condition for us automatically and prints out a Conditions Summary table (for all the conditions defined on the checks within the suite):

from deepchecks.tabular import Suite

# Using suite to run check & condition
suite = Suite('Suite for Condition',
    check
)

suite.run(train_dataset, test_dataset)

Out:

Suite for Condition

The suite is composed of the following checks: Datasets Size Comparison.
Each check may contain conditions (which will result in pass / fail / warning / error, represented by ✓ / ✖ / ! / ⁈) as well as other outputs such as plots or tables.
Suites, checks and conditions can all be modified. Read more about custom suites.

Conditions Summary

Status	Check	Condition	More Info
✖	Datasets Size Comparison	Test-Train ratio is between 0.4 to 0.6	Test-Train ratio is 0.33

Check With Conditions Output

Datasets Size Comparison

Verify test dataset size comparing it to the train dataset size.

Conditions Summary

Status	Condition	More Info
✖	Test-Train ratio is between 0.4 to 0.6	Test-Train ratio is 0.33

Additional Outputs

	Train	Test
Size	9	3


Set Custom Condition Category

When writing your own condition logic, you can decide to mark a condition result as either fail or warn, by passing the category to the ConditionResult object. For example, we can write a condition that sets the category based on the severity of the result:

from deepchecks.core import ConditionCategory, ConditionResult

# Our parameters for the condition
low_threshold = 0.3
high_threshold = 0.7

# Create the condition function for check `DatasetsSizeComparison`
def custom_condition(value: dict):
    ratio = value['Test'] / value['Train']
    if low_threshold <= ratio <= high_threshold:
        return ConditionResult(ConditionCategory.PASS)
    elif ratio < low_threshold:
        # A ratio below the lower threshold is treated as a failure
        return ConditionResult(ConditionCategory.FAIL, f'Test-Train ratio is {ratio:.2}')
    else:
        # A ratio above the upper threshold only raises a warning
        return ConditionResult(ConditionCategory.WARN, f'Test-Train ratio is {ratio:.2}')
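
To see which branch fires for a given ratio, the thresholding logic can be tried without deepchecks. In the sketch below, Category is a stand-in enum defined here for illustration (it is not deepchecks’ ConditionCategory), and categorize is a hypothetical helper that returns only the category.

```python
from enum import Enum


class Category(Enum):
    """Stand-in for the pass / warn / fail categories (illustrative only)."""
    PASS = 'PASS'
    WARN = 'WARN'
    FAIL = 'FAIL'


def categorize(value: dict, low: float = 0.3, high: float = 0.7) -> Category:
    """Map the test/train size ratio to a category by severity."""
    ratio = value['Test'] / value['Train']
    if low <= ratio <= high:
        return Category.PASS
    if ratio < low:
        return Category.FAIL   # Ratio below the lower threshold
    return Category.WARN       # Ratio above the upper threshold


print(categorize({'Train': 9, 'Test': 3}))    # Category.PASS (ratio is about 0.33)
print(categorize({'Train': 10, 'Test': 1}))   # Category.FAIL (ratio is 0.1)
print(categorize({'Train': 10, 'Test': 9}))   # Category.WARN (ratio is 0.9)
```

With the dummy datasets used in this guide (9 train samples, 3 test samples), the ratio falls inside the 0.3 to 0.7 range, so this variant of the condition would pass.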
