.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "user-guide/general/customizations/examples/plot_configure_check_conditions.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here <sphx_glr_download_user-guide_general_customizations_examples_plot_configure_check_conditions.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_user-guide_general_customizations_examples_plot_configure_check_conditions.py:

Configure Check Conditions
**************************

The following guide includes different options for configuring a check's condition(s):

* `Add Condition <#add-condition>`__
* `Remove / Edit a Condition <#remove-edit-a-condition>`__
* `Add a Custom Condition <#add-a-custom-condition>`__
* `Set Custom Condition Category <#set-custom-condition-category>`__

Add Condition
=============

To add a condition to an existing check, use any of the pre-defined conditions for that
check. The methods that add conditions follow the naming convention ``add_condition_...``.
If you want to create and add your own custom condition logic for parsing the check's
result value, see `Add a Custom Condition <#add-a-custom-condition>`__.

.. GENERATED FROM PYTHON SOURCE LINES 24-26

Add a condition to a new check
------------------------------

.. GENERATED FROM PYTHON SOURCE LINES 26-32

.. code-block:: default

    from deepchecks.tabular.checks import DatasetsSizeComparison

    check = DatasetsSizeComparison().add_condition_test_size_greater_or_equal(1000)
    check

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    DatasetsSizeComparison
    Conditions:
        0: Test dataset size is greater or equal to 1000

.. GENERATED FROM PYTHON SOURCE LINES 33-37

Conditions are used mainly in the context of a Suite, and are displayed in the
Conditions Summary table.
For an example of running conditions within a suite, see
`Add a Custom Condition <#add-a-custom-condition>`__. If you would like to run the
conditions outside of a suite, you can execute:

.. GENERATED FROM PYTHON SOURCE LINES 37-49

.. code-block:: default

    import pandas as pd

    from deepchecks.tabular import Dataset

    # Dummy data
    train_dataset = Dataset(pd.DataFrame(data={'x': [1, 2, 3, 4, 5, 6, 7, 8, 9]}))
    test_dataset = Dataset(pd.DataFrame(data={'x': [1, 2, 3]}))

    condition_results = check.conditions_decision(check.run(train_dataset, test_dataset))
    condition_results

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    [{'details': 'Test dataset contains 3 samples',
      'category': <ConditionCategory.FAIL: 'FAIL'>,
      'name': 'Test dataset size is greater or equal to 1000'}]

.. GENERATED FROM PYTHON SOURCE LINES 50-60

Add a condition to a check in a suite
-------------------------------------

To add a condition to a check within an existing suite, first find the check's ID within
the suite, then add the condition to it by running the relevant ``add_condition_`` method
on that check's instance. See the next section to understand how to do so.
The condition will then be appended to the list of conditions on that check (or be the
first one if no conditions are defined), and each condition will be evaluated separately
when running the suite.

.. GENERATED FROM PYTHON SOURCE LINES 62-70

Remove / Edit a Condition
=========================

Deepchecks provides different kinds of default suites, which come with pre-defined
conditions. You may want to remove a condition if it isn't needed, or change a
condition's parameters (since condition functions are immutable).

To remove a condition, start by printing the suite and identifying the check's ID and
the condition's ID:

.. GENERATED FROM PYTHON SOURCE LINES 70-76

.. code-block:: default

    from deepchecks.tabular.suites import train_test_validation

    suite = train_test_validation()
    suite

.. rst-class:: sphx-glr-script-out
.. code-block:: none

    Train Test Validation Suite: [
        0: DatasetsSizeComparison
            Conditions:
                0: Test-Train size ratio is greater than 0.01
        1: NewLabelTrainTest
            Conditions:
                0: Number of new label values is less or equal to 0
        2: CategoryMismatchTrainTest
            Conditions:
                0: Ratio of samples with a new category is less or equal to 0%
        3: StringMismatchComparison
            Conditions:
                0: No new variants allowed in test data
        4: DateTrainTestLeakageDuplicates
            Conditions:
                0: Date leakage ratio is less or equal to 0%
        5: DateTrainTestLeakageOverlap
            Conditions:
                0: Date leakage ratio is less or equal to 0%
        6: IndexTrainTestLeakage
            Conditions:
                0: Ratio of leaking indices is less or equal to 0%
        7: TrainTestSamplesMix
            Conditions:
                0: Percentage of test data samples that appear in train data is less or equal to 10%
        8: FeatureLabelCorrelationChange(ppscore_params={}, random_state=42)
            Conditions:
                0: Train-Test features' Predictive Power Score difference is less than 0.2
                1: Train features' Predictive Power Score is less than 0.7
        9: TrainTestFeatureDrift
            Conditions:
                0: categorical drift score < 0.2 and numerical drift score < 0.1
        10: TrainTestLabelDrift
            Conditions:
                0: categorical drift score < 0.2 and numerical drift score < 0.1 for label drift
        11: WholeDatasetDrift
            Conditions:
                0: Drift value is less than 0.25
    ]

.. GENERATED FROM PYTHON SOURCE LINES 77-78

After we have found the IDs, we can remove the condition:

.. GENERATED FROM PYTHON SOURCE LINES 78-86

.. code-block:: default

    # Access check by id
    check = suite[8]
    # Remove condition by id
    check.remove_condition(0)

    suite

.. rst-class:: sphx-glr-script-out
.. code-block:: none

    Train Test Validation Suite: [
        0: DatasetsSizeComparison
            Conditions:
                0: Test-Train size ratio is greater than 0.01
        1: NewLabelTrainTest
            Conditions:
                0: Number of new label values is less or equal to 0
        2: CategoryMismatchTrainTest
            Conditions:
                0: Ratio of samples with a new category is less or equal to 0%
        3: StringMismatchComparison
            Conditions:
                0: No new variants allowed in test data
        4: DateTrainTestLeakageDuplicates
            Conditions:
                0: Date leakage ratio is less or equal to 0%
        5: DateTrainTestLeakageOverlap
            Conditions:
                0: Date leakage ratio is less or equal to 0%
        6: IndexTrainTestLeakage
            Conditions:
                0: Ratio of leaking indices is less or equal to 0%
        7: TrainTestSamplesMix
            Conditions:
                0: Percentage of test data samples that appear in train data is less or equal to 10%
        8: FeatureLabelCorrelationChange(ppscore_params={}, random_state=42)
            Conditions:
                1: Train features' Predictive Power Score is less than 0.7
        9: TrainTestFeatureDrift
            Conditions:
                0: categorical drift score < 0.2 and numerical drift score < 0.1
        10: TrainTestLabelDrift
            Conditions:
                0: categorical drift score < 0.2 and numerical drift score < 0.1 for label drift
        11: WholeDatasetDrift
            Conditions:
                0: Drift value is less than 0.25
    ]

.. GENERATED FROM PYTHON SOURCE LINES 87-89

Now, if we want, we can also re-add the condition using the check's built-in methods,
with a different parameter.

.. GENERATED FROM PYTHON SOURCE LINES 89-95

.. code-block:: default

    # Re-add the condition with new parameter
    check.add_condition_feature_pps_difference_less_than(0.01)

    suite

.. rst-class:: sphx-glr-script-out
.. code-block:: none

    Train Test Validation Suite: [
        0: DatasetsSizeComparison
            Conditions:
                0: Test-Train size ratio is greater than 0.01
        1: NewLabelTrainTest
            Conditions:
                0: Number of new label values is less or equal to 0
        2: CategoryMismatchTrainTest
            Conditions:
                0: Ratio of samples with a new category is less or equal to 0%
        3: StringMismatchComparison
            Conditions:
                0: No new variants allowed in test data
        4: DateTrainTestLeakageDuplicates
            Conditions:
                0: Date leakage ratio is less or equal to 0%
        5: DateTrainTestLeakageOverlap
            Conditions:
                0: Date leakage ratio is less or equal to 0%
        6: IndexTrainTestLeakage
            Conditions:
                0: Ratio of leaking indices is less or equal to 0%
        7: TrainTestSamplesMix
            Conditions:
                0: Percentage of test data samples that appear in train data is less or equal to 10%
        8: FeatureLabelCorrelationChange(ppscore_params={}, random_state=42)
            Conditions:
                1: Train features' Predictive Power Score is less than 0.7
                2: Train-Test features' Predictive Power Score difference is less than 0.01
        9: TrainTestFeatureDrift
            Conditions:
                0: categorical drift score < 0.2 and numerical drift score < 0.1
        10: TrainTestLabelDrift
            Conditions:
                0: categorical drift score < 0.2 and numerical drift score < 0.1 for label drift
        11: WholeDatasetDrift
            Conditions:
                0: Drift value is less than 0.25
    ]

.. GENERATED FROM PYTHON SOURCE LINES 96-102

Add a Custom Condition
======================

In order to write conditions, we first have to know what value a given check produces.
Let's look at the check ``DatasetsSizeComparison`` and see its return value, in order to
write a condition for it.

.. GENERATED FROM PYTHON SOURCE LINES 102-115
.. code-block:: default

    import pandas as pd

    from deepchecks.tabular import Dataset
    from deepchecks.tabular.checks import DatasetsSizeComparison

    # We'll use dummy data for the purpose of this demonstration
    train_dataset = Dataset(pd.DataFrame(data={'x': [1, 2, 3, 4, 5, 6, 7, 8, 9]}))
    test_dataset = Dataset(pd.DataFrame(data={'x': [1, 2, 3]}))

    result = DatasetsSizeComparison().run(train_dataset, test_dataset)
    result.value

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    {'Train': 9, 'Test': 3}

.. GENERATED FROM PYTHON SOURCE LINES 116-126

Now that we know what the return value looks like, let's add a new condition that
validates that the ratio between the train and test dataset sizes is inside a given
range. To create a condition, use the check's ``add_condition`` method, which accepts a
condition name and a function. This function receives the value of the ``CheckResult``
that we saw above and should return either a boolean or a ``ConditionResult`` containing
a boolean and optional extra info that will be displayed in the Conditions Summary table.

*Note: When implementing a condition in a custom check, you may want to add a method*
``add_condition_x()`` *to allow any consumer of your check to apply the condition
(possibly with given parameters). For examples, look at the source code of the
implemented checks.*

.. GENERATED FROM PYTHON SOURCE LINES 126-148
.. code-block:: default

    from deepchecks.core import ConditionCategory, ConditionResult

    # Our parameters for the condition
    low_threshold = 0.4
    high_threshold = 0.6

    # Create the condition function
    def custom_condition(value: dict, low=low_threshold, high=high_threshold):
        ratio = value['Test'] / value['Train']
        if low <= ratio <= high:
            return ConditionResult(ConditionCategory.PASS)
        else:
            # Note: if you don't care about the extra info, you can return a boolean directly
            return ConditionResult(ConditionCategory.FAIL, f'Test-Train ratio is {ratio:.2}')

    # Create the condition name
    condition_name = f'Test-Train ratio is between {low_threshold} to {high_threshold}'

    # Create check instance with the condition
    check = DatasetsSizeComparison().add_condition(condition_name, custom_condition)

.. GENERATED FROM PYTHON SOURCE LINES 149-152

Now we will use a Suite to demonstrate the action of the condition, since the suite runs
the condition for us automatically and prints out a Conditions Summary table (for all
the conditions defined on the checks within the suite):

.. GENERATED FROM PYTHON SOURCE LINES 152-162

.. code-block:: default

    from deepchecks.tabular import Suite

    # Using suite to run check & condition
    suite = Suite('Suite for Condition',
                  check)

    suite.run(train_dataset, test_dataset)

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    Suite for Condition:
    |     | 0/1 [Time: 00:00]
.. GENERATED FROM PYTHON SOURCE LINES 163-168

Set Custom Condition Category
=============================

When writing your own condition logic, you can decide to mark a condition result as
either fail or warn by passing the category to the ``ConditionResult`` object.
For example, we can write a condition which sets the category based on the severity of
the result:

.. GENERATED FROM PYTHON SOURCE LINES 168-184

.. code-block:: default

    from deepchecks.core import ConditionCategory, ConditionResult

    # Our parameters for the condition
    low_threshold = 0.3
    high_threshold = 0.7

    # Create the condition function for check `DatasetsSizeComparison`
    def custom_condition(value: dict):
        ratio = value['Test'] / value['Train']
        if low_threshold <= ratio <= high_threshold:
            return ConditionResult(ConditionCategory.PASS)
        elif ratio < low_threshold:
            return ConditionResult(ConditionCategory.FAIL, f'Test-Train ratio is {ratio:.2}')
        else:
            return ConditionResult(ConditionCategory.WARN, f'Test-Train ratio is {ratio:.2}')

.. rst-class:: sphx-glr-timing

**Total running time of the script:** ( 0 minutes 0.125 seconds)

.. _sphx_glr_download_user-guide_general_customizations_examples_plot_configure_check_conditions.py:

.. only:: html

    .. container:: sphx-glr-footer sphx-glr-footer-example

        .. container:: sphx-glr-download sphx-glr-download-python

            :download:`Download Python source code: plot_configure_check_conditions.py <plot_configure_check_conditions.py>`

        .. container:: sphx-glr-download sphx-glr-download-jupyter

            :download:`Download Jupyter notebook: plot_configure_check_conditions.ipynb <plot_configure_check_conditions.ipynb>`

.. only:: html

    .. rst-class:: sphx-glr-signature

        `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_