.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "user-guide/general/customizations/examples/plot_create_a_custom_suite.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here ` to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_user-guide_general_customizations_examples_plot_create_a_custom_suite.py:


Create a Custom Suite
*********************

A suite is a list of checks that run one after the other, and their results are displayed together.

To customize a suite, we can either:

* `Create a new custom suite <#create-a-new-suite>`__, by choosing the checks (and the optional conditions) that we want the suite to have.
* `Modify a built-in suite <#modify-an-existing-suite>`__ by adding and/or removing checks and conditions, to adapt it to our needs.

Create a New Suite
==================

Let's say we want to create our own custom suite, mainly with various performance checks, such as ``TrainTestPerformance()``, ``SimpleModelComparison()`` and several more.

For assistance in understanding which checks are implemented and can be included, we suggest using any of:

* :doc:`API Reference `
* `Tabular checks demonstration notebooks `__
* `Computer vision checks demonstration notebooks `__
* Built-in suites (by printing them to see which checks they include)

.. GENERATED FROM PYTHON SOURCE LINES 31-56

.. code-block:: default


    from sklearn.metrics import make_scorer, precision_score, recall_score

    from deepchecks.tabular import Suite
    # importing all existing checks for demonstration simplicity
    from deepchecks.tabular.checks import *

    # The Suite's first argument is its name, followed by the check objects.
    # Some checks can receive arguments when initialized (all check arguments have default values).
    # Each check can have one or more optional conditions (or none).
    # Multiple conditions can be applied sequentially.
    new_custom_suite = Suite(
        'Simple Suite For Model Performance',
        ModelInfo(),
        # use custom scorers for the performance report:
        TrainTestPerformance()
            .add_condition_train_test_relative_degradation_less_than(threshold=0.15)
            .add_condition_test_performance_greater_than(0.8),
        ConfusionMatrixReport(),
        SimpleModelComparison(
            strategy='most_frequent',
            alternative_scorers={
                'Recall (Multiclass)': make_scorer(recall_score, average=None),
                'Precision (Multiclass)': make_scorer(precision_score, average=None),
            },
        ).add_condition_gain_greater_than(0.3),
    )

    # Let's see the suite:
    new_custom_suite

.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    Simple Suite For Model Performance: [
        0: ModelInfo
        1: TrainTestPerformance
            Conditions:
                0: Train-Test scores relative degradation is less than 0.15
                1: Scores are greater than 0.8
        2: ConfusionMatrixReport
        3: SimpleModelComparison
            Conditions:
                0: Model performance gain over simple model is greater than 30%
    ]
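The printed suite above shows which conditions were attached. To discover which other built-in conditions a check exposes, you can also list its ``add_condition_...`` methods with plain Python — a minimal sketch (the naming convention is described in the notes below):

.. code-block:: default

    from deepchecks.tabular.checks import TrainTestPerformance

    # List the built-in condition methods a check class exposes, relying only on
    # the ``add_condition_...`` naming convention:
    condition_methods = [name for name in dir(TrainTestPerformance)
                         if name.startswith('add_condition_')]
    print(condition_methods)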
.. GENERATED FROM PYTHON SOURCE LINES 57-90

*TIP: auto-complete may not work from inside a new suite definition, so if you want to use auto-complete to see the arguments a check receives or the built-in conditions it has, try doing it outside of the suite's initialization.*

*For example, to see a check's built-in conditions, type in a new cell: ``NameOfDesiredCheck().add_condition_`` and then check the auto-complete suggestions (using Shift + Tab) to discover the built-in conditions.*

Additional Notes about Conditions in a Suite
--------------------------------------------

* Checks in the built-in suites come with pre-defined conditions; when building your custom suite, you choose which conditions to add.
* Most check classes have built-in methods for adding conditions. These follow the naming convention ``add_condition_...`` and attach condition logic that parses the check's results.
* Each check instance can have several conditions or none. Each condition is evaluated separately.
* The pass (✓) / fail (✖) / insight (!) status of each condition, along with its name and extra info, is displayed in the suite's Conditions Summary.
* Most conditions have configurable arguments that can be passed to the condition while adding it.
* For more info about conditions, check out :doc:`Configure a Condition `.

Run the Suite
=============

This is simply done by calling the ``run()`` method of the suite.

To see that in action we'll need datasets and a model, so let's quickly load a dataset and train a simple model for the sake of this demo.

Load Datasets and Train a Simple Model
--------------------------------------

.. GENERATED FROM PYTHON SOURCE LINES 90-111

.. code-block:: default


    # General imports
    import numpy as np
    import pandas as pd

    np.random.seed(22)

    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    from deepchecks.tabular.datasets.classification import iris

    # Load pre-split Datasets
    train_dataset, test_dataset = iris.load_data(as_train_test=True)
    label_col = 'target'

    # Train Model
    rf_clf = RandomForestClassifier()
    rf_clf.fit(train_dataset.data[train_dataset.features],
               train_dataset.data[train_dataset.label_name]);

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    RandomForestClassifier()

.. GENERATED FROM PYTHON SOURCE LINES 112-114

Run Suite
---------

.. GENERATED FROM PYTHON SOURCE LINES 114-117

.. code-block:: default

    new_custom_suite.run(model=rf_clf, train_dataset=train_dataset, test_dataset=test_dataset)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Simple Suite For Model Performance: |     | 0/4 [Time: 00:00]
    Simple Suite For Model Performance: |##5  | 2/4 [Time: 00:00, Check=Train Test Performance]
    Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
    Simple Suite For Model Performance: |#####| 4/4 [Time: 00:00, Check=Simple Model Comparison]
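If you want to keep the report rather than only display it inline, the result object returned by ``run()`` can be saved to a file. This is a minimal sketch reusing ``new_custom_suite``, ``rf_clf`` and the datasets from above; the output file name is only an example, and it assumes ``save_as_html()`` is available in your deepchecks version:

.. code-block:: default

    # Capture the returned suite result and persist it as a standalone HTML report.
    result = new_custom_suite.run(model=rf_clf, train_dataset=train_dataset, test_dataset=test_dataset)
    result.save_as_html('custom_suite_report.html')  # hypothetical output file name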
.. GENERATED FROM PYTHON SOURCE LINES 118-120

Modify an Existing Suite
========================

.. GENERATED FROM PYTHON SOURCE LINES 120-128

.. code-block:: default


    from deepchecks.tabular.suites import train_test_validation

    customized_suite = train_test_validation()

    # let's check what it has:
    customized_suite

.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    Train Test Validation Suite: [
        0: DatasetsSizeComparison
            Conditions:
                0: Test-Train size ratio is greater than 0.01
        1: NewLabelTrainTest
            Conditions:
                0: Number of new label values is less or equal to 0
        2: CategoryMismatchTrainTest
            Conditions:
                0: Ratio of samples with a new category is less or equal to 0%
        3: StringMismatchComparison
            Conditions:
                0: No new variants allowed in test data
        4: DateTrainTestLeakageDuplicates
            Conditions:
                0: Date leakage ratio is less or equal to 0%
        5: DateTrainTestLeakageOverlap
            Conditions:
                0: Date leakage ratio is less or equal to 0%
        6: IndexTrainTestLeakage
            Conditions:
                0: Ratio of leaking indices is less or equal to 0%
        7: TrainTestSamplesMix
            Conditions:
                0: Percentage of test data samples that appear in train data is less or equal to 10%
        8: FeatureLabelCorrelationChange(ppscore_params={}, random_state=42)
            Conditions:
                0: Train-Test features' Predictive Power Score difference is less than 0.2
                1: Train features' Predictive Power Score is less than 0.7
        9: TrainTestFeatureDrift
            Conditions:
                0: categorical drift score < 0.2 and numerical drift score < 0.1
        10: TrainTestLabelDrift
            Conditions:
                0: categorical drift score < 0.2 and numerical drift score < 0.1 for label drift
        11: WholeDatasetDrift
            Conditions:
                0: Drift value is less than 0.25
    ]

.. GENERATED FROM PYTHON SOURCE LINES 129-133

.. code-block:: default


    # and modify it by removing a check by index:
    customized_suite.remove(1)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    Train Test Validation Suite: [
        0: DatasetsSizeComparison
            Conditions:
                0: Test-Train size ratio is greater than 0.01
        2: CategoryMismatchTrainTest
            Conditions:
                0: Ratio of samples with a new category is less or equal to 0%
        3: StringMismatchComparison
            Conditions:
                0: No new variants allowed in test data
        4: DateTrainTestLeakageDuplicates
            Conditions:
                0: Date leakage ratio is less or equal to 0%
        5: DateTrainTestLeakageOverlap
            Conditions:
                0: Date leakage ratio is less or equal to 0%
        6: IndexTrainTestLeakage
            Conditions:
                0: Ratio of leaking indices is less or equal to 0%
        7: TrainTestSamplesMix
            Conditions:
                0: Percentage of test data samples that appear in train data is less or equal to 10%
        8: FeatureLabelCorrelationChange(ppscore_params={}, random_state=42)
            Conditions:
                0: Train-Test features' Predictive Power Score difference is less than 0.2
                1: Train features' Predictive Power Score is less than 0.7
        9: TrainTestFeatureDrift
            Conditions:
                0: categorical drift score < 0.2 and numerical drift score < 0.1
        10: TrainTestLabelDrift
            Conditions:
                0: categorical drift score < 0.2 and numerical drift score < 0.1 for label drift
        11: WholeDatasetDrift
            Conditions:
                0: Drift value is less than 0.25
    ]

.. GENERATED FROM PYTHON SOURCE LINES 134-141

.. code-block:: default


    from deepchecks.tabular.checks import UnusedFeatures

    # and add a new check with a condition:
    customized_suite.add(
        UnusedFeatures().add_condition_number_of_high_variance_unused_features_less_or_equal())

.. rst-class:: sphx-glr-script-out
 .. code-block:: none


    Train Test Validation Suite: [
        0: DatasetsSizeComparison
            Conditions:
                0: Test-Train size ratio is greater than 0.01
        2: CategoryMismatchTrainTest
            Conditions:
                0: Ratio of samples with a new category is less or equal to 0%
        3: StringMismatchComparison
            Conditions:
                0: No new variants allowed in test data
        4: DateTrainTestLeakageDuplicates
            Conditions:
                0: Date leakage ratio is less or equal to 0%
        5: DateTrainTestLeakageOverlap
            Conditions:
                0: Date leakage ratio is less or equal to 0%
        6: IndexTrainTestLeakage
            Conditions:
                0: Ratio of leaking indices is less or equal to 0%
        7: TrainTestSamplesMix
            Conditions:
                0: Percentage of test data samples that appear in train data is less or equal to 10%
        8: FeatureLabelCorrelationChange(ppscore_params={}, random_state=42)
            Conditions:
                0: Train-Test features' Predictive Power Score difference is less than 0.2
                1: Train features' Predictive Power Score is less than 0.7
        9: TrainTestFeatureDrift
            Conditions:
                0: categorical drift score < 0.2 and numerical drift score < 0.1
        10: TrainTestLabelDrift
            Conditions:
                0: categorical drift score < 0.2 and numerical drift score < 0.1 for label drift
        11: WholeDatasetDrift
            Conditions:
                0: Drift value is less than 0.25
        12: UnusedFeatures
            Conditions:
                0: Number of high variance unused features is less or equal to 5
    ]

.. GENERATED FROM PYTHON SOURCE LINES 142-149

.. code-block:: default


    # let's remove all conditions from the check at index 3 (StringMismatchComparison):
    customized_suite[3].clean_conditions()

    # and update the suite's name:
    customized_suite.name = 'New Data Leakage Suite'

.. GENERATED FROM PYTHON SOURCE LINES 150-153

.. code-block:: default


    # and now we can run our modified suite:
    customized_suite.run(train_dataset, test_dataset, rf_clf)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    New Data Leakage Suite: |            | 0/12 [Time: 00:00]
    New Data Leakage Suite: |#########   | 9/12 [Time: 00:00, Check=Train Test Feature Drift]
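As a lighter-touch alternative to ``clean_conditions()``, conditions can also be dropped one at a time by their index. This is a minimal sketch, assuming a ``remove_condition()`` method is available on checks in your deepchecks version; index 8 refers to ``FeatureLabelCorrelationChange`` in the listing above, and 0 to its first condition:

.. code-block:: default

    # Drop a single condition by index instead of clearing them all
    # (assumes remove_condition() exists on checks in your deepchecks version):
    customized_suite[8].remove_condition(0)

    # print the check to confirm only its remaining condition is attached:
    customized_suite[8]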
.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes 3.415 seconds)


.. _sphx_glr_download_user-guide_general_customizations_examples_plot_create_a_custom_suite.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_create_a_custom_suite.py `

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_create_a_custom_suite.ipynb `

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery `_