.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "general/usage/customizations/auto_examples/plot_create_a_custom_suite.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_general_usage_customizations_auto_examples_plot_create_a_custom_suite.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_general_usage_customizations_auto_examples_plot_create_a_custom_suite.py:

.. _create_custom_suite:

Create a Custom Suite
*********************

A suite is a list of checks that run one after the other, and whose results are displayed together.

To customize a suite, we can either:

* `Create new custom suites <#create-a-new-suite>`__, by choosing the checks (and the optional conditions) that we want the suite to have.
* `Modify a built-in suite <#modify-an-existing-suite>`__ by adding and/or removing checks and conditions, to adapt it to our needs.

Create a New Suite
==================

Let's say we want to create our own custom suite, consisting mainly of various performance checks, such as ``TrainTestPerformance()``, ``SimpleModelComparison()`` and several more.

To see which checks are implemented and can be included, we suggest using any of:

* :doc:`API Reference `
* :ref:`Tabular checks `
* :ref:`Vision checks `
* :ref:`NLP checks `
* Built-in suites (by printing them to see which checks they include)

.. GENERATED FROM PYTHON SOURCE LINES 35-61

.. code-block:: default


    from sklearn.metrics import make_scorer, precision_score, recall_score

    from deepchecks.tabular import Suite
    # importing all existing checks for demonstration simplicity
    from deepchecks.tabular.checks import *

    # The Suite's first argument is its name, and then all of the check objects.
    # Some checks can receive arguments when initialized (all check arguments have default values).
    # Each check can have one or more optional conditions.
    # Multiple conditions can be applied sequentially.

    new_custom_suite = Suite('Simple Suite For Model Performance',
                             ModelInfo(),
                             # use custom scorers for performance report:
                             TrainTestPerformance().add_condition_train_test_relative_degradation_less_than(threshold=0.15)\
                                 .add_condition_test_performance_greater_than(0.8),
                             ConfusionMatrixReport(),
                             SimpleModelComparison(strategy='most_frequent',
                                                   scorers={'Recall (Multiclass)': make_scorer(recall_score, average=None),
                                                            'Precision (Multiclass)': make_scorer(precision_score, average=None)}
                                                   ).add_condition_gain_greater_than(0.3)
                             )

    # The ``scorers`` parameter can also be passed to the suite in order to override the scorers
    # of all the checks in the suite. See :ref:`metrics_user_guide` for further details.

.. GENERATED FROM PYTHON SOURCE LINES 62-63

Let's see the suite:

.. GENERATED FROM PYTHON SOURCE LINES 63-65

.. code-block:: default


    new_custom_suite

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Simple Suite For Model Performance: [
        0: ModelInfo
        1: TrainTestPerformance
            Conditions:
                0: Train-Test scores relative degradation is less than 0.15
                1: Scores are greater than 0.8
        2: ConfusionMatrixReport
        3: SimpleModelComparison(alternative_scorers={'Recall (Multiclass)': make_scorer(recall_score, average=None), 'Precision (Multiclass)': make_scorer(precision_score, average=None)})
            Conditions:
                0: Model performance gain over simple model is greater than 30%
    ]

.. GENERATED FROM PYTHON SOURCE LINES 66-99

*TIP: auto-complete may not work from inside a new suite definition, so if you want to use
auto-complete to see the arguments a check receives or the built-in conditions it has, try
doing it outside of the suite's initialization.*

*For example, to see a check's built-in conditions, type in a new cell:
``NameOfDesiredCheck().add_condition_`` and then check the auto-complete suggestions
(using Shift + Tab) to discover the built-in conditions.*

Additional Notes about Conditions in a Suite
--------------------------------------------

* Checks in the built-in suites come with pre-defined conditions; when building your custom suite you should choose which conditions to add.
* Most check classes have built-in methods for adding conditions. These follow the naming convention ``add_condition_...``, and attach condition logic that parses the check's results.
* Each check instance can have several conditions or none. Each condition is evaluated separately.
* The pass (✓) / fail (✖) / insight (!) status of the conditions, along with each condition's name and extra info, is displayed in the suite's Conditions Summary.
* Most conditions have configurable arguments that can be passed to the condition when adding it.
* For more info about conditions, check out :doc:`Configure a Condition `.

Run the Suite
=============

This is simply done by calling the ``run()`` method of the suite.

To see that in action, we'll need datasets and a model. Let's quickly load a dataset and
train a simple model for the sake of this demo.

Load Datasets and Train a Simple Model
--------------------------------------

.. GENERATED FROM PYTHON SOURCE LINES 99-120

.. code-block:: default


    # General imports
    import numpy as np
    import pandas as pd

    np.random.seed(22)

    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    from deepchecks.tabular.datasets.classification import iris

    # Load pre-split Datasets
    train_dataset, test_dataset = iris.load_data(as_train_test=True)
    label_col = 'target'

    # Train Model
    rf_clf = RandomForestClassifier()
    rf_clf.fit(train_dataset.data[train_dataset.features],
               train_dataset.data[train_dataset.label_name]);

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    RandomForestClassifier()

.. GENERATED FROM PYTHON SOURCE LINES 121-123

Run Suite
---------

.. GENERATED FROM PYTHON SOURCE LINES 123-126

.. code-block:: default


    new_custom_suite.run(model=rf_clf, train_dataset=train_dataset, test_dataset=test_dataset)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Simple Suite For Model Performance:   |     | 0/4 [Time: 00:00]
    Simple Suite For Model Performance:   |██▌  | 2/4 [Time: 00:00, Check=Train Test Performance]
    /home/runner/work/deepchecks/deepchecks/venv/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1318: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
    Simple Suite For Model Performance:   |█████| 4/4 [Time: 00:00, Check=Simple Model Comparison]

.. raw:: html
    Simple Suite For Model Performance

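Beyond the built-in ``add_condition_...`` methods shown above, a fully custom condition can be
attached to any check via ``add_condition``, which takes a condition name and a function that
operates on the check's result value. A minimal sketch (the result-value keys and the
100-sample threshold are illustrative assumptions, not fixed API):

.. code-block:: python

    from deepchecks.core import ConditionCategory, ConditionResult
    from deepchecks.tabular.checks import DatasetsSizeComparison


    def test_not_too_small(value: dict, low_threshold: int = 100) -> ConditionResult:
        # DatasetsSizeComparison's result value is assumed here to hold the
        # dataset sizes under the 'Train' and 'Test' keys.
        if value['Test'] > low_threshold:
            return ConditionResult(ConditionCategory.PASS)
        return ConditionResult(ConditionCategory.FAIL,
                               f'Test dataset has only {value["Test"]} samples')


    check = DatasetsSizeComparison().add_condition('Test dataset is not too small',
                                                   test_not_too_small)

A check configured this way can then be passed to a ``Suite`` exactly like the checks above.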
.. GENERATED FROM PYTHON SOURCE LINES 127-129

Modify an Existing Suite
========================

.. GENERATED FROM PYTHON SOURCE LINES 129-137

.. code-block:: default


    from deepchecks.tabular.suites import train_test_validation

    customized_suite = train_test_validation()

    # let's check what it has:
    customized_suite

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Train Test Validation Suite: [
        0: DatasetsSizeComparison
            Conditions:
                0: Test-Train size ratio is greater than 0.01
        1: NewLabelTrainTest
            Conditions:
                0: Number of new label values is less or equal to 0
        2: NewCategoryTrainTest
            Conditions:
                0: Ratio of samples with a new category is less or equal to 0%
        3: StringMismatchComparison
            Conditions:
                0: No new variants allowed in test data
        4: DateTrainTestLeakageDuplicates
            Conditions:
                0: Date leakage ratio is less or equal to 0%
        5: DateTrainTestLeakageOverlap
            Conditions:
                0: Date leakage ratio is less or equal to 0%
        6: IndexTrainTestLeakage
            Conditions:
                0: Ratio of leaking indices is less or equal to 0%
        7: TrainTestSamplesMix(n_to_show=5)
            Conditions:
                0: Percentage of test data samples that appear in train data is less or equal to 5%
        8: FeatureLabelCorrelationChange(ppscore_params={}, random_state=42)
            Conditions:
                0: Train-Test features' Predictive Power Score difference is less than 0.2
                1: Train features' Predictive Power Score is less than 0.7
        9: FeatureDrift
            Conditions:
                0: categorical drift score < 0.2 and numerical drift score < 0.2
        10: LabelDrift
            Conditions:
                0: Label drift score < 0.15
        11: MultivariateDrift
            Conditions:
                0: Drift value is less than 0.25
    ]

.. GENERATED FROM PYTHON SOURCE LINES 138-142

.. code-block:: default


    # and modify it by removing a check by index:
    customized_suite.remove(1)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Train Test Validation Suite: [
        0: DatasetsSizeComparison
            Conditions:
                0: Test-Train size ratio is greater than 0.01
        2: NewCategoryTrainTest
            Conditions:
                0: Ratio of samples with a new category is less or equal to 0%
        3: StringMismatchComparison
            Conditions:
                0: No new variants allowed in test data
        4: DateTrainTestLeakageDuplicates
            Conditions:
                0: Date leakage ratio is less or equal to 0%
        5: DateTrainTestLeakageOverlap
            Conditions:
                0: Date leakage ratio is less or equal to 0%
        6: IndexTrainTestLeakage
            Conditions:
                0: Ratio of leaking indices is less or equal to 0%
        7: TrainTestSamplesMix(n_to_show=5)
            Conditions:
                0: Percentage of test data samples that appear in train data is less or equal to 5%
        8: FeatureLabelCorrelationChange(ppscore_params={}, random_state=42)
            Conditions:
                0: Train-Test features' Predictive Power Score difference is less than 0.2
                1: Train features' Predictive Power Score is less than 0.7
        9: FeatureDrift
            Conditions:
                0: categorical drift score < 0.2 and numerical drift score < 0.2
        10: LabelDrift
            Conditions:
                0: Label drift score < 0.15
        11: MultivariateDrift
            Conditions:
                0: Drift value is less than 0.25
    ]

.. GENERATED FROM PYTHON SOURCE LINES 143-150

.. code-block:: default


    from deepchecks.tabular.checks import UnusedFeatures

    # and add a new check with a condition:
    customized_suite.add(
        UnusedFeatures().add_condition_number_of_high_variance_unused_features_less_or_equal())

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Train Test Validation Suite: [
        0: DatasetsSizeComparison
            Conditions:
                0: Test-Train size ratio is greater than 0.01
        2: NewCategoryTrainTest
            Conditions:
                0: Ratio of samples with a new category is less or equal to 0%
        3: StringMismatchComparison
            Conditions:
                0: No new variants allowed in test data
        4: DateTrainTestLeakageDuplicates
            Conditions:
                0: Date leakage ratio is less or equal to 0%
        5: DateTrainTestLeakageOverlap
            Conditions:
                0: Date leakage ratio is less or equal to 0%
        6: IndexTrainTestLeakage
            Conditions:
                0: Ratio of leaking indices is less or equal to 0%
        7: TrainTestSamplesMix(n_to_show=5)
            Conditions:
                0: Percentage of test data samples that appear in train data is less or equal to 5%
        8: FeatureLabelCorrelationChange(ppscore_params={}, random_state=42)
            Conditions:
                0: Train-Test features' Predictive Power Score difference is less than 0.2
                1: Train features' Predictive Power Score is less than 0.7
        9: FeatureDrift
            Conditions:
                0: categorical drift score < 0.2 and numerical drift score < 0.2
        10: LabelDrift
            Conditions:
                0: Label drift score < 0.15
        11: MultivariateDrift
            Conditions:
                0: Drift value is less than 0.25
        12: UnusedFeatures
            Conditions:
                0: Number of high variance unused features is less or equal to 5
    ]

.. GENERATED FROM PYTHON SOURCE LINES 151-158

.. code-block:: default


    # let's remove all conditions from the FeatureLabelCorrelationChange check
    # (index 8 in the listing above; indices are kept after removal):
    customized_suite[8].clean_conditions()

    # and update the suite's name:
    customized_suite.name = 'New Data Leakage Suite'

.. GENERATED FROM PYTHON SOURCE LINES 159-162

.. code-block:: default


    # and now we can run our modified suite:
    customized_suite.run(train_dataset, test_dataset, rf_clf)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    New Data Leakage Suite:   |         | 0/12 [Time: 00:00]
    New Data Leakage Suite:   |████████ | 8/12 [Time: 00:00, Check=Feature Label Correlation Change]

.. raw:: html
    New Data Leakage Suite

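Whichever suite we run, ``run()`` returns a ``SuiteResult`` object that can also be handled
programmatically rather than only displayed. A minimal sketch, re-creating the pieces used on
this page (``save_as_html()`` and ``passed()`` are taken from the ``SuiteResult`` API; the
output file name is our own choice):

.. code-block:: python

    from sklearn.ensemble import RandomForestClassifier

    from deepchecks.tabular.datasets.classification import iris
    from deepchecks.tabular.suites import train_test_validation

    # Re-create the datasets and model used earlier on this page.
    train_dataset, test_dataset = iris.load_data(as_train_test=True)
    rf_clf = RandomForestClassifier(random_state=0)
    rf_clf.fit(train_dataset.data[train_dataset.features],
               train_dataset.data[train_dataset.label_name])

    result = train_test_validation().run(train_dataset, test_dataset, rf_clf)

    # Save the full interactive report to a standalone HTML file ...
    result.save_as_html('train_test_validation_report.html')

    # ... or query it programmatically, e.g. whether all conditions passed:
    print(result.passed())

This is handy in CI pipelines, where the boolean from ``passed()`` can gate a build while the
HTML report is kept as an artifact.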
.. rst-class:: sphx-glr-timing

**Total running time of the script:** (0 minutes 4.229 seconds)


.. _sphx_glr_download_general_usage_customizations_auto_examples_plot_create_a_custom_suite.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_create_a_custom_suite.py <plot_create_a_custom_suite.py>`

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_create_a_custom_suite.ipynb <plot_create_a_custom_suite.ipynb>`

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_