.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "tabular/auto_checks/model_evaluation/plot_unused_features.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_tabular_auto_checks_model_evaluation_plot_unused_features.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_tabular_auto_checks_model_evaluation_plot_unused_features.py:

.. _tabular__unused_features:

Unused Features
***************

This notebook provides an overview for using and understanding the Unused Features check.

**Structure:**

* `How unused features affect my model? <#how-unused-features-affect-my-model>`__
* `Run the check <#run-the-check>`__
* `Define a condition <#define-a-condition>`__

How unused features affect my model?
=====================================

Having too many features can prolong training times and degrade model performance, an effect known as the "Curse of Dimensionality" or the "Hughes Phenomenon". The volume of the feature space grows exponentially with the number of features. When the space is too large relative to the number of data samples, the samples are distributed very sparsely in it. Because of this sparsity, all samples end up far from one another, so the distances between them become hard to tell apart, which makes it harder to cluster similar samples together and find patterns. The larger space and the reduced contrast between samples may require more complex models, which in turn are at greater risk of overfitting.

Features with low model contribution (feature importance) are probably just noise, and should be removed as they increase the dimensionality without contributing anything. Nevertheless, models may miss important features. For that reason, the Unused Features check selects, out of these low-importance features, those that have high variance, as they may represent information that was ignored during model construction.
We may wish to manually inspect those features to make sure our model is not missing important information.

.. GENERATED FROM PYTHON SOURCE LINES 34-49

Run the check
=============

The check has two key optional parameters that affect its behavior and, especially, its output.

``feature_variance_threshold``: Controls the threshold above which features are considered "high variance". A higher threshold means that fewer features will be considered "high variance".

``feature_importance_threshold``: Controls the threshold above which features are considered important.

For additional information on how feature importance is calculated, see :ref:`tabular__feature_importance`.

We will run the check on the adult dataset, which can be downloaded from the UCI machine learning repository and is also available in ``deepchecks.tabular.datasets``.

.. GENERATED FROM PYTHON SOURCE LINES 49-59

.. code-block:: default

    from deepchecks.tabular.checks import UnusedFeatures
    from deepchecks.tabular.datasets.classification import adult

    # Load the test split of the adult dataset and a model already fitted on it.
    _, test_ds = adult.load_data()
    model = adult.load_fitted_model()

    # Raise the variance threshold so that fewer features count as "high variance".
    result = UnusedFeatures(feature_variance_threshold=1.5).run(test_ds, model)
    result.show()
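To make the roles of the two thresholds concrete, the following is a minimal sketch of the selection logic described above: features whose importance does not exceed an importance cutoff are treated as unused, and among those, features whose variance exceeds a variance cutoff are flagged for manual inspection. This is not the deepchecks implementation. The helper name ``split_unused_features``, the input dictionaries, and all numeric values are hypothetical, and the real check derives its importance and variance scores internally.

.. code-block:: python

    def split_unused_features(importances, variances,
                              importance_threshold, variance_threshold):
        """Split features into used, unused high-variance and unused low-variance.

        Hypothetical helper for illustration only; not part of deepchecks.
        """
        used, unused_high, unused_low = [], [], []
        for name, importance in importances.items():
            if importance > importance_threshold:
                # The model relies on this feature.
                used.append(name)
            elif variances[name] > variance_threshold:
                # Ignored by the model, yet it varies a lot: worth inspecting.
                unused_high.append(name)
            else:
                # Ignored by the model and nearly constant: likely safe to drop.
                unused_low.append(name)
        return used, unused_high, unused_low


    # Made-up scores for four features.
    importances = {"age": 0.50, "education": 0.30, "zip_code": 0.01, "constant_flag": 0.0}
    variances = {"age": 1.2, "education": 0.9, "zip_code": 2.4, "constant_flag": 0.0}

    used, unused_high, unused_low = split_unused_features(
        importances, variances, importance_threshold=0.05, variance_threshold=1.5
    )
    print(used)         # ['age', 'education']
    print(unused_high)  # ['zip_code']
    print(unused_low)   # ['constant_flag']

In this sketch, raising ``variance_threshold`` moves features from the flagged group into the low-variance group, which mirrors the statement that a higher threshold yields fewer "high variance" features.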
.. GENERATED FROM PYTHON SOURCE LINES 60-64

Define a condition
==================

We can define a condition that enforces that the number of unused features with high variance is not greater than a given amount. The default is 5.

.. GENERATED FROM PYTHON SOURCE LINES 64-67

.. code-block:: default

    # Fail the condition if more than 5 unused features have high variance.
    check = UnusedFeatures().add_condition_number_of_high_variance_unused_features_less_or_equal(5)
    result = check.run(test_ds, model)
    # Show only the condition outcome, without the additional display outputs.
    result.show(show_additional_outputs=False)
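Conceptually, the condition above compares a count against a limit. A minimal sketch of that comparison follows, assuming the list of high-variance unused features has already been computed. The function ``condition_passes`` and the feature names are hypothetical and are not part of the deepchecks API.

.. code-block:: python

    def condition_passes(high_variance_unused, max_allowed=5):
        """Return True when the number of flagged features is within the limit.

        Hypothetical helper for illustration only; not part of deepchecks.
        """
        return len(high_variance_unused) <= max_allowed


    flagged = ["zip_code", "session_id", "signup_timestamp"]

    print(condition_passes(flagged))                 # True: 3 <= 5
    print(condition_passes(flagged, max_allowed=3))  # True: the limit is inclusive
    print(condition_passes(flagged, max_allowed=2))  # False: 3 > 2

The comparison is inclusive, matching the "less or equal" wording in the condition's name.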
.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (1 minute 14.295 seconds)


.. _sphx_glr_download_tabular_auto_checks_model_evaluation_plot_unused_features.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_unused_features.py <plot_unused_features.py>`

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_unused_features.ipynb <plot_unused_features.ipynb>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    Gallery generated by Sphinx-Gallery