.. _feature_importance:

====================
Feature Importance
====================

* `What is Feature Importance? <#what-is-feature-importance>`__
* `Why Does Deepchecks Use Feature Importance? <#why-does-deepchecks-use-feature-importance>`__
* `What if the Feature Importance Calculation Takes Too Long? <#what-if-the-feature-importance-calculation-takes-too-long>`__

What is Feature Importance?
===========================

Feature importance is a ranking that represents the significance of each input feature to the model's predictions:
a feature with higher importance has more influence on the model's predictions. Feature importance can be global
(meaning, averaged over all model predictions) or local (meaning, for a specific sample).

There are many ways to calculate feature importance. Some are generic and apply to any model (such as
`Shapley values <https://en.wikipedia.org/wiki/Shapley_value>`_) and some are specific to a model type
(such as the Gini importance for decision trees).

Why Does Deepchecks Use Feature Importance?
===========================================

Deepchecks uses your model's feature importance for two main reasons:

* To help you find issues with your model or data, as in the ``UnusedFeatures`` check.
* To display the most relevant information a check has found. For instance, if deepchecks found drift in many
  features, as in the ``TrainTestFeatureDrift`` check, it would only display the features with the highest
  importance.

Note
----

Most checks do not *require* feature importance, so you can shorten or
`even skip <#what-if-the-feature-importance-calculation-takes-too-long>`__ this phase of the calculation.

How Does Deepchecks Get Feature Importance?
===========================================

There are three ways in which deepchecks can get your model's feature importance:

Your Model Has a Built-in Feature Importance
--------------------------------------------

First of all, deepchecks searches for your model's built-in feature importance, which some scikit-learn models
have. Deepchecks looks for the ``feature_importances_`` or ``coef_`` attribute and uses that information if it
exists.

You Insert Your Own Feature Importance Data
-------------------------------------------

This can be done by using the ``feature_importance`` parameter of the ``run`` function, available in all checks
and suites. Deepchecks expects this data to be a ``pandas.Series`` whose index holds the feature names and whose
values are the calculated importances.

>>> import pandas as pd
>>> from deepchecks.tabular.checks import UnusedFeatures
>>> check = UnusedFeatures()
>>> check.run(ds_train, ds_test, model, feature_importance=pd.Series({'feat1': 0.3, 'feat2': 0.7}))

Deepchecks Calculates the Feature Importance for You
----------------------------------------------------

If the model has no built-in feature importance and you have not supplied feature importance data of your own,
deepchecks will calculate feature importance using scikit-learn's
`permutation_importance <https://scikit-learn.org/stable/modules/generated/sklearn.inspection.permutation_importance.html>`_.
You can also force this behavior by using the ``feature_importance_force_permutation`` parameter of the ``run``
function, available in all checks and suites.

>>> from deepchecks.tabular.checks import TrainTestFeatureDrift
>>> check = TrainTestFeatureDrift()
>>> check.run(ds_train, ds_test, model, feature_importance_force_permutation=True)
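To make these three sources concrete, here is a minimal, self-contained sketch that uses scikit-learn directly.
It reads a model's built-in ``feature_importances_`` attribute and computes a model-agnostic importance with
scikit-learn's ``permutation_importance``, the same function deepchecks falls back to. The toy dataset, feature
names, and model choice are illustrative assumptions, not deepchecks internals.

.. code-block:: python

    import pandas as pd
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.inspection import permutation_importance

    # Toy data with named features (illustrative assumption)
    X, y = make_classification(n_samples=500, n_features=4, random_state=0)
    feature_names = ['feat1', 'feat2', 'feat3', 'feat4']
    X = pd.DataFrame(X, columns=feature_names)

    model = RandomForestClassifier(random_state=0).fit(X, y)

    # 1) Built-in importance: tree ensembles expose ``feature_importances_``
    #    (linear models expose ``coef_`` instead)
    builtin_fi = pd.Series(model.feature_importances_, index=feature_names)

    # 2) Model-agnostic importance: permutation importance, averaged over repeats
    result = permutation_importance(model, X, y, n_repeats=5, random_state=0)
    permutation_fi = pd.Series(result.importances_mean, index=feature_names)

    # Either Series has the shape expected by the ``feature_importance``
    # parameter of the ``run`` function
    print(builtin_fi.sort_values(ascending=False))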
What if the Feature Importance Calculation Takes Too Long?
==========================================================

Feature importance is a complex calculation that can take a long time, depending on the number of features and
samples in your data. However, apart from certain checks, deepchecks does not require feature importance.

Therefore, if you want deepchecks to skip the calculation of feature importance, you can use the
``feature_importance_timeout`` parameter of the ``run`` function, available in all checks and suites. If
deepchecks projects that the feature importance calculation will take longer than this parameter, the calculation
is skipped. Setting this parameter to 0 ensures that the calculation is always skipped.

>>> from deepchecks.tabular.checks import WholeDatasetDrift
>>> check = WholeDatasetDrift()
>>> check.run(ds_train, ds_test, model, feature_importance_timeout=0)
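Since the parameter is available in suites as well, the same technique works when running an entire suite. A
minimal sketch, assuming the built-in ``full_suite`` and pre-existing ``ds_train``, ``ds_test`` and ``model``
objects:

>>> from deepchecks.tabular.suites import full_suite
>>> suite = full_suite()
>>> suite.run(ds_train, ds_test, model, feature_importance_timeout=0)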