Permuted Gini Importance – PaP Impurity Measurement for Tree-Based Models
Keywords:
Feature Selection, Gini Impurity, Feature Importance, Impurity Measurement
Abstract
One advantage of tree-based models is that they come with built-in feature importance measures that can be used for feature ranking. A significant strength of tree ensemble importance measures is that they capture the impact of each predictor variable individually as well as its multivariate interactions with other predictor variables. However, as correlation among predictors increases, both Gini and permutation importance become unable to detect relevant variables. In addition, Gini importance is biased towards certain features. To reduce this feature selection bias for randomized trees, the proposed permuted Gini importance approach permutes the features, thereby destroying their extrapolative influence without changing their marginal distribution. Permuting a feature forces the model to extrapolate. Because the extrapolation quality of tree ensembles is generally low, the resulting error is high, which forces the permuted Gini impurity measure to put weight on that feature's importance. The proposed approach also achieves higher computational efficiency. Jupyter Notebook was employed as the primary platform for the analysis and experimentation.
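The two ingredients named in the abstract, Gini impurity decrease and feature permutation, can be illustrated with a small sketch. This is not the authors' algorithm, whose details the abstract does not give. It only shows, on synthetic data, that shuffling a feature column leaves its marginal distribution unchanged while removing most of the impurity decrease a single split on it can achieve. The helper names (`gini`, `best_gini_decrease`), the binary target, and the data-generating process are all assumptions made for illustration.

```python
import random


def gini(labels):
    """Gini impurity of a list of binary class labels (0 or 1)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def best_gini_decrease(x, y):
    """Largest impurity decrease from a single threshold split on x."""
    n = len(y)
    parent = gini(y)
    pairs = sorted(zip(x, y))
    best = 0.0
    for i in range(1, n):
        # A threshold can only be placed between two distinct feature values.
        if pairs[i][0] == pairs[i - 1][0]:
            continue
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - child)
    return best


# Synthetic data (assumed): one informative feature and a noisy binary target.
rng = random.Random(0)
n = 400
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
y = [1 if value + rng.gauss(0.0, 0.3) > 0 else 0 for value in x]

# Permute the feature: same values, so the same marginal distribution,
# but the pairing with the target is broken.
x_perm = x[:]
rng.shuffle(x_perm)

orig_decrease = best_gini_decrease(x, y)
perm_decrease = best_gini_decrease(x_perm, y)

print(f"Gini decrease, original feature: {orig_decrease:.3f}")
print(f"Gini decrease, permuted feature: {perm_decrease:.3f}")
```

The gap between the two printed values is the kind of signal a permutation-based impurity measure can build on. A full method for randomized trees would aggregate such quantities over many trees and splits.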
License
Copyright (c) 2024 Journal of Soft Computing and Data Mining

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.









