Permuted Gini Importance – PaP Impurity Measurement for Tree-Based Models

Ifra Altaf; Manzoor Ahmed Chachoo

Authors

Ifra Altaf, PG Department of Computer Sciences, University of Kashmir
Manzoor Ahmed Chachoo, University of Kashmir, Srinagar

Keywords:

Feature Selection, Gini Impurity, Feature Importance, Impurity Measurement

Abstract

One of the advantages of tree-based models is that they provide built-in feature importance measures that can be used for feature ranking. A significant advantage of tree ensemble importance measures is that they capture the impact of each predictor variable individually, including its multivariate interactions with other predictor variables. However, as the correlation among predictors increases, both Gini importance and permutation importance fail to detect relevant variables. In addition, Gini importance is biased towards certain features, for example those offering many candidate split points. To reduce this feature selection bias in randomized trees, our proposed permuted Gini importance approach permutes the features, destroying their predictive association with the response while leaving their marginal distribution unchanged. Permuting one of the features forces the model to extrapolate. Because tree ensembles generally extrapolate poorly, the resulting error is high, which causes the permuted Gini impurity measure to place greater weight on that feature's importance. Besides reducing the feature selection bias for randomized trees, the proposed approach achieves higher computational efficiency. Jupyter Notebook was used as the primary platform for the analysis and experimentation.
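To make the two ingredients named in the abstract concrete, the following is a minimal pure-Python sketch of (a) the Gini impurity 1 - sum_k p_k^2 and the weighted impurity decrease of a single split, which is what Gini (mean decrease in impurity) importance accumulates across a tree, and (b) permuting a feature column, which keeps its marginal distribution but breaks its link to the labels. It is not the authors' permuted Gini importance, whose exact definition is not given here; the helper names (`gini_impurity`, `impurity_decrease`, `best_decrease`, `permute_column`) and the toy data are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def impurity_decrease(x, y, threshold):
    """Weighted Gini decrease at one node when splitting on x <= threshold."""
    left = [yi for xi, yi in zip(x, y) if xi <= threshold]
    right = [yi for xi, yi in zip(x, y) if xi > threshold]
    n = len(y)
    # Children are weighted by the fraction of the node's samples they receive.
    weighted_children = (len(left) / n) * gini_impurity(left) + (
        len(right) / n
    ) * gini_impurity(right)
    return gini_impurity(y) - weighted_children


def best_decrease(x, y):
    """Largest impurity decrease over all candidate thresholds of feature x."""
    return max(impurity_decrease(x, y, t) for t in sorted(set(x)))


def permute_column(x, seed=0):
    """Shuffle a feature: same marginal distribution, association with y broken."""
    rng = random.Random(seed)
    shuffled = list(x)
    rng.shuffle(shuffled)
    return shuffled


if __name__ == "__main__":
    # Hypothetical toy data: the feature separates the two classes perfectly.
    x = [1, 2, 3, 4, 5, 6, 7, 8]
    y = [0, 0, 0, 0, 1, 1, 1, 1]
    x_perm = permute_column(x, seed=0)
    print("original best decrease:", best_decrease(x, y))
    print("permuted best decrease:", best_decrease(x_perm, y))
    print("marginal preserved:", sorted(x_perm) == sorted(x))
```

On the original feature the best split removes all of the parent impurity (0.5 for two balanced classes). After permutation the sorted values are unchanged, yet the best achievable decrease is typically smaller, because the values no longer line up with the labels. Contrasting the "before" and "after" behaviour in this way is the general idea the abstract builds on.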