High-Dimensional Data Stream Classification: Improving Random Patch Online Ensemble Classifier

Authors

H. I. Bensaoula, Sarah Nait Bahloul

Keywords:

Data Stream Mining, Online machine learning, Incremental classification, High-dimensional stream, Streaming Random Patches, Compressed Sensing

Abstract

In recent years, the amount of data produced by human activities has increased massively, giving rise to continuous flows of data generated in real time, known as data streams. Data stream classification requires learning approaches that are both incremental and adaptive, mainly because the patterns in a stream change rapidly. Streaming Random Patches (SRP), investigated in this work, is a robust online ensemble model for classifying evolving data streams. It uses incremental decision trees, Hoeffding Trees (HT), as base learners for online prediction. To ensure ensemble diversity, each tree is incrementally trained on a unique random patch formed by combining global feature subspacing with online bagging (OB). OB brings bagging to the streaming setting: instead of sampling with replacement, it assigns each instance a weight that determines how many times it is used for training. A drift detection strategy replaces outdated base learners so that the ensemble stays relevant to recent data and avoids outdated predictions. The ensemble is built incrementally by testing each HT on an instance before training on it, and the base learner weights are updated from these test predictions. Ensemble predictions are determined by weighted majority voting. This study aims to retain good SRP performance when classifying high-dimensional streams. The unpredictable nature of data stream instances and their dimensionality can degrade the online classifier's prediction quality, availability, and execution time. To increase SRP classifier performance, we refine the compressed sensing (CS) technique and apply it before incremental stream processing to ensure efficient subspace selection. Instead of the HT, we employ the Extremely Fast Decision Tree (EFDT), a more statistically efficient learner, as the base classifier in the final model. With these modifications, SRP and other techniques achieved better prediction performance on high-dimensional data streams, with average accuracy gains ranging from +0.15% to +5.43%. The suggested modifications also reduced execution time by 95.69%, indicating the method's alignment with Green AI.
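
The ensemble mechanics summarized in the abstract (random feature subspaces, online bagging through instance weights, test-then-train updates, and weighted majority voting) can be illustrated with a minimal sketch. It is not the authors' implementation. Several parts are assumptions made only to keep the example self-contained: the base learner is a toy weighted nearest-centroid classifier standing in for a Hoeffding Tree or EFDT, the Poisson rate `lam=6.0` is an illustrative default, the names `CentroidLearner`, `PatchEnsemble`, and `run_prequential` are hypothetical, and drift detection and compressed sensing are left out entirely.

```python
import math
import random
from collections import defaultdict


def poisson(lam, rng):
    """Draw a Poisson(lam) sample with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


class CentroidLearner:
    """Toy incremental learner (weighted nearest class centroid).

    Assumption: a stand-in for a Hoeffding Tree, not the base learner of SRP.
    """

    def __init__(self, features):
        self.features = features      # feature subspace of this learner's patch
        self.sums = defaultdict(lambda: [0.0] * len(features))
        self.counts = defaultdict(float)
        self.seen = 0.0               # instances tested so far
        self.correct = 0.0            # instances predicted correctly when tested

    def predict(self, x):
        if not self.counts:
            return None               # nothing learned yet
        z = [x[i] for i in self.features]

        def dist(label):
            n = self.counts[label]
            return sum((s / n - v) ** 2 for s, v in zip(self.sums[label], z))

        return min(self.counts, key=dist)

    def learn(self, x, y, weight):
        row = self.sums[y]
        for j, i in enumerate(self.features):
            row[j] += weight * x[i]
        self.counts[y] += weight


class PatchEnsemble:
    def __init__(self, n_features, n_learners=10, subspace_size=3, lam=6.0, seed=1):
        self.rng = random.Random(seed)
        self.lam = lam
        # Each learner receives its own random subset of the global feature set.
        self.learners = [
            CentroidLearner(self.rng.sample(range(n_features), subspace_size))
            for _ in range(n_learners)
        ]

    def predict(self, x):
        votes = defaultdict(float)
        for m in self.learners:
            label = m.predict(x)
            if label is not None:
                # Vote weight is the learner's test-then-train accuracy so far.
                votes[label] += (m.correct / m.seen if m.seen else 0.0) + 1e-9
        return max(votes, key=votes.get) if votes else None

    def learn(self, x, y):
        for m in self.learners:
            # Test before training, then update the learner's voting statistics.
            m.seen += 1
            if m.predict(x) == y:
                m.correct += 1
            # Online bagging: a Poisson-distributed weight replaces resampling.
            k = poisson(self.lam, self.rng)
            if k > 0:
                m.learn(x, y, k)


def run_prequential(n_instances=500, n_features=20, seed=0):
    """Prequential accuracy on a synthetic two-class Gaussian stream."""
    rng = random.Random(seed)
    ensemble = PatchEnsemble(n_features)
    hits = 0
    for _ in range(n_instances):
        y = rng.randint(0, 1)
        x = [rng.gauss(2.0 * y, 1.0) for _ in range(n_features)]
        hits += ensemble.predict(x) == y
        ensemble.learn(x, y)
    return hits / n_instances
```

Calling `run_prequential()` returns the fraction of instances predicted correctly before being used for training. The synthetic stream is deliberately easy and says nothing about the results reported above.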

 

Author Biography

  • Sarah Nait Bahloul, University of Science and Technology of Oran, Mohamed Boudiaf

    Holds a PhD in Computer Science and is a lecturer in the Computer Science Department of the University of Science and Technology of Oran Mohamed Boudiaf, El Mnaouer.

Published

28-12-2025

Issue

Vol. 6 No. 3 (2025)

Section

Articles

How to Cite

Bensaoula, H. I., & Nait Bahloul, S. (2025). High-Dimensional Data Stream Classification: Improving Random Patch Online Ensemble Classifier. Journal of Soft Computing and Data Mining, 6(3), 154-169. https://penerbit.uthm.edu.my/ojs/index.php/jscdm/article/view/22939