Performance Evaluation of Machine Learning Algorithms on a Letter Recognition Task
Keywords:
Letter Recognition, k-Nearest Neighbours, Random Forest, EMNIST Dataset, Performance Evaluation
Abstract
Machine Learning (ML) algorithms have become integral to addressing many technological challenges, including image recognition and text classification. One critical application is handwritten letter recognition, which demands algorithms that can cope with diverse handwriting styles and complex data. This study applies and evaluates two supervised learning algorithms, k-Nearest Neighbours (kNN) and Random Forest, on the Extended Modified National Institute of Standards and Technology (EMNIST) Letters dataset, which comprises 124,800 training samples and 20,800 testing samples of uppercase and lowercase letters. Our goal is to determine the optimal hyperparameters for both algorithms: the number of neighbours k for kNN, and the number of trees, the tree depth, and the minimum sample split for Random Forest. For kNN, the optimal number of neighbours is , achieving an accuracy of . For Random Forest, the optimal hyperparameters are a tree depth of , a minimum sample split of , and trees, yielding an accuracy of . Our results show that, with optimally tuned hyperparameters, kNN achieves higher accuracy than Random Forest.
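The abstract does not state how k was tuned, so the following is only a minimal sketch of one common approach, assuming a plain hold-out validation search, Euclidean distance, and majority voting. The functions `knn_predict`, `holdout_accuracy`, and `select_k` and the two-dimensional toy data are hypothetical illustrations, not the study's code; on EMNIST each point would instead be a flattened 28×28 image.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k):
    """Hypothetical helper: label `query` by majority vote of its k nearest training points."""
    # Pair every training point's Euclidean distance to the query with its label, nearest first.
    neighbours = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )
    # Count the labels of the k closest points and return the most frequent one.
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


def holdout_accuracy(train_x, train_y, val_x, val_y, k):
    """Fraction of validation points whose predicted label matches the true label."""
    correct = sum(
        knn_predict(train_x, train_y, x, k) == y for x, y in zip(val_x, val_y)
    )
    return correct / len(val_y)


def select_k(train_x, train_y, val_x, val_y, candidates):
    """Score each candidate k on the validation set and return the best one plus all scores."""
    scores = {k: holdout_accuracy(train_x, train_y, val_x, val_y, k) for k in candidates}
    # max() keeps the first candidate among ties, i.e. the smallest k if candidates are ascending.
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Toy data: two clusters, with one mislabelled "b" point inside the "a" cluster.
train_x = [(0, 0), (0, 1), (1, 0), (0.4, 0.4), (5, 5), (5, 6), (6, 5)]
train_y = ["a", "a", "a", "b", "b", "b", "b"]
val_x = [(0.5, 0.5), (5.5, 5.5)]
val_y = ["a", "b"]

best_k, scores = select_k(train_x, train_y, val_x, val_y, candidates=[1, 3, 5])
print(best_k, scores)
```

Here k = 1 is misled by the single noisy neighbour, while larger k averages it out, which illustrates why k is treated as a hyperparameter to be tuned rather than fixed in advance. A grid search over Random Forest settings (number of trees, tree depth, minimum sample split) would follow the same score-and-select pattern with a different model.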



