Leveraging Ensemble Learning Model to Improve Classification of Imbalanced Dataset
M. Khairul Anam (a*), Nurul Fadillah (a), Munawir (a), Rizalul Akram (a), Ade Zulkarnain Hasibuan (a), Irwanda Syahputra (a), Alfa Saleh (a), Cut Alna Fadhilla (a), Chichi Rizka Gunawan (a)

Department of Informatics, Faculty of Science and Technology, Universitas Samudra,
Jl. Prof. Dr. Syarief Thayeb, Meurandeh, Langsa Lama, Langsa City, Aceh 24416
* khairulanam[at]unsam.ac.id


Abstract

Classification methods often face challenges from imbalanced labeled datasets and from the inconsistent accuracy of single algorithms. Class imbalance occurs when one class in the dataset is significantly smaller than the others, which degrades model performance, particularly when predicting the minority class. Algorithms such as SVM frequently struggle to handle imbalanced data effectively. It is therefore important to use techniques such as SMOTE to balance the data and boosting methods such as XGBoost to improve accuracy. This study integrates SMOTE, SVM, and XGBoost into a combined ensemble model that we call SSVMXGB. This approach provides a more robust solution to classification problems with imbalanced datasets by significantly improving performance through data balancing and model boosting. On the village funding dataset, which contains 3078 records, SSVMXGB achieved an accuracy of 94%, SMOTE-SVM 97%, SVM-XGB 84%, and SVM 85%. On the online learning dataset, which contains 1200 records, SSVMXGB achieved an accuracy of 98%, SMOTE-SVM 95%, SVM-XGB 90%, and SVM 81%. These results demonstrate that SSVMXGB effectively handles imbalanced datasets, particularly by leveraging SMOTE to ensure better representation of minority classes, even in datasets with smaller record counts. Additionally, combining the strengths of SVM in non-linear classification and XGBoost in correcting previous prediction errors enhances the model's overall performance.
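
The data-balancing step named in the abstract can be illustrated with a minimal sketch of the core SMOTE idea: each synthetic minority sample is placed at a random point on the line segment between a real minority sample and one of its nearest minority-class neighbours. This is not the authors' implementation, which the abstract does not describe. The function name `smote_oversample`, the parameters `k` and `seed`, and the toy data are all assumptions made for illustration. The SVM and XGBoost stages are not shown.

```python
import math
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours.
    (Hypothetical helper; a simplified illustration of SMOTE.)"""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to the base point.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Pick a random position on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Hypothetical toy data: 8 majority points and 3 minority points.
majority = [(float(i), float(i % 3)) for i in range(8)]
minority = [(10.0, 10.0), (11.0, 10.5), (10.5, 11.5)]

# Generate enough synthetic points to match the majority class size.
new_points = smote_oversample(minority, n_new=len(majority) - len(minority), k=2)
balanced_minority = minority + new_points
```

After this step the minority class has as many samples as the majority class, and every synthetic point lies between two real minority points. In a full pipeline, oversampling of this kind would normally be applied to the training split only, so that the test set keeps its original class distribution.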

Keywords: Classification; SMOTE; SVM; XGBoost; Ensemble

Topic: Technology and Engineering

SIC 2024 Conference | Conference Management System