IMPLEMENTATION OF BALANCING DATA METHOD USING SMOTETOMEK IN DIABETES CLASSIFICATION USING XGBOOST

Kusumajati, Fatwa Ratantja and Rahmat, Basuki and Junaidi, Achmad (2024) IMPLEMENTATION OF BALANCING DATA METHOD USING SMOTETOMEK IN DIABETES CLASSIFICATION USING XGBOOST. Jurnal Ilmiah KURSOR, 12 (4). ISSN 0216-0544


Abstract

In this research, the XGBoost algorithm and the SMOTETomek approach are employed to improve the accuracy of diabetes classification, a significant public health concern in Indonesia. The study uses 2,000 patient records containing demographic and medical information, sourced from Kaggle. The dataset comprises the variables pregnancies, glucose level, blood pressure, skin thickness, insulin level, Body Mass Index (BMI), diabetes pedigree function, age, and an outcome variable. The outcome is a binary classification label: a value of 0 indicates that the patient does not have diabetes, whereas a value of 1 indicates that the patient has diabetes. A significant challenge in this study was class imbalance, as the dataset contains disproportionately more non-diabetic than diabetic samples. To address this, the researchers employed the SMOTETomek method, which combines SMOTE (Synthetic Minority Over-sampling Technique), to oversample the minority class, with Tomek links, to remove borderline samples, thereby balancing the class distribution. With XGBoost, SMOTETomek achieved higher accuracy (95.01%) than either SMOTE alone or the original data (both 92.13%), highlighting the benefit of combining SMOTE with Tomek links. During testing, SMOTETomek slightly reduced the minority-class accuracy (0.97 vs. 0.99 for SMOTE and the original data) but maintained a strong F1-score and precision, indicating effective handling of data imbalance despite minor trade-offs.
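The two-stage resampling described in the abstract (SMOTE oversampling followed by Tomek-link cleaning) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say which library they used (imbalanced-learn's `SMOTETomek` is one common option), and the helper names `smote`, `tomek_links` and `smote_tomek`, the neighbour count k = 5, and the choice to remove both samples of each Tomek link are all hypothetical. The XGBoost training step is omitted.

```python
import math
import random


def smote(minority, n_new, k=5, rng=None):
    """SMOTE: create n_new synthetic points, each a random interpolation
    between a minority sample and one of its k nearest minority neighbours.
    Requires at least two minority samples."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority-class neighbours of base, excluding base itself
        neighbours = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def tomek_links(X, y):
    """Return index pairs (as frozensets) that are mutual nearest
    neighbours but carry different class labels."""
    nearest = [
        min((j for j in range(len(X)) if j != i), key=lambda j: math.dist(X[i], X[j]))
        for i in range(len(X))
    ]
    return {
        frozenset((i, j))
        for i, j in enumerate(nearest)
        if nearest[j] == i and y[i] != y[j]
    }


def smote_tomek(X, y, minority_label=1, k=5, seed=0):
    """Hypothetical helper: oversample the minority class to parity with
    SMOTE, then drop both samples of every Tomek link (binary labels)."""
    rng = random.Random(seed)
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_new = max(len(y) - 2 * len(minority), 0)  # majority count minus minority count
    X_bal = list(X) + smote(minority, n_new, k, rng)
    y_bal = list(y) + [minority_label] * n_new
    drop = set().union(*tomek_links(X_bal, y_bal))
    keep = [i for i in range(len(X_bal)) if i not in drop]
    return [X_bal[i] for i in keep], [y_bal[i] for i in keep]


if __name__ == "__main__":
    # Toy data: 20 majority points (label 0) and 5 minority points (label 1)
    X = [(float(i), 0.0) for i in range(20)] + [(float(i), 5.0) for i in range(5)]
    y = [0] * 20 + [1] * 5
    X_res, y_res = smote_tomek(X, y)
    print(len(X_res), sum(y_res))  # → 40 20
```

Because every point has exactly one nearest neighbour, Tomek links are disjoint and each pairs one sample from each class, so the cleaning step removes equal numbers from both classes and preserves the balance produced by SMOTE. On the well-separated toy data no links exist, so nothing is removed.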

Item Type: Article
Subjects: T Technology > T Technology (General)
Divisions: Faculty of Computer Science > Department of Informatics
Depositing User: Fatwa Ratantja Kusumajati
Date Deposited: 02 Jun 2025 01:51
Last Modified: 02 Jun 2025 01:51
URI: https://repository.upnjatim.ac.id/id/eprint/36823
