Lutfi, Muhammad Rafli Aulia Rojani (2024) Analisis Penggunaan Teknik Oversampling Pada Extreme Gradient Boosting (XGBoost) Untuk Mengatasi Ketidakseimbangan Kelas Pada Klasifikasi Penyakit Jantung [Analysis of the Use of Oversampling Techniques in Extreme Gradient Boosting (XGBoost) to Address Class Imbalance in Heart Disease Classification]. Undergraduate thesis, UPN Veteran Jawa Timur.
Text (Cover): 20081010061-cover.pdf
Text (Chapter 1): 20081010061-bab1.pdf
Text (Chapter 2): 20081010061-bab2.pdf (Restricted to Repository staff only until 20 September 2026)
Text (Chapter 3): 20081010061-bab3.pdf (Restricted to Repository staff only until 20 September 2026)
Text (Chapter 4): 20081010061-bab4.pdf (Restricted to Repository staff only until 20 September 2026)
Text (Chapter 5): 20081010061-bab5.pdf
Text (Bibliography): 20081010061-daftarpustaka.pdf
Text (Appendices): 20081010061-lampiran.pdf (Restricted to Repository staff only until 20 September 2026)
Abstract
Heart disease is a serious threat to global health, and early detection is key to improving survival rates. However, building predictive models is often hampered by data imbalance, where the number of healthy individuals far exceeds the number of diseased individuals. This imbalance can bias the model towards the majority class (healthy) and neglect the minority class (diseased), reducing its accuracy in detecting heart disease. This research addresses the class imbalance problem by applying oversampling techniques to the XGBoost algorithm. Oversampling works by increasing the number of samples in the minority class until the classes are balanced. Three oversampling techniques were tested: ROS, SMOTE, and ADASYN. In addition, hyperparameter tuning of the XGBoost algorithm was performed to obtain optimal model performance. The results show that the XGBoost model without oversampling, using default parameters, achieves a high accuracy of above 0.90 but performs poorly at classifying the minority class (sick individuals), as evidenced by a g-mean value below 0.50. In contrast, the XGBoost model combined with oversampling at optimal sampling ratios and hyperparameters improves classification of the minority class. Of the three oversampling techniques, ROS provides the highest average g-mean value of 0.80, although the resulting model is prone to overfitting at larger sampling ratios. SMOTE and ADASYN are also susceptible to overfitting at sampling ratios above 0.1, but to a much lesser degree, and their performance tends to be more stable.
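The pipeline described in the abstract can be illustrated with a minimal Python sketch, assuming the scikit-learn, imbalanced-learn, and xgboost packages. The dataset file heart_disease.csv, the target column name, the sampling ratio of 0.5, and the XGBoost hyperparameter values below are hypothetical placeholders, not the thesis's actual settings; the g-mean is computed as the geometric mean of class-wise recall (sensitivity and specificity).

```python
# Minimal sketch (not the author's code): comparing ROS, SMOTE, and ADASYN
# oversampling before training an XGBoost classifier on an imbalanced
# heart-disease dataset, evaluated with accuracy and g-mean.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from imblearn.over_sampling import RandomOverSampler, SMOTE, ADASYN
from imblearn.metrics import geometric_mean_score
from xgboost import XGBClassifier

# Hypothetical dataset with a binary "target" column
# (1 = diseased minority class, 0 = healthy majority class).
df = pd.read_csv("heart_disease.csv")
X, y = df.drop(columns=["target"]), df["target"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# sampling_strategy is the minority-to-majority sampling ratio; the thesis
# varies this ratio, 0.5 is only an example value.
samplers = {
    "none": None,
    "ROS": RandomOverSampler(sampling_strategy=0.5, random_state=42),
    "SMOTE": SMOTE(sampling_strategy=0.5, random_state=42),
    "ADASYN": ADASYN(sampling_strategy=0.5, random_state=42),
}

for name, sampler in samplers.items():
    if sampler is None:
        X_res, y_res = X_train, y_train  # baseline: no oversampling
    else:
        X_res, y_res = sampler.fit_resample(X_train, y_train)

    # Example hyperparameters only; the thesis tunes these values.
    model = XGBClassifier(
        n_estimators=200, max_depth=4, learning_rate=0.1, eval_metric="logloss"
    )
    model.fit(X_res, y_res)
    y_pred = model.predict(X_test)

    # g-mean = sqrt(sensitivity * specificity); unlike accuracy, it stays
    # low when the minority class is poorly classified.
    print(
        f"{name:>6}: accuracy={accuracy_score(y_test, y_pred):.3f} "
        f"g-mean={geometric_mean_score(y_test, y_pred):.3f}"
    )
```

The snippet fixes the sampling ratio and hyperparameters only to show where those choices enter the pipeline; in the study both are varied and tuned, and the accuracy/g-mean gap of the baseline model is what motivates the oversampling comparison.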
Item Type: Thesis (Undergraduate)
Subjects: Q Science > Q Science (General); Q Science > QA Mathematics > QA75 Electronic computers. Computer science; Q Science > QA Mathematics > QA76 Computer software; T Technology > T Technology (General)
Divisions: Faculty of Computer Science > Department of Informatics
Depositing User: Muhammad Rafli Aulia Rojani Lutfi
Date Deposited: 20 Sep 2024 03:20
Last Modified: 20 Sep 2024 03:20
URI: https://repository.upnjatim.ac.id/id/eprint/29622