Implementasi Algoritma XGBoost, CatBoost, dan LGBM Untuk Klasifikasi Pencemaran Udara di Kota Surabaya

Saputra, Gilang Enggar (2026) Implementasi Algoritma XGBoost, CatBoost, dan LGBM Untuk Klasifikasi Pencemaran Udara di Kota Surabaya. Undergraduate thesis, UPN Veteran Jawa Timur.

Text (Cover): 21081010237-cover.pdf (659 kB)
Text (Chapter 1): 21081010237-bab1.pdf (256 kB)
Text (Chapter 2): 21081010237-bab2.pdf (420 kB). Restricted to repository staff only until 20 January 2029.
Text (Chapter 3): 21081010237-bab3.pdf (862 kB). Restricted to repository staff only until 20 January 2029.
Text (Chapter 4): 21081010237-bab4.pdf (1 MB). Restricted to repository staff only until 20 January 2029.
Text (Chapter 5): 21081010237-bab5.pdf (84 kB)
Text (References): 21081010237-Daftar Pustaka.pdf (160 kB)
Text (Appendix): 21081010237-Lampiran.pdf (164 kB). Restricted to repository staff only.

Abstract

This study aims to classify the level of air pollution in Surabaya City according to the Air Pollution Standard Index (ISPU) categories, using three ensemble boosting-based machine learning algorithms: XGBoost, CatBoost, and LightGBM. The dataset consists of air quality parameters such as PM10, SO₂, CO, O₃, and NO₂, which were processed through data cleaning, encoding, and normalization, and then split into training and testing sets with ratios of 70:30, 75:25, and 80:20. Experiments were run with every combination of learning rate (0.01 and 0.10) and iteration count (100, 500, and 1000). Model performance was evaluated using accuracy, precision, recall, and F1-score. The results indicate that all three models successfully learned the patterns in the data, although their performance was affected by class imbalance. The XGBoost model achieved the best performance when the ClassWeight method and minority class merging were applied, reaching an accuracy of 0.9594, precision of 0.8632, recall of 0.7787, and F1-score of 0.8098 with a 70:30 data split, a learning rate of 0.10, and 500 iterations. The CatBoost and LightGBM models also performed well, reaching best F1-scores of 0.7549 and 0.7263, respectively, after data balancing was applied. Overall, combining the ClassWeight technique with minority class merging proved effective in handling data imbalance and improved the models' ability to recognize rare air pollution categories. These findings are expected to serve as a foundation for developing more accurate and adaptive air quality prediction systems in the future.
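The experimental design summarised in the abstract can be sketched in plain Python. This is a minimal, illustrative sketch, not the thesis code. It assumes that the "ClassWeight" method is the common balanced heuristic w_c = N / (K · n_c) (the same formula scikit-learn uses for `class_weight="balanced"`), and that the reported precision, recall, and F1-score are macro-averaged. The merge map for minority ISPU categories is hypothetical, because the abstract does not say which categories were merged. All function names are likewise hypothetical.

```python
from collections import Counter
from itertools import product

# Experiment grid described in the abstract.
TRAIN_TEST_SPLITS = [(70, 30), (75, 25), (80, 20)]
LEARNING_RATES = [0.01, 0.10]
N_ITERATIONS = [100, 500, 1000]


def experiment_grid():
    """Every (split, learning rate, iteration count) setting: 3 x 2 x 3 = 18 per model."""
    return list(product(TRAIN_TEST_SPLITS, LEARNING_RATES, N_ITERATIONS))


# HYPOTHETICAL merge map for rare ISPU categories; the abstract does not state
# which categories the thesis actually merged.
MERGE_MAP = {"Very Unhealthy": "Unhealthy", "Hazardous": "Unhealthy"}


def merge_minority(labels, merge_map=MERGE_MAP):
    """Relabel rare classes into a neighbouring class; other labels pass through."""
    return [merge_map.get(y, y) for y in labels]


def balanced_class_weights(labels):
    """ASSUMED ClassWeight scheme: w_c = N / (K * n_c), so rarer classes weigh more."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * n_c) for c, n_c in counts.items()}


def per_sample_weights(labels, class_weights):
    """Expand per-class weights into one weight per training sample."""
    return [class_weights[y] for y in labels]


def macro_scores(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and F1 (unweighted mean over classes)."""
    pairs = list(zip(y_true, y_pred))
    classes = sorted(set(y_true) | set(y_pred))
    prec_sum = rec_sum = f1_sum = 0.0
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        prec_sum += prec
        rec_sum += rec
        f1_sum += f1
    k = len(classes)
    return {
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "precision": prec_sum / k,
        "recall": rec_sum / k,
        "f1": f1_sum / k,
    }


if __name__ == "__main__":
    # Toy labels for illustration only; these are not data from the thesis.
    labels = ["Good"] * 6 + ["Moderate"] * 3 + ["Very Unhealthy"]
    merged = merge_minority(labels)
    weights = balanced_class_weights(merged)
    print(len(experiment_grid()), "configurations per model")
    print(weights)
    print(macro_scores(["Good", "Good", "Moderate", "Moderate"],
                       ["Good", "Moderate", "Moderate", "Moderate"]))
```

Macro averaging gives each class equal influence, which is one plausible reason why the reported precision and recall can sit well below accuracy on imbalanced data. In a real pipeline the per-sample weights would be passed to the learner, for example through the `sample_weight` argument of XGBoost's `XGBClassifier.fit`; CatBoost and LightGBM can also take class weights directly (`class_weights` and `class_weight`, respectively).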

Item Type: Thesis (Undergraduate)
Contributors:
Thesis advisor: Swari, Made Hanindia Prami (NIDN/NIDK: 198902052018032001; email: madehanindia.fik@upnjatim.ac.id)
Thesis advisor: Nurlaili, Afina Lina (NIDN/NIDK: 199312132022032010; email: afina.lina.if@upnjatim.ac.id)
Subjects: Q Science > Q Science (General)
Q Science > QB Astronomy
Divisions: Faculty of Computer Science > Department of Informatics
Depositing User: Gilang Enggar Saputra
Date Deposited: 20 Jan 2026 01:50
Last Modified: 20 Jan 2026 02:19
URI: https://repository.upnjatim.ac.id/id/eprint/48890
