Implementation of the XGBoost, CatBoost, and LGBM Algorithms for Air Pollution Classification in Surabaya City

Saputra, Gilang Enggar (2026) Implementasi Algoritma XGBoost, CatBoost, dan LGBM Untuk Klasifikasi Pencemaran Udara di Kota Surabaya. Undergraduate thesis, UPN Veteran Jawa Timur.

Abstract

This study classifies air pollution levels in Surabaya City according to the Air Pollution Standard Index (ISPU) categories using three boosting-based ensemble machine learning algorithms: XGBoost, CatBoost, and LightGBM. The dataset consists of air quality parameters such as PM10, SO₂, CO, O₃, and NO₂. It was processed through data cleaning, encoding, and normalization, then split into training and testing sets at ratios of 70:30, 75:25, and 80:20. The models were tested with combinations of two learning rates (0.01 and 0.10) and three iteration counts (100, 500, and 1000). Model performance was evaluated using accuracy, precision, recall, and F1-score. The results indicate that all three models learned the data patterns, although their performance was affected by class imbalance. XGBoost achieved the best performance when the ClassWeight method and minority class merging were applied, reaching an accuracy of 0.9594, a precision of 0.8632, a recall of 0.7787, and an F1-score of 0.8098 with a 70:30 data split, a learning rate of 0.10, and 500 iterations. CatBoost and LightGBM also performed well, with highest F1-scores of 0.7549 and 0.7263, respectively, after data balancing was applied. Overall, combining the ClassWeight technique with minority class merging proved effective in handling data imbalance and improved the models' ability to recognize rare air pollution categories. The findings are expected to serve as a foundation for developing a more accurate and adaptive air quality prediction system.
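The experimental setup in the abstract can be illustrated with a small, library-free Python sketch. Only the split ratios, learning rates, and iteration counts come from the abstract. Everything else is an assumption made for illustration: the abstract does not say how class weights were computed, which classes were merged, or how precision, recall, and F1 were averaged. The sketch assumes the common "balanced" weighting heuristic, a hypothetical count threshold for merging, and macro averaging. The function names and the toy labels are hypothetical and do not come from the thesis.

```python
from collections import Counter
from itertools import product

# Values taken from the abstract.
SPLITS = [(70, 30), (75, 25), (80, 20)]
LEARNING_RATES = [0.01, 0.10]
ITERATIONS = [100, 500, 1000]


def experiment_grid():
    """All (split, learning rate, iterations) combinations for one model."""
    return list(product(SPLITS, LEARNING_RATES, ITERATIONS))


def merge_minority_classes(labels, min_count, merged_label="OTHER"):
    """Assumption: classes with fewer than min_count samples are pooled
    into a single merged class. The thesis may merge differently."""
    counts = Counter(labels)
    return [lab if counts[lab] >= min_count else merged_label for lab in labels]


def balanced_class_weights(labels):
    """Assumption: the 'balanced' heuristic,
    weight = n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {lab: n_samples / (n_classes * c) for lab, c in counts.items()}


def macro_scores(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1.
    Assumption: macro averaging; the abstract does not state the scheme."""
    classes = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    k = len(classes)
    return {
        "accuracy": sum(1 for t, p in pairs if t == p) / len(pairs),
        "precision": sum(precisions) / k,
        "recall": sum(recalls) / k,
        "f1": sum(f1s) / k,
    }


if __name__ == "__main__":
    # Toy, made-up labels; not data from the thesis.
    toy = ["Good"] * 6 + ["Moderate"] * 2 + ["Unhealthy"] + ["Very Unhealthy"]
    merged = merge_minority_classes(toy, min_count=2)
    print(len(experiment_grid()), "configurations per model")
    print(balanced_class_weights(merged))
    print(macro_scores(["a", "a", "a", "b"], ["a", "a", "a", "a"]))
```

In this sketch the rare classes receive larger weights than the dominant class, and a classifier that always predicts the majority class keeps a high accuracy while its macro-averaged recall and F1 drop. This is the kind of gap between accuracy and F1 that class imbalance produces, though whether it explains the figures reported in the abstract depends on the averaging scheme the thesis actually used.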

Item Type:

Thesis (Undergraduate)

Contributors:

Contribution	Contributors	NIDN/NIDK	Email
Thesis advisor	Swari, Made Hanindia Prami	198902052018032001	madehanindia.fik@upnjatim.ac.id
Thesis advisor	Nurlaili, Afina Lina	199312132022032010	afina.lina.if@upnjatim.ac.id

Subjects:

Q Science > Q Science (General)
Q Science > QB Astronomy

Divisions:

Faculty of Computer Science > Department of Informatics

Depositing User:

Gilang Enggar Saputra

Date Deposited:

20 Jan 2026 01:50

Last Modified:

20 Jan 2026 02:19

URI:

https://repository.upnjatim.ac.id/id/eprint/48890