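Christina, Enzelica Vica (2025) Klasifikasi Multilabel Informasi Tweet Bencana Alam Menggunakan Categorical Boosting dan Optimasi Bayesian [Multilabel Classification of Natural Disaster Tweet Information Using Categorical Boosting and Bayesian Optimization]. Undergraduate thesis, UPN Veteran Jawa Timur.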
Text (Cover)
21083010114-cover.pdf Download (2MB)
Text (Chapter 1)
21083010114-bab1.pdf Download (282kB)
Text (Chapter 2)
21083010114-bab2.pdf Restricted to Repository staff only until 18 September 2027. Download (674kB)
Text (Chapter 3)
21083010114-bab3.pdf Restricted to Repository staff only until 18 September 2027. Download (666kB)
Text (Chapter 4)
21083010114-bab4.pdf Restricted to Repository staff only until 18 September 2027. Download (1MB)
Text (Chapter 5)
21083010114-bab5.pdf Download (260kB)
Text (References)
21083010114-daftarpustaka.pdf Download (213kB)
Text (Appendix)
21083010114-lampiran.pdf Restricted to Repository staff only. Download (479kB)
Abstract
Text classification is an important technique in natural language processing that enables text data to be grouped automatically. Twitter is a promising data source for disaster analysis because it is real-time and widely used in Indonesia. A single tweet often covers more than one aspect of a disaster, so multilabel classification is more appropriate than single-label classification. However, the Categorical Boosting (CatBoost) algorithm is still rarely applied to multilabel classification of disaster-related Twitter data. This study applies CatBoost, whose Ordered Boosting scheme reduces overfitting and helps with noisy, high-dimensional data. Model performance is improved through Bayesian optimization, which explores the hyperparameter space efficiently under an imbalanced label distribution. The model predicts six information labels (disaster, location, damage, victims, assistance, and others) and is evaluated using Hamming loss, weighted F1-score, and subset accuracy. The study's academic contribution is a new reference for applying CatBoost with Bayesian optimization to multilabel classification of disaster data. Its practical contribution is a Streamlit-based system that presents accurate disaster information efficiently for emergency response in Indonesia. The evaluation results show that the 90:10 data split scenario performs best, with a Hamming loss of 0.0371, a weighted F1-score of 95.21%, and an accuracy of 82.45%.
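The three evaluation metrics named in the abstract can be illustrated with a minimal, dependency-free Python sketch. The toy label matrices below are invented for illustration and are not the thesis data. The function names are hypothetical, and the weighted F1 here is assumed to be the usual support-weighted average of per-label F1 scores, which the abstract does not spell out.

```python
# Toy illustration of Hamming loss, subset accuracy, and weighted F1
# for a six-label multilabel problem. All data here is made up.

LABELS = ["disaster", "location", "damage", "victims", "assistance", "others"]


def hamming_loss(y_true, y_pred):
    """Fraction of individual label slots that are predicted incorrectly."""
    total = sum(len(row) for row in y_true)
    wrong = sum(
        t != p
        for true_row, pred_row in zip(y_true, y_pred)
        for t, p in zip(true_row, pred_row)
    )
    return wrong / total


def subset_accuracy(y_true, y_pred):
    """Fraction of samples whose full label set is predicted exactly."""
    exact = sum(list(t) == list(p) for t, p in zip(y_true, y_pred))
    return exact / len(y_true)


def weighted_f1(y_true, y_pred):
    """Per-label F1 averaged with weights equal to each label's support."""
    weighted_sum = 0.0
    total_support = 0
    for j in range(len(y_true[0])):
        pairs = [(t[j], p[j]) for t, p in zip(y_true, y_pred)]
        tp = sum(1 for t, p in pairs if t == 1 and p == 1)
        fp = sum(1 for t, p in pairs if t == 0 and p == 1)
        fn = sum(1 for t, p in pairs if t == 1 and p == 0)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        support = tp + fn  # number of true positives for this label
        weighted_sum += support * f1
        total_support += support
    return weighted_sum / total_support if total_support else 0.0


# Rows are tweets, columns follow LABELS (1 = label applies).
y_true = [
    [1, 1, 0, 0, 0, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 1],
]
y_pred = [
    [1, 1, 0, 0, 0, 0],  # exact match
    [1, 0, 1, 0, 0, 0],  # misses "victims"
    [0, 0, 0, 0, 1, 0],  # exact match
    [1, 0, 0, 0, 0, 1],  # spurious "disaster"
]

print("Hamming loss:   ", hamming_loss(y_true, y_pred))
print("Subset accuracy:", subset_accuracy(y_true, y_pred))
print("Weighted F1:    ", weighted_f1(y_true, y_pred))
```

In this toy case only two of 24 label slots are wrong, yet half of the tweets fail the exact-match test. This is why subset accuracy can sit well below the weighted F1-score on the same predictions.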
Item Type: | Thesis (Undergraduate)
---|---
Contributors: |
Subjects: | Q Science > QA Mathematics > QA76.6 Computer Programming
Divisions: | Faculty of Computer Science > Department of Data Science
Depositing User: | Enzelica Vica Christina
Date Deposited: | 19 Sep 2025 03:26
Last Modified: | 19 Sep 2025 03:26
URI: | https://repository.upnjatim.ac.id/id/eprint/43793