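Klasifikasi Multilabel Informasi Tweet Bencana Alam Menggunakan Categorical Boosting dan Optimasi Bayesian [Multilabel Classification of Natural Disaster Tweet Information Using Categorical Boosting and Bayesian Optimization]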

Christina, Enzelica Vica (2025) Klasifikasi Multilabel Informasi Tweet Bencana Alam Menggunakan Categorical Boosting dan Optimasi Bayesian. Undergraduate thesis, UPN Veteran Jawa Timur.

Text (Cover)
21083010114-cover.pdf

Download (2MB)
Text (Chapter 1)
21083010114-bab1.pdf

Download (282kB)
Text (Chapter 2)
21083010114-bab2.pdf
Restricted to Repository staff only until 18 September 2027.

Download (674kB)
Text (Chapter 3)
21083010114-bab3.pdf
Restricted to Repository staff only until 18 September 2027.

Download (666kB)
Text (Chapter 4)
21083010114-bab4.pdf
Restricted to Repository staff only until 18 September 2027.

Download (1MB)
Text (Chapter 5)
21083010114-bab5.pdf

Download (260kB)
Text (References)
21083010114-daftarpustaka.pdf

Download (213kB)
Text (Appendix)
21083010114-lampiran.pdf
Restricted to Repository staff only

Download (479kB)

Abstract

Text classification is an important natural language processing technique that groups text data automatically. Twitter is a promising data source for disaster analysis because it is real-time and widely used by Indonesians. A single tweet often covers more than one aspect of a disaster, so multilabel classification is more appropriate than single-label classification. However, the Categorical Boosting (CatBoost) algorithm is still rarely used for multilabel classification of disaster-related Twitter data. This study applies CatBoost, whose Ordered Boosting scheme helps reduce overfitting and handle noisy, high-dimensional data. Model performance is improved through Bayesian optimization, which explores the hyperparameter space efficiently even when the label distribution is imbalanced. The model predicts six information labels (disaster, location, damage, victims, assistance, and others) and is evaluated using Hamming loss, weighted F1-score, and subset accuracy. The study's academic contribution is a new reference for applying CatBoost with Bayesian optimization to multilabel classification of disaster data. Its practical contribution is a Streamlit-based system that presents accurate and efficient disaster information for emergency response needs in Indonesia. The evaluation results show that the 90:10 data split scenario performs best, with a Hamming loss of 0.0371, a weighted F1-score of 95.21%, and an accuracy of 82.45%.
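
The three evaluation metrics named in the abstract can be illustrated with a small sketch. This is not the thesis's code: the function names and the four toy tweets are hypothetical, and the definitions are the commonly used ones (Hamming loss as the fraction of wrong label decisions, subset accuracy as the fraction of exact matches, weighted F1 as the per-label F1 averaged by label support). The thesis may compute them differently.

```python
# Illustrative sketch only: toy data and helper names are assumptions,
# not taken from the thesis.
from typing import List

# The six information labels listed in the abstract.
LABELS = ["disaster", "location", "damage", "victims", "assistance", "others"]


def hamming_loss(y_true: List[List[int]], y_pred: List[List[int]]) -> float:
    """Fraction of individual label decisions that are wrong."""
    total = sum(len(row) for row in y_true)
    wrong = sum(
        t != p
        for true_row, pred_row in zip(y_true, y_pred)
        for t, p in zip(true_row, pred_row)
    )
    return wrong / total


def subset_accuracy(y_true: List[List[int]], y_pred: List[List[int]]) -> float:
    """Fraction of samples whose full label set is predicted exactly."""
    exact = sum(true_row == pred_row for true_row, pred_row in zip(y_true, y_pred))
    return exact / len(y_true)


def weighted_f1(y_true: List[List[int]], y_pred: List[List[int]]) -> float:
    """Per-label F1, averaged with weights equal to each label's support."""
    weighted_sum = 0.0
    total_support = 0
    for j in range(len(y_true[0])):
        pairs = [(row_t[j], row_p[j]) for row_t, row_p in zip(y_true, y_pred)]
        tp = sum(1 for t, p in pairs if t == 1 and p == 1)
        fp = sum(1 for t, p in pairs if t == 0 and p == 1)
        fn = sum(1 for t, p in pairs if t == 1 and p == 0)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0  # F1 is 0 when undefined
        support = tp + fn  # number of true positives for this label
        weighted_sum += support * f1
        total_support += support
    return weighted_sum / total_support if total_support else 0.0


# Four hypothetical tweets; columns follow the order of LABELS.
y_true = [
    [1, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0],
    [1, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 1],
]
y_pred = [
    [1, 1, 0, 0, 0, 0],  # exact match
    [1, 1, 1, 0, 0, 0],  # misses "victims"
    [1, 0, 0, 0, 1, 0],  # exact match
    [1, 0, 0, 0, 0, 1],  # spurious "disaster"
]

print("Hamming loss:   ", round(hamming_loss(y_true, y_pred), 4))
print("Subset accuracy:", round(subset_accuracy(y_true, y_pred), 4))
print("Weighted F1:    ", round(weighted_f1(y_true, y_pred), 4))
```

The toy data shows why the metrics can diverge: two wrong label decisions out of 24 give a low Hamming loss, yet half of the tweets are not matched exactly, so subset accuracy is much stricter. This is consistent with the abstract reporting a high weighted F1-score alongside a lower accuracy.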

Item Type: Thesis (Undergraduate)
Contributors:
Contribution | Contributors | NIDN/NIDK | Email
Thesis advisor | Saputra, Wahyu Syaifullah Jauharis | NIDN0725088601 | wahyu.s.j.saputra.if@upnjatim.ac.id
Thesis advisor | Hindrayani, Kartika Maulida | NIDN0009099205 | kartika.maulida.ds@upnjatim.ac.id
Subjects: Q Science > QA Mathematics > QA76.6 Computer Programming
Divisions: Faculty of Computer Science > Department of Data Science
Depositing User: Enzelica Vica Christina
Date Deposited: 19 Sep 2025 03:26
Last Modified: 19 Sep 2025 03:26
URI: https://repository.upnjatim.ac.id/id/eprint/43793
