Roiqoh, Aprinia Salsabila (2026) Predicting Movie Popularity in Indonesia Based On Metadata Using Gradient Boosting. Undergraduate thesis, UPN Veteran Jawa Timur.
Text (Cover): 22081010166.-cover.pdf (3MB)
Text (Chapter 1): 22081010166.-bab1.pdf (1MB)
Text (Chapter 2): 22081010166.-bab2.pdf (4MB). Restricted to Repository staff only until 26 May 2028.
Text (Chapter 3): 22081010166.-bab3.pdf (4MB). Restricted to Repository staff only until 26 May 2028.
Text (Chapter 4): 22081010166.-bab4.pdf (7MB). Restricted to Repository staff only until 26 May 2028.
Text (Chapter 5): 22081010166.-bab5.pdf (431kB)
Text (References): 22081010166.-daftarpustaka.pdf (923kB)
Text (Appendix): 22081010166.-lampiran.pdf (825kB). Restricted to Repository staff only.
Abstract
The film industry in Indonesia has grown significantly; however, a film's success in attracting audiences remains difficult to predict accurately. This study develops a model for predicting the number of moviegoers in Indonesia from pre-release metadata using three gradient boosting algorithms: XGBoost, LightGBM, and CatBoost. The dataset was collected from Cinepoint and TMDb and comprised 3,464 initial records, which were reduced to 2,595 after preprocessing. The preprocessing steps included data cleaning, selective handling of missing values, logarithmic transformation of the target variable, and feature engineering using a Bayesian smoothing approach. The models were trained under two data split scenarios (80:20 and 70:30), and hyperparameters were optimized using Random Search and Bayesian Optimization (Optuna). Model performance was evaluated using RMSE, MAE, MAPE, and R². The best-performing model was CatBoost tuned with Random Search under the 80:20 split, with an R² of 0.8729, an MAE of 0.5538, an RMSE of 0.7698, and a MAPE of 5.02%. These results indicate that CatBoost gives more accurate and more stable predictions than XGBoost and LightGBM. Hyperparameter tuning also improved model performance in predicting movie audience numbers. Feature importance and SHAP analysis show that main actors, directors, and genres are the most influential features in the predictions. This indicates that pre-release metadata plays a significant role in determining movie popularity in Indonesia.
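The abstract names a log-transformed target, Bayesian-smoothed features, and four evaluation metrics, but it does not give the formulas used. The sketch below shows one common reading of those steps, not the thesis's actual implementation. It assumes an m-estimate form of smoothing, in which a category's mean target is pulled toward the global mean. The function names, the smoothing weight `m`, and the toy data are all hypothetical.

```python
import math
from collections import defaultdict


def smoothed_means(categories, targets, m=5.0):
    """Shrink each category's mean target toward the global mean.

    smoothed = (n * category_mean + m * global_mean) / (n + m)
    where n is the number of rows in the category. The weight m is an
    assumed value, not one taken from the thesis.
    """
    global_mean = sum(targets) / len(targets)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cat, y in zip(categories, targets):
        sums[cat] += y
        counts[cat] += 1
    encoding = {
        cat: (sums[cat] + m * global_mean) / (counts[cat] + m)
        for cat in counts
    }
    return encoding, global_mean


def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE, MAPE (in percent), and R^2 for two equal-length lists."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        "mape": 100 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n,
        "r2": 1 - ss_res / ss_tot,
    }


# Hypothetical toy data: director labels and audience counts.
directors = ["A", "A", "A", "B"]
audience = [100_000, 250_000, 400_000, 2_000_000]

# Log-transform the target, then encode directors on the log scale.
log_audience = [math.log1p(a) for a in audience]
encoding, global_mean = smoothed_means(directors, log_audience, m=5.0)
```

Director "B" has a single film, so its encoded value sits much closer to the global mean than to that film's own log audience. This is the purpose of smoothing: rarely seen actors or directors do not receive extreme feature values. In a real pipeline the encoding would be fitted on the training split only, to avoid leaking the target into the test set.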
| Item Type: | Thesis (Undergraduate) |
|---|---|
| Contributors: | |
| Subjects: | Q Science > QA Mathematics > QA76.87 Neural computers |
| Divisions: | Faculty of Computer Science > Department of Informatics |
| Depositing User: | Aprinia Salsabila Roiqoh |
| Date Deposited: | 26 May 2026 01:40 |
| Last Modified: | 26 May 2026 01:40 |
| URI: | https://repository.upnjatim.ac.id/id/eprint/52617 |
