APPLICATION OF ENSEMBLE MACHINE LEARNING FOR PHISHING SITE IDENTIFICATION BASED ON URL AND VISUAL ANALYSIS

Gabriel, Paskalis Reynaldy Elroy (2026) APPLICATION OF ENSEMBLE MACHINE LEARNING FOR PHISHING SITE IDENTIFICATION BASED ON URL AND VISUAL ANALYSIS. Undergraduate thesis, UPN Veteran Jawa Timur.

Text (Cover): Cover (3).pdf (764kB), publicly available

Text (Chapter 1): BAB 1.pdf (141kB), publicly available

Text (Chapter 2): BAB 2.pdf (697kB). Restricted to repository staff only until 17 April 2029; a copy may be requested.

Text (Chapter 3): BAB 3.pdf (721kB). Restricted to repository staff only until 17 April 2029; a copy may be requested.

Text (Chapter 4): BAB 4.pdf (3MB). Restricted to repository staff only until 17 April 2029; a copy may be requested.

Text (Chapter 5): BAB 5.pdf (100kB). Restricted to repository staff only until 17 April 2029; a copy may be requested.

Text (References): DAFTAR PUSTAKA.pdf (160kB), publicly available

Abstract

Phishing attacks continue to evolve as a significant cybersecurity threat, and traditional blacklist-based and rule-based approaches have proven insufficient for detecting newly emerging phishing sites. This study proposes an ensemble-based phishing detection system that integrates two analytical modalities: URL analysis using TF-IDF over character n-grams of length 3 to 6 (n-gram range (3,6)) with a Complement Naïve Bayes (CNB) classifier, and visual analysis of web page screenshots using VGG16 as a feature extractor and XGBoost as a classifier. Final classification decisions are produced by late score fusion, with a weight of 0.8 for the URL pathway and 0.2 for the visual pathway. The system was evaluated using 8,707 URL samples and 824 website screenshot samples. Experimental results show that the hybrid ensemble system achieves an accuracy of 94.90%, precision of 0.9417, recall of 0.9303, F1-score of 0.9360, and ROC-AUC of 0.9801, outperforming both the URL-only model (94.51% accuracy) and the visual-only model (86.29% accuracy). The 0.8/0.2 weight configuration was selected because the visual pathway suppresses false positives and provides an independent verification layer against URL obfuscation, consistent with the late fusion principle that contributions from a minority modality are most effective at weights ≥ 0.15. This study demonstrates that a multimodal ensemble approach improves the robustness and accuracy of phishing detection over unimodal approaches.
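The URL pathway relies on TF-IDF features built from character n-grams of length 3 to 6. The thesis chapters describing the exact preprocessing are restricted, so the following is only a minimal standard-library sketch under stated assumptions: it mimics the defaults of scikit-learn's `TfidfVectorizer(analyzer="char", ngram_range=(3, 6))` (lowercasing, smoothed IDF, L2 normalisation), and the helper names and toy URLs are hypothetical. The resulting vectors are the kind of input a Complement Naïve Bayes classifier would then be trained on.

```python
import math
from collections import Counter


def char_ngrams(url: str, lo: int = 3, hi: int = 6) -> list[str]:
    """Return every character n-gram of length lo..hi (inclusive) from a URL."""
    s = url.lower()  # assumption: lowercase, as TfidfVectorizer does by default
    grams = []
    for n in range(lo, hi + 1):
        for i in range(len(s) - n + 1):
            grams.append(s[i:i + n])
    return grams


def fit_idf(urls: list[str], lo: int = 3, hi: int = 6) -> dict[str, float]:
    """Smoothed IDF, ln((1 + N) / (1 + df)) + 1, matching scikit-learn's default."""
    n_docs = len(urls)
    df = Counter()
    for url in urls:
        df.update(set(char_ngrams(url, lo, hi)))  # count each n-gram once per URL
    return {g: math.log((1 + n_docs) / (1 + d)) + 1.0 for g, d in df.items()}


def tfidf(url: str, idf: dict[str, float], lo: int = 3, hi: int = 6) -> dict[str, float]:
    """Sparse L2-normalised TF-IDF vector; n-grams unseen during fitting are dropped."""
    tf = Counter(g for g in char_ngrams(url, lo, hi) if g in idf)
    vec = {g: c * idf[g] for g, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {g: v / norm for g, v in vec.items()} if norm > 0 else vec


# Hypothetical toy corpus, not the thesis dataset
train = ["http://paypal.com/login", "http://paypa1-secure-login.xyz/verify"]
idf = fit_idf(train)
vec = tfidf("http://paypa1.com/login", idf)
```

Character n-grams are a natural fit for URLs because look-alike substitutions such as `paypa1` still share most of their 3- to 6-character substrings with the legitimate domain, while rarer fragments (for example `xyz`) receive a higher IDF weight.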
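The late score fusion step can be illustrated as a weighted average of the two pathways' phishing probabilities, using the 0.8/0.2 weights reported above. This is a sketch under assumptions: the decision threshold of 0.5, the function names, and the example scores are hypothetical and not taken from the thesis.

```python
def fuse_scores(p_url: float, p_visual: float,
                w_url: float = 0.8, w_visual: float = 0.2) -> float:
    """Weighted late fusion of two phishing probabilities (normalised by the weight sum)."""
    if not (0.0 <= p_url <= 1.0 and 0.0 <= p_visual <= 1.0):
        raise ValueError("scores must be probabilities in [0, 1]")
    return (w_url * p_url + w_visual * p_visual) / (w_url + w_visual)


def classify(p_url: float, p_visual: float, threshold: float = 0.5) -> int:
    """Return 1 (phishing) when the fused score reaches the assumed threshold, else 0."""
    return int(fuse_scores(p_url, p_visual) >= threshold)


# Illustrative scores only:
# a borderline URL score is pulled below the threshold by a confident "legitimate" screenshot
print(classify(0.55, 0.10))  # fused 0.46 -> 0
# a strongly phishing-looking page can tip an ambiguous URL score over the threshold
print(classify(0.45, 0.95))  # fused 0.55 -> 1
```

With the URL weight at 0.8, the visual pathway alone can move the fused score by at most 0.2, so it acts as a corrective signal near the decision boundary rather than overriding a confident URL prediction.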

Item Type: Thesis (Undergraduate)

Contributors:

Thesis advisor: Sari, Anggarini Puspita (NIDN 0716088605, anggraini.puspita.if@upnjatim.ac.id)
Thesis advisor: Junaidi, Achmad (NIDN 0710117803, achmadjunaidi.if@upnjatim.ac.id)

Subjects:

Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Q Science > QA Mathematics > QA76.6 Computer Programming

Depositing User: Paskalis Paskal Gabriel

Date Deposited: 22 May 2026 06:44

Last Modified: 22 May 2026 07:25

URI: https://repository.upnjatim.ac.id/id/eprint/52155
