APPLICATION OF ENSEMBLE MACHINE LEARNING FOR PHISHING SITE IDENTIFICATION BASED ON URL AND VISUAL ANALYSIS

Gabriel, Paskalis Reynaldy Elroy (2026) APPLICATION OF ENSEMBLE MACHINE LEARNING FOR PHISHING SITE IDENTIFICATION BASED ON URL AND VISUAL ANALYSIS. Undergraduate thesis, UPN Veteran Jawa Timur.

Files:
Cover: Cover (3).pdf (764kB), public preview
Chapter 1 (Bab 1): BAB 1.pdf (141kB), public preview
Chapter 2 (Bab 2): BAB 2.pdf (697kB), restricted to repository staff only until 17 April 2029
Chapter 3 (Bab 3): BAB 3.pdf (721kB), restricted to repository staff only until 17 April 2029
Chapter 4 (Bab 4): BAB 4.pdf (3MB), restricted to repository staff only until 17 April 2029
Chapter 5 (Bab 5): BAB 5.pdf (100kB), restricted to repository staff only until 17 April 2029
References (Daftar Pustaka): DAFTAR PUSTAKA.pdf (160kB), public preview

Abstract

Phishing attacks continue to evolve as a significant cybersecurity threat, and traditional blacklist-based and rule-based approaches have proven insufficient for detecting newly emerging phishing sites. This study proposes an ensemble-based phishing detection system that integrates two analytical modalities: URL analysis using TF-IDF over character n-grams of length 3 to 6 with a Complement Naïve Bayes (CNB) classifier, and visual analysis of web pages using VGG16 as a feature extractor and XGBoost as a classifier. The final classification is produced by late score fusion, weighting the URL pathway at 0.8 and the visual pathway at 0.2. The system was evaluated on 8,707 URL samples and 824 website screenshot samples. The hybrid ensemble achieves an accuracy of 94.90%, precision of 0.9417, recall of 0.9303, F1-score of 0.9360, and ROC-AUC of 0.9801, outperforming both the URL-only model (94.51% accuracy) and the visual-only model (86.29% accuracy). The 0.8/0.2 weighting was chosen because the visual pathway suppresses false positives and provides an independent verification layer against URL obfuscation, consistent with the late fusion principle that a minority modality contributes most effectively at weights of 0.15 or higher. These results indicate that a multimodal ensemble improves the robustness and accuracy of phishing detection compared with unimodal approaches, although the accuracy gain over the URL-only model is modest (0.39 percentage points).
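The late score fusion step and the reported metrics can be illustrated with a short sketch. It assumes that both pathways emit a phishing probability in [0, 1] for the same set of sites, combines them with the 0.8/0.2 weights stated in the abstract, and applies a decision threshold of 0.5. The threshold, the function names (`fuse_scores`, `classify`, `f1_from`, `binary_metrics`) and the toy scores are hypothetical and are not taken from the thesis. The last line checks that the reported F1-score follows from the reported precision and recall.

```python
from typing import Dict, List, Sequence

# Weights reported in the abstract; everything else here is illustrative.
URL_WEIGHT = 0.8     # URL pathway (TF-IDF char n-grams + Complement NB)
VISUAL_WEIGHT = 0.2  # visual pathway (VGG16 features + XGBoost)
THRESHOLD = 0.5      # assumed decision threshold, not stated in the source


def fuse_scores(p_url: Sequence[float], p_visual: Sequence[float],
                w_url: float = URL_WEIGHT,
                w_visual: float = VISUAL_WEIGHT) -> List[float]:
    """Late fusion: weighted sum of each pathway's phishing probability."""
    if len(p_url) != len(p_visual):
        raise ValueError("both pathways must score the same sites")
    if abs(w_url + w_visual - 1.0) > 1e-9:
        raise ValueError("fusion weights should sum to 1")
    return [w_url * u + w_visual * v for u, v in zip(p_url, p_visual)]


def classify(scores: Sequence[float], threshold: float = THRESHOLD) -> List[int]:
    """Label a site as phishing (1) when its fused score reaches the threshold."""
    return [1 if s >= threshold else 0 for s in scores]


def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 with phishing (1) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1_from(precision, recall),
    }


if __name__ == "__main__":
    # Hypothetical probabilities for four sites scored by both pathways.
    p_url = [0.90, 0.55, 0.20, 0.70]
    p_visual = [0.80, 0.10, 0.30, 0.90]
    y_true = [1, 0, 1, 1]

    fused = fuse_scores(p_url, p_visual)
    y_pred = classify(fused)
    print([round(s, 2) for s in fused])  # [0.88, 0.46, 0.22, 0.74]
    print(y_pred)                         # [1, 0, 0, 1]
    print(binary_metrics(y_true, y_pred))

    # Consistency check on the reported precision (0.9417) and recall (0.9303).
    print(round(f1_from(0.9417, 0.9303), 4))  # 0.936
```

In the toy data, the second site has a borderline URL score of 0.55, but a low visual score pulls the fused score down to 0.46, which illustrates the false-positive suppression described above. Under the assumed 0.5 threshold, the visual pathway can change the decision only when the URL score lies between 0.375 (visual score 1) and 0.625 (visual score 0); outside that band the URL pathway decides on its own.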

Item Type: Thesis (Undergraduate)
Contributors:
Thesis advisor: Sari, Anggarini Puspita (NIDN 0716088605, anggraini.puspita.if@upnjatim.ac.id)
Thesis advisor: Junaidi, Achmad (NIDN 0710117803, achmadjunaidi.if@upnjatim.ac.id)
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Q Science > QA Mathematics > QA76.6 Computer Programming
Depositing User: Paskalis Paskal Gabriel
Date Deposited: 22 May 2026 06:44
Last Modified: 22 May 2026 07:25
URI: https://repository.upnjatim.ac.id/id/eprint/52155
