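Klasifikasi Sertifikat Berdasarkan Mata Kuliah dalam Program Rekognisi Pembelajaran Lampau Berbasis NLP Menggunakan BERT [Certificate Classification by Course Subject in a Recognition of Prior Learning Program: An NLP-Based Approach Using BERT]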

Saputra, Dimas (2025) Klasifikasi Sertifikat Berdasarkan Mata Kuliah dalam Program Rekognisi Pembelajaran Lampau Berbasis NLP Menggunakan BERT. Undergraduate thesis, UPN Veteran Jawa Timur.

Files:
Text (Cover): organized (11)_organized.pdf (1MB)
Text (Chapter 1): 21081010151-bab1.pdf (264kB)
Text (Chapter 2): 21081010151-bab2.pdf (536kB). Restricted to Repository staff only until 12 June 2027.
Text (Chapter 3): 21081010151-bab3.pdf (2MB). Restricted to Repository staff only until 12 June 2027.
Text (Chapter 4): 21081010151-bab4.pdf (911kB). Restricted to Repository staff only until 12 June 2027.
Text (Chapter 5): 21081010151-bab5.pdf (262kB)
Text (References): 21081010151-daftarpustaka.pdf (204kB)
Text (Appendix): 21081010151-lampiran.pdf (1MB). Restricted to Repository staff only until 12 June 2027.

Abstract

The rapid advancement of information technology has increased the demand for automated systems that recognize non-formal learning, particularly through the Recognition of Prior Learning (RPL) program. Competency certificates obtained from independent training programs are often not systematically integrated into academic curricula. This study aims to develop an automated classification model that assigns certificates to the relevant course subjects within the Informatics study program. Seven target course categories were defined: Machine Learning, Web Programming, Interface Design, Computer Networks, Game Applications, Mobile Programming, and Project Management. The methodology consists of extracting text from certificate documents in PDF format through Optical Character Recognition (OCR) using PyTesseract, followed by text preprocessing and data augmentation to improve the class distribution. The implemented model is BERT (Bidirectional Encoder Representations from Transformers), evaluated in two configurations: bert-base-uncased and bert-base-multilingual-uncased. Five data scenarios were tested: no augmentation, Character Insertion, Character Deletion, Back Translation, and Synonym Replacement. The evaluation results indicate that the bert-base-uncased configuration with Synonym Replacement augmentation performed best, achieving a validation accuracy of 95.54% and an F1-score of 0.97. These findings confirm the effectiveness of BERT for text classification in both Indonesian and English, and highlight the benefit of semantic-based augmentation techniques in improving model generalization. As a practical implementation, this research also developed a prototype service using the Streamlit framework, which enables automated certificate classification through a user-friendly interface. The model and system are expected to support the efficient and accurate integration of non-formal competencies into academic programs.
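
The text-level augmentation scenarios named in the abstract can be illustrated with a minimal Python sketch. This is not the thesis implementation, which is not available in this record: the function names, the `rate` parameter, the sample sentence, and the tiny `SYNONYMS` table are all hypothetical and chosen only for illustration. Back Translation is omitted because it requires a translation model.

```python
import random

# The seven target course categories listed in the abstract.
COURSES = [
    "Machine Learning", "Web Programming", "Interface Design",
    "Computer Networks", "Game Applications", "Mobile Programming",
    "Project Management",
]

# Hypothetical synonym table, for illustration only; a real pipeline would
# draw synonyms from a lexical resource rather than a hand-written dict.
SYNONYMS = {
    "completed": ["finished"],
    "training": ["course", "workshop"],
    "building": ["developing"],
}

LETTERS = "abcdefghijklmnopqrstuvwxyz"


def insert_chars(text, rate, rng):
    """Character Insertion: after each letter, add a random letter with probability `rate`."""
    out = []
    for ch in text:
        out.append(ch)
        if ch.isalpha() and rng.random() < rate:
            out.append(rng.choice(LETTERS))
    return "".join(out)


def delete_chars(text, rate, rng):
    """Character Deletion: drop each letter with probability `rate`; other characters are kept."""
    return "".join(
        ch for ch in text if not (ch.isalpha() and rng.random() < rate)
    )


def replace_synonyms(text, rng, table=SYNONYMS):
    """Synonym Replacement: swap every whitespace-separated word found in `table`."""
    words = text.split()
    return " ".join(
        rng.choice(table[w.lower()]) if w.lower() in table else w
        for w in words
    )


# Usage: a fixed seed keeps the augmented samples reproducible.
rng = random.Random(0)
sample = "completed training in building web applications"
print(insert_chars(sample, 0.1, rng))
print(delete_chars(sample, 0.1, rng))
print(replace_synonyms(sample, rng))
```

Character-level edits imitate OCR noise, while synonym replacement keeps the meaning of the certificate text intact, which is consistent with the abstract's observation that semantic-based augmentation helped generalization. Words with attached punctuation are not matched by this simple lookup.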

Item Type: Thesis (Undergraduate)
Contributors:
Contribution | Contributors | NIDN/NIDK | Email
Thesis advisor | Diyasa, I Gede Susrama Mas | NIDN0019067008 | gsusrama.if@upnjatim.ac.id
Thesis advisor | Puspaningrum, Eva Yulia | NIDN0005078908 | evapuspaningrum.if@upnjatim.ac.id
Subjects: T Technology > T Technology (General)
T Technology > T Technology (General) > T385 Computer Graphics
T Technology > T Technology (General) > T58.6-58.62 Management Information Systems
Divisions: Faculty of Computer Science > Department of Informatics
Depositing User: Dimas Saputra
Date Deposited: 12 Jun 2025 09:11
Last Modified: 12 Jun 2025 09:11
URI: https://repository.upnjatim.ac.id/id/eprint/37422
