Klasifikasi Sertifikat Berdasarkan Mata Kuliah dalam Program Rekognisi Pembelajaran Lampau Berbasis NLP Menggunakan BERT

Saputra, Dimas (2025) Klasifikasi Sertifikat Berdasarkan Mata Kuliah dalam Program Rekognisi Pembelajaran Lampau Berbasis NLP Menggunakan BERT. Undergraduate thesis, UPN Veteran Jawa Timur.

	Text (Cover) organized (11)_organized.pdf Download (1MB)
	Text (Chapter 1) 21081010151-bab1.pdf Download (264kB)
	Text (Chapter 2) 21081010151-bab2.pdf Restricted to Repository staff only until 12 June 2027. Download (536kB)
	Text (Chapter 3) 21081010151-bab3.pdf Restricted to Repository staff only until 12 June 2027. Download (2MB)
	Text (Chapter 4) 21081010151-bab4.pdf Restricted to Repository staff only until 12 June 2027. Download (911kB)
	Text (Chapter 5) 21081010151-bab5.pdf Download (262kB)
	Text (Bibliography) 21081010151-daftarpustaka.pdf Download (204kB)
	Text (Appendix) 21081010151-lampiran.pdf Restricted to Repository staff only until 12 June 2027. Download (1MB)

Abstract

The rapid advancement of information technology has increased the demand for automated systems that recognize non-formal learning, particularly through the Recognition of Prior Learning (RPL) program. Competency certificates obtained from independent training programs are often not systematically integrated into academic curricula. This study aims to develop an automated classification model that assigns certificates to the relevant courses within the Informatics study program. Seven target course categories were defined: Machine Learning, Web Programming, Interface Design, Computer Networks, Game Applications, Mobile Programming, and Project Management. The methodology consists of extracting text from certificate documents in PDF format through Optical Character Recognition (OCR) using PyTesseract, followed by text preprocessing and data augmentation to improve class distribution. The implemented model is BERT (Bidirectional Encoder Representations from Transformers), evaluated in two configurations: bert-base-uncased and bert-base-multilingual-uncased. Five data scenarios were tested: no augmentation, Character Insertion, Character Deletion, Back Translation, and Synonym Replacement. The bert-base-uncased configuration with Synonym Replacement augmentation performed best, achieving a validation accuracy of 95.54% and an F1-score of 0.97. These findings indicate that BERT is effective for text classification in both Indonesian and English, and highlight the benefit of semantic-based augmentation techniques in improving model generalization. As a practical implementation, this research also developed a prototype service using the Streamlit framework, enabling automated certificate classification through a user-friendly interface. The model and system are expected to support the efficient and accurate integration of non-formal competencies into academic programs.
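The three text-level augmentation scenarios named in the abstract can be illustrated with a minimal Python sketch. This is not the thesis code: the function names, the `rate` parameter, and the toy `SYNONYMS` table are all assumptions made for illustration, and the abstract does not state which synonym source or perturbation rates were used. Back Translation is omitted because it requires a translation model.

```python
import random

# The seven target course categories listed in the abstract.
COURSES = [
    "Machine Learning",
    "Web Programming",
    "Interface Design",
    "Computer Networks",
    "Game Applications",
    "Mobile Programming",
    "Project Management",
]

# Hypothetical toy synonym table; the real synonym source is not given in the abstract.
SYNONYMS = {
    "training": ["course", "workshop"],
    "completed": ["finished"],
}

LETTERS = "abcdefghijklmnopqrstuvwxyz"


def insert_chars(text, rate=0.05, rng=None):
    """Character Insertion: add a random lowercase letter after some letters."""
    if rng is None:
        rng = random.Random()
    out = []
    for ch in text:
        out.append(ch)
        if ch.isalpha() and rng.random() < rate:
            out.append(rng.choice(LETTERS))
    return "".join(out)


def delete_chars(text, rate=0.05, rng=None):
    """Character Deletion: drop some alphabetic characters at random."""
    if rng is None:
        rng = random.Random()
    return "".join(
        ch for ch in text if not (ch.isalpha() and rng.random() < rate)
    )


def replace_synonyms(text, table=SYNONYMS, rate=0.5, rng=None):
    """Synonym Replacement: swap words found in the table for a listed synonym."""
    if rng is None:
        rng = random.Random()
    words = []
    for word in text.split():
        options = table.get(word.lower())
        if options and rng.random() < rate:
            words.append(rng.choice(options))
        else:
            words.append(word)
    return " ".join(words)
```

Passing a seeded `random.Random` instance as `rng` makes the augmented samples reproducible, which helps when comparing scenarios on the same source text. Character-level edits imitate OCR noise, while synonym replacement changes wording but keeps meaning, which is consistent with the abstract's description of it as a semantic-based technique.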

Item Type:

Thesis (Undergraduate)

Contributors:

Contribution	Contributors	NIDN/NIDK	Email
Thesis advisor	Diyasa, I Gede Susrama Mas	NIDN0019067008	gsusrama.if@upnjatim.ac.id
Thesis advisor	Puspaningrum, Eva Yulia	NIDN0005078908	evapuspaningrum.if@upnjatim.ac.id

Subjects:

T Technology > T Technology (General)
T Technology > T Technology (General) > T385 Computer Graphics
T Technology > T Technology (General) > T58.6-58.62 Management Information Systems

Divisions:

Faculty of Computer Science > Department of Informatics

Depositing User:

Dimas Saputra

Date Deposited:

12 Jun 2025 09:11

Last Modified:

12 Jun 2025 09:11

URI:

https://repository.upnjatim.ac.id/id/eprint/37422