Application of Sentence-BERT and Cosine Similarity for Semantic Search of Thesis Documents in PDF Format

Hafizh Fathuddin, Muhammad Abdul (2025) Penerapan Sentence-BERT dan Cosine Similarity untuk Pencarian Semantik Dokumen Skripsi dalam Format PDF. Undergraduate thesis, UPN Veteran Jawa Timur.

Text (Cover): 21081010225_Muhammad Abdul Hafizh fathuddin_Cover.pdf (1MB)
Text (Chapter 1): 21081010225_Muhammad Abdul Hafizh fathuddin_Bab_1.pdf (135kB)
Text (Chapter 2): 21081010225_Muhammad Abdul Hafizh fathuddin_Bab_2.pdf (908kB). Restricted to Repository staff only until 15 September 2027.
Text (Chapter 3): 21081010225_Muhammad Abdul Hafizh fathuddin_Bab_3.pdf (325kB). Restricted to Repository staff only until 15 September 2027.
Text (Chapter 4): 21081010225_Muhammad Abdul Hafizh fathuddin_Bab_4.pdf (1MB). Restricted to Repository staff only until 15 September 2027.
Text (Chapter 5): 21081010225_Muhammad Abdul Hafizh fathuddin_Bab_5.pdf (126kB)
Text (Bibliography): 21081010225_Muhammad Abdul Hafizh fathuddin_Daftar Pustaka.pdf (168kB)
Text (Appendix): 21081010225_Muhammad Abdul Hafizh fathuddin_Lampiran.pdf (5MB). Restricted to Repository staff only.

Abstract

Thesis search in digital repositories is generally limited to keyword matching, which often returns results of low relevance. To address this, the study develops a semantic search system for thesis documents in PDF format that uses Sentence-BERT (SBERT) and cosine similarity, combined with an ontology to enrich the interpretation of query meaning. The research stages are text extraction from PDF documents, preprocessing, WordPiece tokenization, and sentence vector representation with SBERT. Relevance scores are computed as a weighted combination of cosine similarity (weight 0.7) and an ontology-based component (weight 0.3). In the evaluation, the system achieved a consistent Mean Reciprocal Rank (MRR) of 1.0 across all query types, with an average Precision of 0.80 and an average Recall of 0.92. In a comparison with the keyword matching method, the semantic approach performed better, with an average Precision of 0.88 and Recall of 0.65, against a Precision of 0.24 and a Recall of 0.12 for keyword matching. These findings indicate that the semantic system places the most relevant documents at the top rank and outperforms keyword-based search, although the coverage of relevant results still needs to be improved through ontology enrichment and dataset expansion.

Keywords: Semantic Search, Sentence-BERT, Cosine Similarity, Ontology, Thesis Documents.
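The scoring and evaluation steps named in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names are hypothetical, the embedding vectors are assumed to come from an SBERT model that is not shown here, and the ontology component is assumed to be a precomputed score in the range 0 to 1, since the abstract does not say how it is derived. Only the 0.7 and 0.3 weights and the metric names are taken from the abstract.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def combined_score(query_vec, doc_vec, ontology_score, w_cos=0.7, w_onto=0.3):
    """Weighted relevance score; weights follow the abstract.

    ontology_score is an assumed, precomputed value in [0, 1].
    """
    return w_cos * cosine_similarity(query_vec, doc_vec) + w_onto * ontology_score


def reciprocal_rank(ranked_ids, relevant_ids):
    """1 / rank of the first relevant document, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs):
    """Average reciprocal rank over (ranked_ids, relevant_ids) pairs."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)


def precision_recall(retrieved_ids, relevant_ids):
    """Set-based precision and recall for a single query."""
    retrieved, relevant = set(retrieved_ids), set(relevant_ids)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

An MRR of 1.0, as reported in the abstract, means that `reciprocal_rank` returns 1.0 for every query, that is, the first result is always a relevant document. Precision and Recall are computed over the whole retrieved set, which is why they can be below 1.0 even when MRR is perfect.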

Item Type: Thesis (Undergraduate)
Contributors:
Contribution | Contributors | NIDN/NIDK | Email
Thesis advisor | Mandyartha, Eka Prakarsa | NIDN0725058805 | eka_prakarsa.fik@upnjatim.ac.id
Thesis advisor | Nurlaili, Afina Lina | NIDN0013129303 | afina.lina.if@upnjatim.ac.id
Subjects: T Technology > T Technology (General) > T58.6-58.62 Management Information Systems
Divisions: Faculty of Computer Science > Department of Informatics
Depositing User: Muhammad Abdul Hafizh Fathuddin
Date Deposited: 15 Sep 2025 07:02
Last Modified: 15 Sep 2025 07:02
URI: https://repository.upnjatim.ac.id/id/eprint/43200
