Hafizh Fathuddin, Muhammad Abdul (2025) Penerapan Sentence-BERT dan Cosine Similarity untuk Pencarian Semantik Dokumen Skripsi dalam Format PDF. Undergraduate thesis, UPN Veteran Jawa Timur.
Text (COVER): 21081010225_Muhammad Abdul Hafizh fathuddin_Cover.pdf (1MB)
Text (Chapter 1): 21081010225_Muhammad Abdul Hafizh fathuddin_Bab_1.pdf (135kB)
Text (Chapter 2): 21081010225_Muhammad Abdul Hafizh fathuddin_Bab_2.pdf (908kB). Restricted to Repository staff only until 15 September 2027.
Text (Chapter 3): 21081010225_Muhammad Abdul Hafizh fathuddin_Bab_3.pdf (325kB). Restricted to Repository staff only until 15 September 2027.
Text (Chapter 4): 21081010225_Muhammad Abdul Hafizh fathuddin_Bab_4.pdf (1MB). Restricted to Repository staff only until 15 September 2027.
Text (Chapter 5): 21081010225_Muhammad Abdul Hafizh fathuddin_Bab_5.pdf (126kB)
Text (References): 21081010225_Muhammad Abdul Hafizh fathuddin_Daftar Pustaka.pdf (168kB)
Text (Appendix): 21081010225_Muhammad Abdul Hafizh fathuddin_Lampiran.pdf (5MB). Restricted to Repository staff only.
Abstract
Searching for thesis documents in digital repositories is generally limited to keyword matching, which often returns results of low relevance. To address this, this study develops a semantic search system for thesis documents in PDF format using Sentence-BERT (SBERT) and cosine similarity, combined with an ontology to enrich the interpretation of query meaning. The research stages are text extraction from PDF documents, preprocessing, WordPiece tokenization, and sentence vector representation with SBERT; relevance scores are calculated by combining cosine similarity (weight 0.7) with an ontology-based score (weight 0.3). In the evaluation, the system achieved a consistent Mean Reciprocal Rank (MRR) of 1.0 across all query types, with an average Precision of 0.80 and an average Recall of 0.92. In a comparison with the Keyword Matching method, the semantic approach performed better, with an average Precision of 0.88 and Recall of 0.65, against 0.24 Precision and 0.12 Recall for keyword matching. These findings indicate that the semantic system places the most relevant documents at the top rank and outperforms keyword-based search, although the coverage of relevant results still needs to be improved through ontology enrichment and dataset expansion.
Keywords: Semantic Search, Sentence-BERT, Cosine Similarity, Ontology, Thesis Documents.
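
The abstract describes the relevance score only as a combination of cosine similarity (0.7) and ontology (0.3). The following is a minimal sketch of one way such a score and the reported metrics could be computed, not the thesis's implementation. It assumes the combination is a weighted sum, that the ontology contributes a score in the range 0 to 1, and it uses small hand-written vectors in place of SBERT embeddings. All function names, document IDs, and values are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def combined_score(query_vec, doc_vec, ontology_score, w_cos=0.7, w_onto=0.3):
    """Assumed weighted sum of cosine similarity and an ontology score."""
    return w_cos * cosine_similarity(query_vec, doc_vec) + w_onto * ontology_score


def rank_documents(query_vec, docs):
    """Return (doc_id, score) pairs sorted from highest to lowest score.

    docs maps a document ID to (embedding, ontology_score).
    """
    scored = [
        (doc_id, combined_score(query_vec, vec, onto))
        for doc_id, (vec, onto) in docs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def reciprocal_rank(ranked_ids, relevant_ids):
    """1 / position of the first relevant document, or 0.0 if none appears."""
    for position, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / position
    return 0.0


def precision_recall(retrieved_ids, relevant_ids):
    """Precision and recall of a retrieved list against a set of relevant IDs."""
    hits = sum(1 for doc_id in retrieved_ids if doc_id in relevant_ids)
    precision = hits / len(retrieved_ids) if retrieved_ids else 0.0
    recall = hits / len(relevant_ids) if relevant_ids else 0.0
    return precision, recall


# Toy data: 2-dimensional stand-ins for sentence embeddings (hypothetical).
query = [1.0, 0.0]
docs = {
    "A": ([1.0, 0.0], 1.0),
    "B": ([0.0, 1.0], 0.0),
    "C": ([1.0, 1.0], 0.5),
}
ranking = rank_documents(query, docs)
ranked_ids = [doc_id for doc_id, _ in ranking]
```

Mean Reciprocal Rank is the average of `reciprocal_rank` over all evaluation queries, so an MRR of 1.0 means the first result was relevant for every query.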
Item Type: | Thesis (Undergraduate) |
---|---|
Contributors: | |
Subjects: | T Technology > T Technology (General) > T58.6-58.62 Management Information Systems |
Divisions: | Faculty of Computer Science > Department of Informatics |
Depositing User: | Muhammad Abdul Hafizh Fathuddin |
Date Deposited: | 15 Sep 2025 07:02 |
Last Modified: | 15 Sep 2025 07:02 |
URI: | https://repository.upnjatim.ac.id/id/eprint/43200 |