Saputra, Ahmad Sofian Aris (2025) Pengembangan Model GloVe-LSTM Dan Algoritma Penilaian Jawaban Siswa Menggunakan Rouge Score, TF-IDF, dan Cosine Similarity. Undergraduate thesis, UPN Veteran Jawa Timur.
Cover.pdf (839kB)
Bab 1.pdf (230kB)
Bab 2.pdf (497kB). Restricted to Repository staff only until 21 July 2027.
Bab 3.pdf (610kB). Restricted to Repository staff only until 21 July 2027.
Bab 4.pdf (712kB). Restricted to Repository staff only until 21 July 2027.
Bab 5.pdf (190kB). Restricted to Repository staff only until 21 July 2027.
Daftar Pustaka.pdf (159kB)
Lampiran.pdf (774kB). Restricted to Repository staff only.
Abstract
Manual assessment of student answers faces challenges such as subjectivity, inconsistency, and long grading times, particularly given the linguistic variation in student responses. This research aims to develop and evaluate an automated scoring model for Indonesian student answers that is more objective, accurate, and adaptive. The proposed Main Model integrates GloVe word representations with Long Short-Term Memory (LSTM), supported by text assessment algorithms: ROUGE score, TF-IDF, and cosine similarity. The methodology encompasses model architecture design, collection of a proprietary dataset of 3,420 student answer items (subsequently processed into 3,152 samples for training and evaluation), data pre-processing, model training with TensorFlow and Keras, and testing across various scenarios. In addition, an exploratory study was conducted on a Comparative Model featuring a manual LSTM implementation in NumPy. On the test data, the Main Model achieved a Mean Absolute Error (MAE) of 0.0761, a Pearson correlation coefficient of 0.8429, and a Quadratic Weighted Kappa (QWK) of 0.7332, suggesting that it can rank answers relatively consistently with manual assessments. Nevertheless, scenario analysis revealed that the Main Model tends to overestimate scores for low-quality, irrelevant, or incorrect answers. It also handled variations in answer length, synonym usage, spelling errors, and responses lacking explicit keywords inconsistently. The Comparative Model demonstrated basic learning capabilities but limited evaluation performance. The research concludes that the developed hybrid model shows potential but requires significant refinement in absolute score accuracy and discriminatory power, particularly for low-quality and irrelevant answers in the Indonesian language context.
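The three evaluation metrics named in the abstract (MAE, Pearson correlation, and QWK) can be illustrated with a minimal sketch using their standard textbook definitions. This is not the thesis code: the function names, the `to_bins` helper, the assumption that scores lie in [0, 1], and the choice of five rating classes for QWK are all hypothetical, since the abstract does not say how scores were scaled or discretized.

```python
import math


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between manual and predicted scores."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson(y_true, y_pred):
    """Pearson correlation coefficient between two score lists."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def to_bins(scores, num_classes):
    """Hypothetical helper: map scores assumed to lie in [0, 1] to integer classes."""
    return [min(int(s * num_classes), num_classes - 1) for s in scores]


def quadratic_weighted_kappa(rater_a, rater_b, num_classes):
    """QWK between two lists of integer ratings in range(num_classes)."""
    n = len(rater_a)
    # Observed confusion matrix between the two raters.
    observed = [[0] * num_classes for _ in range(num_classes)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(num_classes))
              for j in range(num_classes)]
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(num_classes):
        for j in range(num_classes):
            # Quadratic penalty grows with the distance between classes.
            weight = (i - j) ** 2 / (num_classes - 1) ** 2
            # Expected count if the two raters were independent.
            expected = hist_a[i] * hist_b[j] / n
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected
    return 1.0 - weighted_observed / weighted_expected


if __name__ == "__main__":
    # Made-up scores for illustration only; not data from the thesis.
    manual = [0.0, 0.25, 0.5, 0.75, 1.0]
    predicted = [0.1, 0.3, 0.5, 0.7, 0.9]
    print("MAE:", mean_absolute_error(manual, predicted))
    print("Pearson:", pearson(manual, predicted))
    print("QWK:", quadratic_weighted_kappa(
        to_bins(manual, 5), to_bins(predicted, 5), 5))
```

MAE measures absolute score error, while Pearson and QWK measure agreement in ordering and rating level. That distinction is consistent with the abstract's observation that the model can rank answers reasonably well while still overestimating scores for weak answers.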
Item Type: | Thesis (Undergraduate)
---|---
Subjects: | L Education > L Education (General); Q Science > Q Science (General); T Technology > TN Mining engineering. Metallurgy
Divisions: | Faculty of Computer Science > Department of Informatics
Depositing User: | Ahmad Sofian Aris Saputra
Date Deposited: | 22 Jul 2025 01:54
Last Modified: | 22 Jul 2025 01:54
URI: | https://repository.upnjatim.ac.id/id/eprint/40128