Implementation of the K-Means and K-Nearest Neighbour (KNN) Algorithms for Identifying Tuberculosis in the Lungs

IMAN, RACHMADHANY (2024) Implementasi Algoritma K-Means Dan K Nearest Neighbour (KNN) Untuk Identifikasi Penyakit Tuberculosis Pada Paru-Paru. Undergraduate thesis, UNIVERSITAS PEMBANGUNAN NASIONAL "VETERAN" JAWA TIMUR.

Abstract

Health is a valuable asset, and the lungs are one of the vital organs with a major influence on it. This study examines tuberculosis (TB). Indonesia ranks third among the countries with the highest TB burden, after India and China. Radiological examination, in particular the chest X-ray, is one of the methods commonly used to detect TB. In this context, artificial intelligence and machine learning can help doctors identify tuberculosis quickly and effectively. To that end, the research combines two data processing techniques. First, the K-Means algorithm clusters the X-ray image data by similar characteristics, which makes it easier to identify the typical patterns of TB-infected images. The author uses K-Means as a data segmentation step, and the segmented data are then classified with KNN. The dataset, taken from the Kaggle website, contains 1,400 images: 700 in the normal class and 700 in the tuberculosis class. Overall, the best K-Means clustering result, indicated by the highest Silhouette Score, was obtained with the 90:10 data split and K = 1, and it worked effectively with the KNN model. The combined K-Means and KNN model classified the images successfully across four data-splitting scenarios with K values from 1 to 10, and it gave good results on the more balanced 80:20 and 70:30 splits. Across the four splits, K = 1 and K = 3 on the 70:30 split gave better results than the other K values.
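The evaluation described above (a Silhouette Score for the K-Means step, then KNN accuracy over several train/test splits and K from 1 to 10) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis code: the synthetic 2-D points stand in for image features, every function name is hypothetical, and the abstract does not say how the K-Means output is passed to KNN, so the two steps are simply run side by side. Only the three splits the abstract names are used, because the fourth is not given.

```python
import math
import random


def make_data(n_per_class=60, seed=0):
    """Synthetic 2-D points standing in for image features (an assumption)."""
    rng = random.Random(seed)
    data = []
    for label, (cx, cy) in enumerate([(0.0, 0.0), (4.0, 4.0)]):
        for _ in range(n_per_class):
            data.append(((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)), label))
    rng.shuffle(data)
    return data


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's algorithm; returns one cluster index per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old centroid if a cluster goes empty
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return assign


def silhouette(points, assign):
    """Mean silhouette coefficient; needs at least two non-empty clusters."""
    clusters = {}
    for p, a in zip(points, assign):
        clusters.setdefault(a, []).append(p)
    scores = []
    for p, a in zip(points, assign):
        own = clusters[a]
        if len(own) == 1:  # a singleton cluster scores 0 by convention
            scores.append(0.0)
            continue
        a_i = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b_i = min(sum(math.dist(p, q) for q in other) / len(other)
                  for c, other in clusters.items() if c != a)
        scores.append((b_i - a_i) / max(a_i, b_i))
    return sum(scores) / len(scores)


def knn_predict(train, x, k):
    """Majority vote among the k nearest training points (ties are arbitrary)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)


def evaluate(data, train_fraction, k):
    """Accuracy of KNN on the held-out part of a simple ordered split."""
    cut = round(len(data) * train_fraction)
    train, test = data[:cut], data[cut:]
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)


data = make_data()
points = [p for p, _ in data]
sil = silhouette(points, kmeans(points, k=2))

# Splits named in the abstract; K is the KNN neighbour count, 1 to 10.
results = {(split, k): evaluate(data, split, k)
           for split in (0.9, 0.8, 0.7)
           for k in range(1, 11)}
best_split, best_k = max(results, key=results.get)
```

Here `sil` holds the Silhouette Score of the two-cluster K-Means result and `results` maps each (split, K) pair to a test accuracy, so the best configuration can be read off with `max`. On real X-ray data the feature extraction and the link between segmentation and classification would need to follow the thesis chapters, which are not reproduced in this record.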

Item Type:

Thesis (Undergraduate)

Contributors: