Abstract:
Due to the increasing prevalence of diabetes and cancer, there is an urgent need to develop automated systems that help detect these diseases using modern technologies. Machine learning (ML)-based methods have become very popular as automatic model-building techniques. Despite the rapid development of theories for computational intelligence, applying ML-based classifiers to diabetes and cancer diagnosis remains challenging. These classifiers still do not achieve satisfactory accuracy and therefore cannot correctly classify healthcare data such as those of diabetes and cancer patients, because most diabetes and cancer datasets are complex in nature and contain missing values, unusual observations, multicollinearity problems, and so on. In most existing research, the researchers did not use feature selection (FS) techniques to identify the risk factors of cancer and diabetes. They applied a limited number of classifiers to classify and predict diabetes and cancer status but did not tune the hyperparameters of the classifiers; as a result, their accuracy and AUC were low. This study therefore attempts to increase the accuracy of classifiers on diabetes and cancer data by addressing the above factors in ML-based algorithms. The main objective of this study is to compare the performance of ML-based methods on healthcare data and to suggest the best model, with better performance than the models published in existing research.
Description:
This thesis was submitted to the Department of Statistics, University of Rajshahi, Rajshahi, Bangladesh, for the degree of Master of Philosophy (MPhil).