Comparative analysis of machine learning classification approaches for heart disease prediction: a study of preprocessing techniques and algorithms

Karlavi, M. M.; Chatrabgoun, O.

Comparative analysis of machine learning classification approaches for heart disease prediction: a study of preprocessing techniques and algorithms

Karlavi, M. M.; Chatrabgoun, O.

URI: http://ir.lib.seu.ac.lk/handle/123456789/7580

Date: 2024-11-06

Abstract:

The growing number of cardiovascular diseases is a big concern for health worldwide, showing the need for effective tools to help detect these diseases early and provide timely treatment. This research contributes to the growing body of knowledge in healthcare analytics by investigating the effectiveness of machine learning algorithms to enhance patient diagnosis and treatment strategies in cardiovascular care. We compare four different Machin Learning Algorithms as K-Nearest Neighbors (K-NN), Logistic Regression (LR), Support Vector Machine (SVM), and Naive Bayes (NB) using a comprehensive dataset of patient attributes. Our methodology includes preprocessing techniques such as one-hot encoding and label encoding to convert categorical variables for optimal model performance. Additionally in our study, we explored the influence of these preprocessing techniques, identifying that one-hot encoding generally enhanced accuracy for most algorithms. Hyperparameter tuning was conducted for SVM, optimizing parameters as the kernel type and regularization strength, which further improved the model’s accuracy. The dataset was systematically split into 80% training and 20% testing subsets, allowing us to assess each algorithm's accuracy on the testing set. The results revealed that SVM outperformed the other algorithms, achieving an accuracy of 89.69%, highlighting the critical role of methodological choices in developing effective predictive models. What sets this research apart from recent studies is its comprehensive comparison of multiple algorithms alongside detailed data preprocessing techniques, providing insights into the impact of these choices on predictive performance.

Show full item record