Analysing the performance of machine learning algorithms for effective classification of breast cancer

Achini Nisansala, M. M.

URI: http://ir.lib.seu.ac.lk/handle/123456789/6819

Date: 2023-05-03

Abstract:

Breast cancer is the most commonly diagnosed cancer in women worldwide. It can occur at any age, but the risk increases with age. In 2020, around 2.3 million women were diagnosed with breast cancer, and around 0.68 million died from the disease globally. Breast tumors are of two types: benign and malignant. Diagnosing breast cancer is difficult because of the complex nature of breast cancer cells. However, treatment is very effective when the disease is diagnosed at an early stage. In this study, seven machine learning algorithms are applied to the Wisconsin Breast Cancer Dataset (WBCD), collected from the UCI repository, to classify tumors as benign or malignant: Logistic Regression (LR), Linear Discriminant Analysis (LDA), K-Nearest Neighbor (KNN), Gaussian Naïve Bayes (GN), Decision Tree Classifier (C4.5), Support Vector Classifier (SVC) and Random Forest (RF). The analysis is carried out in two parts: first without removing the outliers from the dataset, and then after removing them. Without outlier removal, SVC outperforms the other classifiers with 97.82% accuracy. After outlier removal, RF gives the highest accuracy, 96.18%.
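The abstract does not say how outliers were identified or how accuracy was estimated. As a minimal sketch of the two-part setup, the Python below assumes an interquartile-range (Tukey fence) rule for outlier removal and plain accuracy as the fraction of correct predictions. The function names (iqr_bounds, remove_outliers, accuracy), the multiplier k=1.5 and the toy rows are all hypothetical illustrations, not values taken from the study or from WBCD.

```python
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Return (lower, upper) Tukey fences for one feature column.

    Assumption: the IQR rule is used here for illustration only;
    the study's actual outlier criterion is not given in the abstract.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def remove_outliers(rows, k=1.5):
    """Keep rows whose every feature lies inside its column's fences.

    rows: list of (features, label) pairs with equal-length feature lists.
    """
    n_features = len(rows[0][0])
    bounds = [
        iqr_bounds([feats[j] for feats, _ in rows], k)
        for j in range(n_features)
    ]
    return [
        (feats, label)
        for feats, label in rows
        if all(lo <= x <= hi for x, (lo, hi) in zip(feats, bounds))
    ]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy data (hypothetical, not from WBCD): the last row has an extreme
# value in its first feature.
rows = [
    ([1.0, 10.0], "benign"),
    ([1.2, 11.0], "benign"),
    ([0.9, 9.5], "benign"),
    ([1.1, 10.5], "malignant"),
    ([1.0, 10.2], "malignant"),
    ([9.0, 10.1], "malignant"),
]
cleaned = remove_outliers(rows)
print(len(rows), "rows before,", len(cleaned), "rows after outlier removal")
```

In a comparison like the one described, each classifier would be trained and scored once on the full data and once on the filtered data, with accuracy computed on held-out predictions in both cases.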
