Abstract:
Breast cancer is the most common type of cancer diagnosed in women
throughout the world. It can occur at any age, but the risk increases with age. In
2020, around 2.3 million women were diagnosed with breast cancer, and around
0.68 million died of the disease globally. Breast tumors are classified as either
benign or malignant. Diagnosing breast cancer is difficult because of the complex
nature of breast cancer cells; however, treatment is very effective when the disease
is diagnosed at an early stage. In this study,
seven machine learning algorithms are used to classify tumors as benign or
malignant: Logistic Regression (LR), Linear Discriminant Analysis (LDA), K-Nearest
Neighbor (KNN), Gaussian Naïve Bayes (GN), Decision Tree Classifier (C4.5),
Support Vector Classifier (SVC) and Random Forest (RF). The algorithms are applied
to the Wisconsin Breast Cancer Dataset (WBCD), obtained from the UCI Machine
Learning Repository. The analysis is carried out in two parts: first without removing
the outliers from the dataset, and then after removing them. Without outlier removal,
SVC outperforms the other classifiers with 97.82% accuracy. After outlier removal,
RF gives the highest accuracy, 96.18%.
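The two-part comparison described above can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the study's own code. It assumes scikit-learn's `load_breast_cancer` (the Wisconsin Diagnostic data, 569 samples and 30 features), which may not be the exact WBCD version used here. It also assumes a z-score rule (|z| < 3 on every feature) as the outlier filter, 10-fold cross-validation, and default hyperparameters, because the abstract specifies none of these. scikit-learn's `DecisionTreeClassifier` implements CART, so `criterion="entropy"` only approximates C4.5. The helper names `make_models`, `zscore_mask` and `evaluate` are hypothetical.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def make_models():
    """Hypothetical helper: the seven classifiers named in the abstract, with assumed settings."""
    return {
        "LR": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        "LDA": LinearDiscriminantAnalysis(),
        "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
        "GN": GaussianNB(),
        # Assumption: CART with the entropy criterion stands in for C4.5.
        "C4.5": DecisionTreeClassifier(criterion="entropy", random_state=0),
        "SVC": make_pipeline(StandardScaler(), SVC()),
        "RF": RandomForestClassifier(random_state=0),
    }


def zscore_mask(X, threshold=3.0):
    """Hypothetical outlier rule: keep rows whose features all have |z| < threshold."""
    z = np.abs((X - X.mean(axis=0)) / X.std(axis=0))
    return np.all(z < threshold, axis=1)


def evaluate(X, y, folds=10):
    """Mean k-fold cross-validated accuracy for each classifier."""
    return {
        name: cross_val_score(model, X, y, cv=folds, scoring="accuracy").mean()
        for name, model in make_models().items()
    }


X, y = load_breast_cancer(return_X_y=True)
keep = zscore_mask(X)
results = {
    "with outliers": evaluate(X, y),
    "outliers removed": evaluate(X[keep], y[keep]),
}
for setting, scores in results.items():
    best = max(scores, key=scores.get)
    print(f"{setting}: best = {best} ({scores[best]:.2%})")
```

Because the dataset version, outlier rule and validation scheme are all assumptions, the accuracies this sketch prints are not expected to match the figures reported above.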