Determining the optimal number of clusters using distance based k-means algorithm

Alibuhtto, M. C.

SEUIR Home
→
Research Publications
→
Proceedings
→
SEUSL Conferences
→
Annual Science Research Session (ASRS)
→
11th Annual Science Research Session - FAS
→
View Item

Determining the optimal number of clusters using distance based k-means algorithm

Alibuhtto, M. C.

URI: http://ir.lib.seu.ac.lk/handle/123456789/6314

Date: 2022-11-15

Abstract:

In the current digital era, data is generated enormously with fast growth from different sources, and managing such huge data is a big challenge. Clustering algorithm is able to find hidden patterns and extract useful information from huge datasets. Among the clustering techniques, k-means clustering algorithm is the most commonly used unsupervised classification technique to determine the optimal number of clusters (k). However, the choice of the optimal number of clusters (k) is a prominent problem in the process of the k-means clustering algorithm. In most cases, clustering huge data, k is pre-determined by researcher, and incorrectly chosen k leads to increase computational cost. In order to obtain the optimal number of clusters, a distance-based k-mean algorithm was proposed with a simulated dataset. In the k-means algorithm, two distance measures were considered namely Euclidean and Manhattan distances. The results based on simulated data reveal that the k-means algorithm with Euclidean distance yields the optimal number of clusters compared to Manhattan distance. Testing on real datasets shows consistent results as the simulated ones.

Show full item record