dc.contributor.author |
Alibuhtto, M. C. |
|
dc.date.accessioned |
2022-12-01T06:22:12Z |
|
dc.date.available |
2022-12-01T06:22:12Z |
|
dc.date.issued |
2022-11-15 |
|
dc.identifier.citation |
Proceedings of the 11th Annual Science Research Sessions, FAS, SEUSL, Sri Lanka, 15th November 2022: Scientific Engagement for Sustainable Futuristic Innovations, p. 60. |
en_US |
dc.identifier.isbn |
978-624-5736-60-7 |
|
dc.identifier.isbn |
978-624-5736-59-1 |
|
dc.identifier.uri |
http://ir.lib.seu.ac.lk/handle/123456789/6314 |
|
dc.description.abstract |
In the current digital era, data is generated in enormous and rapidly growing
volumes from different sources, and managing such huge data is a major challenge.
Clustering algorithms can find hidden patterns and extract useful information
from huge datasets. Among clustering techniques, the k-means algorithm is the
most commonly used unsupervised classification technique. However, it requires
the number of clusters (k) to be specified in advance, and choosing the optimal k
is a prominent problem in the k-means clustering process. In most cases when
clustering huge data, k is predetermined by the researcher, and an incorrectly
chosen k increases the computational cost. To obtain the optimal number of
clusters, a distance-based k-means algorithm was proposed and evaluated on a
simulated dataset. Two distance measures were considered in the k-means
algorithm, namely Euclidean and Manhattan distance. The results on simulated
data reveal that the k-means algorithm with Euclidean distance performs better
than with Manhattan distance at yielding the optimal number of clusters. Tests on
real datasets show results consistent with those from the simulated data. |
en_US |
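The abstract does not state how the optimal k is selected, how centroids are initialised, or how the simulated data were generated. The sketch below is therefore an illustration under explicit assumptions and does not reproduce the author's method. It runs a Lloyd-style k-means whose assignment step uses either Euclidean or Manhattan distance. Centroids are seeded deterministically by farthest-point (maximin) selection, which is an assumption. Each candidate k is scored by the mean silhouette width computed with the same distance, which is also an assumed criterion. The helpers `init_centroids`, `silhouette`, `best_k` and `make_blobs` are hypothetical names. The centroid update uses the coordinate-wise mean under both metrics. Strictly, the mean minimises squared Euclidean distance, so with Manhattan distance this is a heuristic; k-medians would use the median instead.

```python
import random

def euclidean(a, b):
    """Euclidean (L2) distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def manhattan(a, b):
    """Manhattan (L1) distance between two points of equal dimension."""
    return sum(abs(x - y) for x, y in zip(a, b))

def init_centroids(points, k, dist):
    """Assumed seeding: farthest-point (maximin) selection, fully deterministic."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed is the point farthest from its nearest existing seed.
        far = max(points, key=lambda p: min(dist(p, c) for c in centroids))
        centroids.append(far)
    return centroids

def kmeans(points, k, dist, max_iter=100):
    """Lloyd-style k-means whose assignment step uses the given distance."""
    centroids = init_centroids(points, k, dist)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid under `dist`.
        new_labels = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: centroid = coordinate-wise mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids

def silhouette(points, labels, dist):
    """Mean silhouette width; points in singleton clusters contribute 0."""
    clusters = set(labels)
    if len(clusters) < 2:
        return 0.0
    total = 0.0
    for i, p in enumerate(points):
        own = [q for q, lab in zip(points, labels) if lab == labels[i]]
        if len(own) == 1:
            continue
        # Mean distance to the rest of the own cluster (p contributes 0 to the sum).
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # Smallest mean distance to any other cluster.
        b = min(
            sum(dist(p, q) for q, lab in zip(points, labels) if lab == c) / labels.count(c)
            for c in clusters if c != labels[i]
        )
        m = max(a, b)
        total += (b - a) / m if m > 0 else 0.0
    return total / len(points)

def best_k(points, dist, k_values=range(2, 7)):
    """Assumed criterion: return the k with the highest mean silhouette, plus all scores."""
    scores = {k: silhouette(points, kmeans(points, k, dist)[0], dist) for k in k_values}
    return max(scores, key=scores.get), scores

def make_blobs(centres, n_per=30, spread=0.5, seed=42):
    """Hypothetical simulated data: 2-D Gaussian blobs around the given centres."""
    rng = random.Random(seed)
    return [(cx + rng.gauss(0, spread), cy + rng.gauss(0, spread))
            for cx, cy in centres for _ in range(n_per)]

if __name__ == "__main__":
    data = make_blobs([(0, 0), (10, 10), (0, 10)])
    for name, d in (("Euclidean", euclidean), ("Manhattan", manhattan)):
        k, scores = best_k(data, d)
        print(f"{name}: best k = {k}", {j: round(s, 3) for j, s in scores.items()})
```

On well-separated simulated blobs like these, both metrics should recover the generating k. Differences between Euclidean and Manhattan distance would only show up on harder data, such as overlapping groups or different feature scales. Swapping in another criterion, such as the elbow of within-cluster cost, only requires changing `best_k`.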
dc.language.iso |
en_US |
en_US |
dc.publisher |
Faculty of Applied Sciences, South Eastern University of Sri Lanka, Sammanthura |
en_US |
dc.subject |
Huge data |
en_US |
dc.subject |
Digital era |
en_US |
dc.subject |
Distance measure |
en_US |
dc.subject |
K-means algorithm |
en_US |
dc.title |
Determining the optimal number of clusters using distance based k-means algorithm |
en_US |
dc.type |
Article |
en_US |