P-ISSN: 0974-6846 | E-ISSN: 0974-5645

Indian Journal of Science and Technology

Year: 2022, Volume: 15, Issue: 35, Pages: 1712-1721

Original Article

An Effective Initialization Method Based on Quartiles for the K-means Algorithm

Received Date: 08 April 2022, Accepted Date: 30 July 2022, Published Date: 07 September 2022

Abstract

Objectives: This study aims to speed up the K-means algorithm by offering a deterministic, quartile-based seeding strategy for initializing its preliminary cluster centers, enabling it to build high-quality clusters efficiently. Methods: We investigated various cluster center initialization approaches in the literature and present our findings. We then propose a novel deterministic technique based on quartiles for finding initial cluster centers for the K-means algorithm. The proposed approach is applied to the data set to obtain the preliminary cluster centers, which are then supplied to the K-means algorithm. The proposed seeding method is evaluated on sixteen benchmark clustering data sets: five synthetic and eleven real data sets. The simulations are run in Python. Findings: The empirical results show that our proposed cluster center initialization method allows the K-means algorithm to form clusters with SSE values comparable to the minimum SSE values produced by repeated Random or K-means++ initializations. Furthermore, our deterministic initialization strategy ensures that the K-means algorithm converges faster than with the Random and K-means++ initialization techniques. Novelty: In this study, we explore the potential of quartile-based seeding as a technique for accelerating the K-means algorithm. Because our seeding method is deterministic, the need to run K-means repeatedly with different stochastic initializations is eliminated entirely, and execution time is reduced remarkably compared to the Random and K-means++ initialization techniques. Moreover, after initializing with the proposed method, a single run of K-means is sufficient to produce optimal clusters. Applications: Our proposed seeding technique will be helpful for initializing the K-means algorithm in time-sensitive applications, applications managing large amounts of data, and applications that require deterministic cluster solutions.
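
The abstract does not spell out the seeding procedure itself, so the snippet below is only a minimal sketch of one way a deterministic, quantile-based initialization for K-means could work: points are ranked by a scalar score, the ranking is cut into k equal-frequency slices, and the mean of each slice seeds one cluster. The helper quantile_seeds, the ranking score, and the use of scikit-learn's KMeans are illustrative assumptions, not the authors' published method.

```python
# Minimal illustrative sketch of a deterministic, quantile-based seeding for
# K-means. NOTE: this is NOT the exact procedure from the paper (the abstract
# gives no implementation details); it only illustrates the general idea.
import numpy as np
from sklearn.cluster import KMeans

def quantile_seeds(X, k):
    """Return k deterministic initial centers taken from quantile slices of X."""
    X = np.asarray(X, dtype=float)
    # Min-max scale each feature so no single attribute dominates the ranking.
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0
    Xs = (X - X.min(axis=0)) / span
    # Rank the points by a scalar score and cut the ranking into k
    # equal-frequency slices (quartiles when k = 4).
    order = np.argsort(Xs.sum(axis=1))
    slices = np.array_split(order, k)
    # The mean of each slice becomes one initial cluster center.
    return np.vstack([X[idx].mean(axis=0) for idx in slices])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.random((500, 4))        # stand-in data set
    k = 3
    centers0 = quantile_seeds(X, k)
    # Deterministic seeds: a single K-means run, no random restarts (n_init=1).
    km = KMeans(n_clusters=k, init=centers0, n_init=1).fit(X)
    print("SSE (inertia):", km.inertia_)
```

Because the seeds are a deterministic function of the data, repeated runs return the same clustering, which is the property the abstract highlights when comparing against repeated Random or K-means++ restarts.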

Keywords: K-means Algorithm; Initialization Method; Speeding K-means; Quartiles; Clustering; Deterministic Initialization Method


Copyright

© 2022 Jambudi & Gandhi. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Published By Indian Society for Education and Environment (iSee)
