
K-Means++ under approximation stability

ISSN
0304-3975
Date Issued
2015-07-11
Author(s)
Agarwal, Manu
Jaiswal, Ragesh
Pal, Arindam
DOI
10.1016/j.tcs.2015.04.030
Abstract
One of the most popular algorithms for choosing initial centers for Lloyd's heuristic is the k-means++ seeding algorithm. It is a simple sampling procedure: pick the first center uniformly at random from the given points, and then, for i = 2, 3, …, k, pick a point to be the i-th center with probability proportional to its squared Euclidean distance to the nearest of the (i−1) previously chosen centers. The k-means++ seeding algorithm is known to exhibit nice properties. In particular, it has been observed to perform well when the optimal clusters are separated in some sense, intuitively because the algorithm prefers far-away points when picking centers. One separation condition studied in the past is due to Ostrovsky et al. [9]. Jaiswal and Garg [8] showed that if a dataset satisfies the separation condition of [9], then this sampling algorithm gives a constant-factor approximation with probability Ω(1/k) on that dataset. Another separation condition, strictly weaker than that of [9], is the approximation stability condition studied by Balcan et al. [5]. In this work, we show that the seeding algorithm gives a constant-factor approximation with probability Ω(1/k) on any dataset that satisfies the separation condition of [5], provided the optimal k clusters are not too small. We also give a negative result for datasets that have small optimal clusters.
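The sampling procedure described in the abstract (often called D² sampling) can be sketched as follows. This is a minimal illustrative implementation, not the authors' code; the function name and the fallback handling are assumptions.

```python
import random

def kmeans_pp_seeding(points, k, rng=random.Random(0)):
    """Sketch of k-means++ (D^2) seeding as described in the abstract.

    points: list of equal-length tuples of floats; k: number of centers.
    Names and structure are illustrative, not the paper's implementation.
    """
    # Step 1: pick the first center uniformly at random.
    centers = [rng.choice(points)]
    for _ in range(1, k):
        # Squared Euclidean distance of each point to its nearest chosen center.
        d2 = [min(sum((p[j] - c[j]) ** 2 for j in range(len(p)))
                  for c in centers)
              for p in points]
        # Sample the next center with probability proportional to d2.
        r = rng.random() * sum(d2)
        acc = 0.0
        for p, w in zip(points, d2):
            acc += w
            if acc >= r:
                centers.append(p)
                break
        else:
            # Fallback for floating-point edge cases: take the last point.
            centers.append(points[-1])
    return centers
```

For example, on two well-separated groups of points, the distance-weighted sampling makes it likely that the two chosen centers land in different groups, which matches the intuition in the abstract that the algorithm prefers far-away points.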
Subjects
  • Approximation stability
  • K-means clustering
  • K-means++

Copyright © 2016-2025  Indian Institute of Technology Jodhpur

Developed and maintained by Dr. Kamlesh Patel and Team, S. R. Ranganathan Learning Hub, IIT Jodhpur.

Built with DSpace-CRIS software - Extension maintained and optimized by 4Science
