k-means clustering is a technique that divides a set of multivariate data into $k$
predefined clusters. The division is chosen to maximize the variance between clusters
and to minimize the variance of the observations within each cluster. Several
algorithms have been developed for this optimization problem.
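
The optimization problem can be written out explicitly. The notation here is an
assumption, not taken from the applet: $x_1, \dots, x_n$ are the observations,
$C_j$ is the set of indices assigned to cluster $j$, $\mu_j$ is the mean of cluster $j$,
and $\bar{x}$ is the mean of all observations. The within-cluster sum of squares is

$$
W = \sum_{j=1}^{k} \sum_{i \in C_j} \lVert x_i - \mu_j \rVert^2,
\qquad
\mu_j = \frac{1}{|C_j|} \sum_{i \in C_j} x_i .
$$

The total sum of squares splits into a within-cluster and a between-cluster part:

$$
\sum_{i=1}^{n} \lVert x_i - \bar{x} \rVert^2
= W + \sum_{j=1}^{k} |C_j| \, \lVert \mu_j - \bar{x} \rVert^2 .
$$

The left-hand side does not depend on the clustering, so minimizing $W$ is the same
as maximizing the between-cluster term.
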
In this applet, a sample of data is generated for the given $n$ and $k$, where $n$ is
the number of observations and $k$ is the predefined number of clusters. The algorithm
starts with a set of random cluster centers. In each step, every observation is
assigned to its nearest cluster center. After the assignment, each cluster center is
updated to the arithmetic mean of the observations belonging to that cluster. These
two steps are repeated until the cluster centers no longer change.
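
The assign-and-update loop described above can be sketched in a few lines of Python.
This is a minimal illustration, not the applet's own code. The function names, the use
of squared Euclidean distance, the choice of $k$ random observations as initial
centers, and the handling of empty clusters are all assumptions.

```python
import random


def dist2(p, q):
    """Squared Euclidean distance between two points (assumed metric)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(points):
    """Coordinate-wise arithmetic mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def assign(data, centers):
    """For each observation, the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: dist2(x, centers[j]))
            for x in data]


def kmeans(data, k, seed=0, max_iter=100):
    """Alternate assignment and mean update until the centers stop changing."""
    rng = random.Random(seed)
    # Assumption: the random initial centers are k distinct observations.
    centers = rng.sample(data, k)
    for _ in range(max_iter):
        labels = assign(data, centers)
        new_centers = []
        for j in range(k):
            members = [x for x, lab in zip(data, labels) if lab == j]
            # Assumption: an empty cluster keeps its previous center.
            new_centers.append(mean(members) if members else centers[j])
        if new_centers == centers:  # centers did not change: stop
            break
        centers = new_centers
    return centers, assign(data, centers)


# Two well-separated groups of three observations each.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
        (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, labels = kmeans(data, k=2)
print(centers)
print(labels)
```

For this small data set the loop ends with one center at the mean of each group. In
general the result depends on the initial centers, and the loop may stop at a local
rather than a global optimum.
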
In the applet, as you drag any observation on the screen, you can watch the cluster
centers update at the same time.
