k-means

1. Partitioning:
Construct k partitions and iteratively update the partitions
(1) k-means
(2) k-medoids

2. Hierarchical clustering:
Create a hierarchy of clusters (dendrogram)
(1) Agglomerative clustering (bottom-up)
(2) Divisive clustering (top-down)
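
The bottom-up variant can be sketched in a few lines. This is a minimal illustration, assuming single linkage (cluster distance = closest pair of points) and Euclidean distance; the function names and the toy data are hypothetical and do not come from these notes.

```python
def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5


def single_linkage(c1, c2, points):
    """Distance between two clusters: the closest pair of points across them."""
    return min(euclidean(points[i], points[j]) for i in c1 for j in c2)


def agglomerative(points, num_clusters):
    """Merge the two closest clusters until num_clusters remain."""
    # Start with every instance in its own cluster (stored as index lists).
    clusters = [[i] for i in range(len(points))]
    merges = []  # (members_a, members_b, distance): the dendrogram, bottom-up
    while len(clusters) > num_clusters:
        # Find the pair of clusters with the smallest linkage distance.
        pairs = [
            (p, q)
            for p in range(len(clusters))
            for q in range(p + 1, len(clusters))
        ]
        p, q = min(
            pairs,
            key=lambda pq: single_linkage(clusters[pq[0]], clusters[pq[1]], points),
        )
        dist = single_linkage(clusters[p], clusters[q], points)
        merges.append((clusters[p], clusters[q], dist))
        merged = clusters[p] + clusters[q]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (p, q)]
        clusters.append(merged)
    return clusters, merges


# Toy 1-D data: two tight pairs and one far-away point.
points = [[0.0], [1.0], [10.0], [11.0], [30.0]]
clusters, merges = agglomerative(points, num_clusters=2)
```

Reading `merges` from first to last entry gives the order in which the dendrogram is built; stopping at a different `num_clusters` corresponds to cutting the dendrogram at a different height.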

3. Graph-based clustering:
Graph-cut algorithms (Spectral Clustering)

4. Model-based clustering:
Mixture of Gaussians

5. Other types:
(1) Non-parametric Bayesian (Latent Dirichlet Allocation)
(2) Expectation Maximisation (EM) algorithm
(3) and many more …

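Clustering is the process of organising the members of a data set into groups whose members are similar in some respect. It is a technique for discovering this intrinsic structure in the data, and it is a form of unsupervised learning.

It groups unlabelled data points (also called instances).

At a high level, what this algorithm is for:

So what the algorithm has to do is use the training data set to find the centre of each of these clusters.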

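Our goal is to minimise the k-means objective function.
Meaning of the objective: if every point has been assigned to the correct cluster, then the sum of the distances from all points to the centres of the clusters they belong to is at its minimum.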
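
A common way to write this objective, assuming squared Euclidean distance, is sketched below. The symbols C_j (the set of instances in the j-th cluster) and \mu_j (its centre) are notation introduced here for illustration; k and x_i are as used elsewhere in these notes.

```latex
J = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^{2},
\qquad
\mu_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i
```

With squared distances, the centre that minimises the inner sum for a fixed cluster is the mean of its members, which is where the "means" in k-means comes from.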
Translation:

Input
- the number of clusters k
- a data set S = {x1, …, xN} containing N instances; each instance is a d-dimensional real vector (xi ∈ R^d) representing its d features
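
Given these inputs, the usual iteration (often called Lloyd's algorithm) alternates between assigning instances to their nearest centre and moving each centre to the mean of its members. The sketch below is an illustration under assumptions rather than the procedure from these notes: it initialises the centres with the first k instances so that the run is deterministic, uses squared Euclidean distance, and the function names and toy data are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def kmeans(S, k, max_iter=100):
    """Return (centres, labels) for data set S and k clusters."""
    # Assumption: use the first k instances as the initial centres.
    centres = [list(x) for x in S[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each instance joins its nearest centre.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(x, centres[j]))
            for x in S
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centres are stable
        labels = new_labels
        # Update step: move each centre to the mean of its members.
        for j in range(k):
            members = [x for x, label in zip(S, labels) if label == j]
            if members:  # an empty cluster keeps its previous centre
                centres[j] = mean(members)
    return centres, labels


# Toy data set: N = 6 instances, d = 2 features, two obvious groups.
S = [[1.0, 1.0], [5.0, 5.0], [1.0, 2.0], [5.0, 6.0], [2.0, 1.0], [6.0, 5.0]]
centres, labels = kmeans(S, k=2)
```

In practice the result depends on the initial centres, so implementations typically use random or k-means++ style initialisation and keep the best of several runs.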

(Methods for evaluating clustering quality fall broadly into two categories: performance measures and distance computation.)
1. Purity
Example:
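
Purity is commonly computed by mapping each cluster to its most frequent true class and taking the fraction of instances that match. The sketch below assumes ground-truth class labels are available; the function name and the toy labels are illustrative and not taken from these notes.

```python
from collections import Counter


def purity(cluster_labels, true_labels):
    """Fraction of instances that belong to the majority class of their cluster."""
    # Group the true classes by the cluster each instance was assigned to.
    clusters = {}
    for c, t in zip(cluster_labels, true_labels):
        clusters.setdefault(c, []).append(t)
    # For each cluster, count the members of its most frequent class.
    majority_total = sum(
        Counter(members).most_common(1)[0][1] for members in clusters.values()
    )
    return majority_total / len(true_labels)


cluster_labels = [0, 0, 0, 1, 1, 1]
true_labels = ["a", "a", "b", "b", "b", "b"]
score = purity(cluster_labels, true_labels)
# Cluster 0 holds a, a, b (majority 2); cluster 1 holds b, b, b (majority 3),
# so the purity is (2 + 3) / 6.
```

Purity reaches 1 trivially when every instance is its own cluster, so it is best compared across clusterings with the same k.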

2. NMI: Normalised Mutual Information
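
NMI measures how much information the cluster assignment and the true classes share, scaled to the range [0, 1]. Several normalisations are in use; the sketch below assumes the arithmetic-mean form, NMI = 2·I(U;V) / (H(U) + H(V)), with natural logarithms. The function names are illustrative.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy of a labelling."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(u, v):
    """Mutual information between two labellings of the same instances."""
    n = len(u)
    count_u, count_v = Counter(u), Counter(v)
    joint = Counter(zip(u, v))
    # p(a, b) * log(p(a, b) / (p(a) * p(b))), written in terms of counts.
    return sum(
        (c / n) * math.log(c * n / (count_u[a] * count_v[b]))
        for (a, b), c in joint.items()
    )


def nmi(u, v):
    """Normalised mutual information (arithmetic-mean normalisation)."""
    h_u, h_v = entropy(u), entropy(v)
    if h_u + h_v == 0:
        return 1.0  # both labellings put every instance in a single group
    return 2 * mutual_information(u, v) / (h_u + h_v)


same_up_to_renaming = nmi([0, 0, 1, 1], [1, 1, 0, 0])
independent = nmi([0, 0, 1, 1], [0, 1, 0, 1])
```

Because only the co-occurrence counts matter, NMI is unaffected by how the cluster IDs are named: the first call compares identical partitions with swapped labels, the second compares statistically independent ones.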

3. Rand index (RI)
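
The Rand index is the fraction of instance pairs on which the clustering and the true classes agree: a pair counts as agreement if it is placed together in both, or apart in both. A minimal sketch that enumerates all pairs directly (fine for small N; the function name and toy labels are illustrative):

```python
from itertools import combinations


def rand_index(cluster_labels, true_labels):
    """Fraction of instance pairs treated the same way by both labellings."""
    agree = 0
    total = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_cluster = cluster_labels[i] == cluster_labels[j]
        same_class = true_labels[i] == true_labels[j]
        # Agreement: together in both, or separated in both.
        agree += same_cluster == same_class
        total += 1
    return agree / total


cluster_labels = [0, 0, 0, 1, 1, 1]
true_labels = ["a", "a", "b", "b", "b", "b"]
ri = rand_index(cluster_labels, true_labels)
# 15 pairs in total: 4 are together in both labellings and 6 are apart in both,
# so RI = 10 / 15.
```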