19. Clustering Algorithms: K-means

In this section:

Overview of the K-means algorithm
K-means workflow
Visualizing K-means iterations
Image compression with K-means

1. Overview of the K-means Algorithm

2. K-means Workflow

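Assume k = 2, so the data is split into two clusters.

① Pick two points at random as the initial centroids. (The choice of initial values matters: the result depends on them, so it is common to run K-means several times from different starting points and compare the runs. scikit-learn's n_init parameter, for example, keeps the run with the lowest inertia.)

② Compute the distance from every sample point to each centroid, and assign each point to the cluster of its nearest centroid.

③ Relocate each centroid to the mean of the points assigned to it (averaging each dimension of the vectors).

④ Recompute the distance from every sample point to the new centroids, then go through all samples again to see which cluster each one now belongs to.

Repeat these steps until no sample point changes cluster any more.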
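The steps above can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not the implementation scikit-learn uses: the function name `kmeans`, the `seed` parameter, and the two synthetic blobs are all hypothetical, and the initial centroids are simply k distinct samples drawn at random.

```python
import numpy as np

def kmeans(points, k, max_iter=100, seed=0):
    """Plain K-means sketch; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Step 1: choose k distinct samples as the initial centroids.
    start = rng.choice(len(points), size=k, replace=False)
    centroids = points[start].astype(float)
    labels = None
    for _ in range(max_iter):
        # Steps 2 and 4: distance from every sample to every centroid,
        # then assign each sample to its nearest centroid.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # Stop once no sample changes cluster.
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 3: move each centroid to the mean of its members.
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return centroids, labels

# Hypothetical demo data: two well-separated 2-D blobs of 50 points each.
rng = np.random.default_rng(42)
blob_a = rng.normal(loc=(0.0, 0.0), scale=0.5, size=(50, 2))
blob_b = rng.normal(loc=(10.0, 10.0), scale=0.5, size=(50, 2))
points = np.vstack([blob_a, blob_b])

centroids, labels = kmeans(points, k=2)
print(centroids)
print(labels)
```

On data this cleanly separated, each blob should end up in its own cluster whichever starting points are drawn. On harder data the result can depend on the initial centroids, which is why several restarts are usually compared.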

3. Visualizing K-means Iterations

Interactive visualizations:
DBSCAN: https://www.naftaliharris.com/blog/visualizing-dbscan-clustering/
K-means: https://www.naftaliharris.com/blog/visualizing-k-means-clustering/

4. Image Compression with K-means

# Imports
from skimage import io
from sklearn.cluster import KMeans
import numpy as np

# Read the image
image = io.imread("face.jpg")

# Display the image
io.imshow(image)

# Inspect the image before compression
print('image type:', type(image))            # numpy.ndarray
print('image shape:', image.shape)           # (rows, cols, channels)
print('image height:', image.shape[0])       # number of rows, h
print('image width:', image.shape[1])        # number of columns, w
print('image channels:', image.shape[2])     # number of channels, c
print('image total values:', image.size)     # rows * cols * channels
print('image max pixel value:', image.max())
print('image min pixel value:', image.min())
print('image mean pixel value:', image.mean())

# rows * cols * channel = 482 * 500 * 3
rows = image.shape[0]
cols = image.shape[1]
channel = image.shape[2]

# samples * channel = 241000 * 3
# Each sample (pixel) has one value per channel, so each pixel is described by 3 values
image = image.reshape(rows * cols, channel)

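# samples * 1 = 241000 * 1
# K-means replaces each pixel's 3 channel values with a single cluster index,
# so the many original colors are represented by a small palette of K colors
# n_clusters: K, the number of clusters; n_init: number of runs with different
# initial centroids (the best run is kept); max_iter: maximum iterations per run
kmeans = KMeans(n_clusters=128, n_init=10, max_iter=200)
kmeans.fit(image)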

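# cluster_centers_: the K palette colors (the codebook), cast to 8-bit values
clusters = np.asarray(kmeans.cluster_centers_, dtype=np.uint8)
# labels_: the cluster index of each pixel (128 clusters fit in uint8)
labels = np.asarray(kmeans.labels_, dtype=np.uint8)
# rows * cols * 1 = 482 * 500 * 1
labels = labels.reshape(rows, cols)
# Save the codebook
np.save('codebook_test.npy', clusters)
# Save the label map as the compressed image
# (JPEG is lossy, so the stored indices may not be recovered exactly;
# a lossless format such as PNG would preserve them)
io.imsave('compressed_test.jpg', labels)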
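The saved label map only stores palette indices, so viewing the compressed picture in color means looking each index up in the codebook. The sketch below shows that lookup on a tiny made-up example. The function name `decompress` and the 2x3 sample arrays are hypothetical. The byte counts at the end are raw array sizes for the 482 x 500 image and K = 128 used above, not actual file sizes, which depend on the file format.

```python
import numpy as np

def decompress(codebook, labels):
    """Rebuild an RGB image by replacing each label with its codebook color."""
    # Integer-array indexing: labels has shape (rows, cols) and codebook has
    # shape (K, 3), so the result has shape (rows, cols, 3).
    return codebook[labels]

# Hypothetical example: a 2-color codebook and a 2x3 label map.
codebook = np.array([[255, 0, 0], [0, 0, 255]], dtype=np.uint8)
labels = np.array([[0, 1, 0], [1, 1, 0]], dtype=np.uint8)
restored = decompress(codebook, labels)
print(restored.shape)

# Raw storage: 3 bytes per pixel before, 1 byte per pixel plus the codebook after.
original_bytes = 482 * 500 * 3
compressed_bytes = 482 * 500 * 1 + 128 * 3
print(original_bytes, compressed_bytes, original_bytes / compressed_bytes)
```

With the arrays from the code above, the same lookup would be `decompress(clusters, labels)`, or `decompress(np.load('codebook_test.npy'), labels)` when starting from the saved codebook. In raw terms the representation is about a third of the original size.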

Results:

Before applying K-means:

# Read the compressed image
newimage = io.imread("compressed_test.jpg")
# Inspect the image after compression
print('newimage type:', type(newimage))            # numpy.ndarray
print('newimage shape:', newimage.shape)           # (rows, cols)
print('newimage height:', newimage.shape[0])       # number of rows
print('newimage width:', newimage.shape[1])        # number of columns
# print('newimage channels:', newimage.shape[2])   # omitted: the saved image is single-channel, so shape has only two entries
print('newimage total values:', newimage.size)     # rows * cols
print('newimage max pixel value:', newimage.max())
print('newimage min pixel value:', newimage.min())
print('newimage mean pixel value:', newimage.mean())
# Display the compressed image
io.imshow(newimage)

After applying K-means: