【Question title】: How to plot clusters and centers from a multi-feature kmeans model, with Matplotlib?
【Posted】: 2020-12-07 20:06:43
【Question description】:

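I used the kmeans algorithm to determine the number of clusters in my dataset. In the code below you can see that I have multiple features, some categorical and some not. I encoded and scaled them, and obtained my optimal number of clusters.

You can download the data here: https://www.sendspace.com/file/1cnbji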

import sklearn.metrics as sm

from sklearn.preprocessing import scale

from sklearn.preprocessing import Normalizer
from sklearn.preprocessing import StandardScaler, MinMaxScaler

from sklearn.cluster import KMeans, SpectralClustering, MiniBatchKMeans
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

import matplotlib.pyplot as plt

import pandas as pd



df = pd.read_csv('dataset.csv')
print(df.columns)

features = df[['parcela', 'bruto', 'neto',
               'osnova', 'sipovi', 'nadzemno',
               'podzemno', 'tavanica', 'fasada']]

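# One-hot encode the categorical columns. Note: the step named 'StandardScaler'
# actually applies Normalizer, which rescales each row (sample) to unit norm.
trans = ColumnTransformer(transformers=[('onehot', OneHotEncoder(), ['tavanica', 'fasada']),
                                        ('StandardScaler', Normalizer(), ['parcela', 'bruto', 'neto', 'osnova', 'nadzemno', 'podzemno', 'sipovi'])],
                          remainder='passthrough') # Default is to drop untransformed columns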

features = trans.fit_transform(features)

Sum_of_squared_distances = []
for i in range(1,19):

     kmeans = KMeans(n_clusters = i, init = 'k-means++', random_state = 0)
     kmeans.fit(features)
     Sum_of_squared_distances.append(kmeans.inertia_)


plt.plot(range(1,19), Sum_of_squared_distances, 'bx-')
plt.xlabel('k')
plt.ylabel('Sum_of_squared_distances')
plt.title('Elbow Method For Optimal k')
plt.show()

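  • On the plot, the elbow method shows that my optimal number of clusters is 7.
  • How do I plot these 7 clusters?
    • I want to see the centroids on the plot, and a scatter plot of the clusters in 7 different colors.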

【Question comments】:

    Tags: python matplotlib scikit-learn cluster-analysis k-means


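    【Solution 1】:
    • Given Plot: kmeans clustering centroid, where centers has a single x-y pair of columns: the centers array has shape (3, 2), with x as (3, 1) and y as (3, 1).
      • The method demonstrated there for a single x-y pair has been adapted to the seven x-y pairs produced by the model in this question.
    • The centers returned by the model in this question have shape (7, 14): 7 clusters by 14 transformed feature columns, treated here as 7 groups of x, y values.
    • This solution answers the question: how to plot the clusters and centers?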
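The grouping of the 14 columns into 7 x-y pairs can be sketched as below. This is only an illustration and is not part of the original answer. The helper name `column_pairs` is hypothetical.

```python
def column_pairs(n_columns):
    """Return consecutive (x, y) column index pairs: (0, 1), (2, 3), ..."""
    # Stop before a trailing unpaired column if n_columns is odd
    return [(j, j + 1) for j in range(0, n_columns - 1, 2)]


# 14 columns, as in the (7, 14) centers array described above
pairs = column_pairs(14)
print(len(pairs))            # 7
print(pairs[0], pairs[-1])   # (0, 1) (12, 13)
```

With the arrays used in this answer, `column_pairs(centers.shape[1])` would give the same index pairs. The pairing is purely positional, so adjacent columns are not necessarily related features.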
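    # uses the imports as shown in the question, plus numpy and seaborn
    import numpy as np
    import seaborn as sns
    from matplotlib.patches import Patch  # for creating a legend
    from matplotlib.lines import Line2D
    
    # beginning with the transformed array produced in the question by:
    # features = trans.fit_transform(features)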
    
    # create the model and fit it to features
    kmeans_model2 = KMeans(n_clusters=7, init='k-means++', random_state=0).fit(features)
    
    # find the centers; there are 7
    centers = np.array(kmeans_model2.cluster_centers_)
    
    # unique markers for the labels
    markers = ['o', 'v', 's', '*', 'p', 'd', 'h']
    
    # get the model labels
    labels = kmeans_model2.labels_
    labels_unique = set(labels)
    
    # unique colors for each label
    colors = sns.color_palette('husl', n_colors=len(labels_unique))
    
    # color map with labels and colors
    cmap = dict(zip(labels_unique, colors))
    
    # plot
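    # iterate through the feature columns in consecutive (x, y) pairs
    # centers.shape[1] is the number of columns (14 here); len(centers)*2 only matches it by coincidence
    for j in range(0, centers.shape[1], 2):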
        plt.figure(figsize=(6, 6))
        
        x_features = features[:, j]
        y_features = features[:, j+1]
        x_centers = centers[:, j]
        y_centers = centers[:, j+1]
        
        # add the data for each label to the plot
        for i, l in enumerate(labels):
    #         print(f'Label: {l}')  # uncomment as needed
    #         print(f'feature x coordinates for label:\n{x_features[i]}')  # uncomment as needed
    #         print(f'feature y coordinates for label:\n{y_features[i]}')  # uncomment as needed
            plt.plot(x_features[i], y_features[i], color=colors[l], marker=markers[l], alpha=0.5)
    
        # print values for given plot, rounded for easier interpretation; all 5 can be commented out
        print(f'feature labels:\n{list(labels)}')
        print(f'x_features:\n{list(map(lambda x: round(x, 3), x_features))}')
        print(f'y_features:\n{list(map(lambda x: round(x, 3), y_features))}')
        print(f'x_centers:\n{list(map(lambda x: round(x, 3), x_centers))}')
        print(f'y_centers:\n{list(map(lambda x: round(x, 3), y_centers))}')
        
        # add the centers
        # this loop is to color the center marker to correspond to the color of the corresponding label.
        for k in range(len(centers)):  
            plt.scatter(x_centers[k], y_centers[k], marker="X", color=colors[k])
        
        # title
        plt.title(f'Features: Dimension {int(j/2)}')
        
        # create the rectangles for the legend
        patches = [Patch(color=v, label=k) for k, v in cmap.items()]
        # create centers marker for the legend
        black_x = Line2D([], [], color='k', marker='X', linestyle='None', label='centers', markersize=10)
        # add the legend
        plt.legend(title='Labels', handles=patches + [black_x], bbox_to_anchor=(1.04, 0.5), loc='center left', borderaxespad=0, fontsize=15)
        
        plt.show()
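The loop above draws one point per `plt.plot` call. As a possible alternative, the points could first be grouped by label so that each label needs only one scatter call. The sketch below is not part of the original answer. The helper name `group_by_label` is hypothetical, and the sample values are the first five labels and Features 4 coordinates from the printed output of this answer.

```python
from collections import defaultdict


def group_by_label(labels, xs, ys):
    """Collect the x and y coordinates of the points belonging to each label."""
    grouped = defaultdict(lambda: ([], []))
    for label, x, y in zip(labels, xs, ys):
        grouped[label][0].append(x)
        grouped[label][1].append(y)
    return dict(grouped)


# First five points of the Features 4 output
labels = [6, 1, 1, 1, 5]
xs = [0.712, 0.741, 0.763, 0.704, 0.749]
ys = [0.614, 0.636, 0.612, 0.601, 0.631]

groups = group_by_label(labels, xs, ys)
print(groups[1])  # ([0.741, 0.763, 0.704], [0.636, 0.612, 0.601])
```

Each entry could then be drawn with a single call such as `plt.scatter(gx, gy, color=colors[label], marker=markers[label], alpha=0.5)` while iterating over `groups.items()`.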
    

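    Plot output

    • Many of the plotted features have overlapping values and centers.
    • The x and y values of features and centers are printed, to make it easier to see the overlap and confirm the plotted values.
      • The responsible print lines can be commented out or removed when no longer needed.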
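One way to quantify the overlap is to count how many points share each (x, y) position. The sketch below is not part of the original answer. The helper name `count_positions` is hypothetical, and the sample values are the `x_centers` and `y_centers` printed for Features 0.

```python
from collections import Counter


def count_positions(xs, ys, ndigits=3):
    """Count how many points fall on each rounded (x, y) position."""
    # Adding 0.0 turns -0.0 into 0.0 so that keys print consistently
    return Counter((round(x, ndigits) + 0.0, round(y, ndigits) + 0.0)
                   for x, y in zip(xs, ys))


# Centers printed for Features 0
x_centers = [1.0, 1.0, 0.0, 1.0, 1.0, 1.0, 0.0]
y_centers = [0.0, 0.0, 1.0, 0.0, -0.0, -0.0, 1.0]

counts = count_positions(x_centers, y_centers)
print(len(counts))         # 2
print(counts[(1.0, 0.0)])  # 5
print(counts[(0.0, 1.0)])  # 2
```

For this pair of one-hot columns, the 7 centers land on only 2 distinct positions, which is why most markers are hidden behind one another.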

    Features 0

    feature labels:
    [6, 1, 1, 1, 5, 5, 3, 4, 1, 0, 1, 5, 5, 1, 1, 1, 1, 1, 4, 1, 2, 0, 1, 3, 3, 4, 2, 2, 4, 3, 3, 2, 6, 3, 1, 2, 4, 6, 1, 4, 4, 1, 4, 5, 3, 1, 1, 1, 1, 1, 0, 1, 5, 5, 1, 1, 3, 3, 3, 1, 3, 1, 3, 3, 0, 1, 2, 2, 2, 6]
    x_features:
    [0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0]
    y_features:
    [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
    x_centers:
    [1.0, 1.0, 0.0, 1.0, 1.0, 1.0, 0.0]
    y_centers:
    [0.0, 0.0, 1.0, 0.0, -0.0, -0.0, 1.0]
    

    Features 1

    feature labels:
    [6, 1, 1, 1, 5, 5, 3, 4, 1, 0, 1, 5, 5, 1, 1, 1, 1, 1, 4, 1, 2, 0, 1, 3, 3, 4, 2, 2, 4, 3, 3, 2, 6, 3, 1, 2, 4, 6, 1, 4, 4, 1, 4, 5, 3, 1, 1, 1, 1, 1, 0, 1, 5, 5, 1, 1, 3, 3, 3, 1, 3, 1, 3, 3, 0, 1, 2, 2, 2, 6]
    x_features:
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
    y_features:
    [1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0]
    x_centers:
    [1.0, -0.0, -0.0, -0.0, -0.0, 0.0, 0.0]
    y_centers:
    [0.0, 1.0, 0.0, -0.0, 0.0, 0.0, 1.0]
    

    Features 2

    feature labels:
    [6, 1, 1, 1, 5, 5, 3, 4, 1, 0, 1, 5, 5, 1, 1, 1, 1, 1, 4, 1, 2, 0, 1, 3, 3, 4, 2, 2, 4, 3, 3, 2, 6, 3, 1, 2, 4, 6, 1, 4, 4, 1, 4, 5, 3, 1, 1, 1, 1, 1, 0, 1, 5, 5, 1, 1, 3, 3, 3, 1, 3, 1, 3, 3, 0, 1, 2, 2, 2, 6]
    x_features:
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0]
    y_features:
    [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
    x_centers:
    [0.0, -0.0, 0.125, 1.0, 0.0, 0.0, 0.0]
    y_centers:
    [0.0, -0.0, 0.0, 0.0, 0.0, 1.0, 0.0]
    

    Features 3

    feature labels:
    [6, 1, 1, 1, 5, 5, 3, 4, 1, 0, 1, 5, 5, 1, 1, 1, 1, 1, 4, 1, 2, 0, 1, 3, 3, 4, 2, 2, 4, 3, 3, 2, 6, 3, 1, 2, 4, 6, 1, 4, 4, 1, 4, 5, 3, 1, 1, 1, 1, 1, 0, 1, 5, 5, 1, 1, 3, 3, 3, 1, 3, 1, 3, 3, 0, 1, 2, 2, 2, 6]
    x_features:
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0]
    y_features:
    [0.298, 0.193, 0.18, 0.336, 0.181, 0.174, 0.197, 0.23, 0.175, 0.212, 0.196, 0.186, 0.2, 0.15, 0.141, 0.304, 0.108, 0.101, 0.304, 0.105, 0.459, 0.18, 0.16, 0.224, 0.216, 0.246, 0.139, 0.111, 0.227, 0.177, 0.159, 0.25, 0.298, 0.223, 0.335, 0.431, 0.17, 0.381, 0.255, 0.222, 0.296, 0.156, 0.202, 0.145, 0.195, 0.15, 0.141, 0.18, 0.336, 0.175, 0.212, 0.196, 0.186, 0.2, 0.15, 0.141, 0.177, 0.177, 0.177, 0.177, 0.177, 0.177, 0.224, 0.224, 0.18, 0.16, 0.222, 0.202, 0.18, 0.336]
    x_centers:
    [0.0, -0.0, 0.875, -0.0, 1.0, 0.0, 0.0]
    y_centers:
    [0.196, 0.188, 0.249, 0.196, 0.237, 0.182, 0.328]
    

    Features 4

    feature labels:
    [6, 1, 1, 1, 5, 5, 3, 4, 1, 0, 1, 5, 5, 1, 1, 1, 1, 1, 4, 1, 2, 0, 1, 3, 3, 4, 2, 2, 4, 3, 3, 2, 6, 3, 1, 2, 4, 6, 1, 4, 4, 1, 4, 5, 3, 1, 1, 1, 1, 1, 0, 1, 5, 5, 1, 1, 3, 3, 3, 1, 3, 1, 3, 3, 0, 1, 2, 2, 2, 6]
    x_features:
    [0.712, 0.741, 0.763, 0.704, 0.749, 0.741, 0.754, 0.735, 0.744, 0.738, 0.743, 0.747, 0.758, 0.759, 0.749, 0.714, 0.766, 0.748, 0.728, 0.755, 0.681, 0.752, 0.762, 0.734, 0.721, 0.747, 0.749, 0.756, 0.737, 0.748, 0.742, 0.724, 0.712, 0.733, 0.73, 0.688, 0.722, 0.705, 0.777, 0.749, 0.733, 0.744, 0.733, 0.764, 0.739, 0.76, 0.749, 0.763, 0.704, 0.744, 0.738, 0.743, 0.747, 0.758, 0.759, 0.749, 0.748, 0.748, 0.748, 0.748, 0.748, 0.748, 0.734, 0.734, 0.752, 0.762, 0.749, 0.733, 0.763, 0.704]
    y_features:
    [0.614, 0.636, 0.612, 0.601, 0.631, 0.64, 0.62, 0.624, 0.636, 0.633, 0.632, 0.63, 0.61, 0.629, 0.641, 0.616, 0.629, 0.65, 0.601, 0.644, 0.539, 0.628, 0.623, 0.627, 0.65, 0.603, 0.641, 0.641, 0.616, 0.632, 0.648, 0.631, 0.614, 0.624, 0.58, 0.562, 0.666, 0.587, 0.565, 0.616, 0.591, 0.646, 0.642, 0.625, 0.631, 0.629, 0.641, 0.612, 0.601, 0.636, 0.633, 0.632, 0.63, 0.61, 0.629, 0.641, 0.632, 0.632, 0.632, 0.632, 0.632, 0.632, 0.627, 0.627, 0.628, 0.623, 0.616, 0.642, 0.612, 0.601]
    x_centers:
    [0.745, 0.747, 0.73, 0.741, 0.735, 0.752, 0.708]
    y_centers:
    [0.63, 0.625, 0.611, 0.632, 0.62, 0.625, 0.604]
    

    Features 5

    feature labels:
    [6, 1, 1, 1, 5, 5, 3, 4, 1, 0, 1, 5, 5, 1, 1, 1, 1, 1, 4, 1, 2, 0, 1, 3, 3, 4, 2, 2, 4, 3, 3, 2, 6, 3, 1, 2, 4, 6, 1, 4, 4, 1, 4, 5, 3, 1, 1, 1, 1, 1, 0, 1, 5, 5, 1, 1, 3, 3, 3, 1, 3, 1, 3, 3, 0, 1, 2, 2, 2, 6]
    x_features:
    [0.164, 0.096, 0.103, 0.171, 0.091, 0.106, 0.094, 0.132, 0.105, 0.098, 0.102, 0.101, 0.115, 0.079, 0.095, 0.135, 0.075, 0.088, 0.126, 0.063, 0.186, 0.088, 0.075, 0.134, 0.107, 0.134, 0.09, 0.072, 0.16, 0.097, 0.073, 0.123, 0.165, 0.154, 0.133, 0.158, 0.084, 0.11, 0.105, 0.1, 0.164, 0.075, 0.1, 0.075, 0.135, 0.069, 0.095, 0.103, 0.171, 0.105, 0.098, 0.102, 0.101, 0.115, 0.079, 0.095, 0.097, 0.097, 0.097, 0.097, 0.097, 0.097, 0.134, 0.134, 0.088, 0.075, 0.1, 0.1, 0.103, 0.171]
    y_features:
    [0.001, 0.002, 0.001, 0.001, 0.001, 0.002, 0.002, 0.001, 0.001, 0.001, 0.001, 0.005, 0.002, 0.001, 0.002, 0.001, 0.002, 0.001, 0.001, 0.002, 0.0, 0.001, 0.001, 0.002, 0.0, 0.001, 0.001, 0.002, 0.002, 0.002, 0.0, 0.001, 0.001, 0.001, 0.004, 0.004, 0.001, 0.002, 0.001, 0.001, 0.002, 0.0, 0.001, 0.001, 0.001, 0.001, 0.0, 0.001, 0.001, 0.001, 0.0, 0.0, 0.003, 0.001, 0.001, 0.001, 0.001, 0.001, 0.001, 0.0, 0.002, 0.001, 0.001, 0.0, 0.001, 0.001, 0.002, 0.002, 0.002, 0.001]
    x_centers:
    [0.093, 0.1, 0.116, 0.112, 0.125, 0.101, 0.152]
    y_centers:
    [0.001, 0.001, 0.002, 0.001, 0.001, 0.002, 0.001]
    

    Features 6

    feature labels:
    [6, 1, 1, 1, 5, 5, 3, 4, 1, 0, 1, 5, 5, 1, 1, 1, 1, 1, 4, 1, 2, 0, 1, 3, 3, 4, 2, 2, 4, 3, 3, 2, 6, 3, 1, 2, 4, 6, 1, 4, 4, 1, 4, 5, 3, 1, 1, 1, 1, 1, 0, 1, 5, 5, 1, 1, 3, 3, 3, 1, 3, 1, 3, 3, 0, 1, 2, 2, 2, 6]
    x_features:
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.002, 0.0, 0.0, 0.001, 0.0, 0.001, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.001, 0.001, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.001, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.001, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
    y_features:
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.001, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
    x_centers:
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.001, 0.0]
    y_centers:
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
    

    Update: all dimensions on one plot

    • As requested by the OP
    # plot
    plt.figure(figsize=(16, 8))
    for j in range(0, centers.shape[1], 2):  # step through the feature columns in (x, y) pairs
        
        x_features = features[:, j]
        y_features = features[:, j+1]
        x_centers = centers[:, j]
        y_centers = centers[:, j+1]
        
        # add the data for each label to the plot
        for i, l in enumerate(labels):
            plt.plot(x_features[i], y_features[i], marker=markers[int(j/2)], color=colors[int(j/2)], alpha=0.5)
    
        # add the centers
        for k in range(len(centers)):  
            plt.scatter(x_centers[k], y_centers[k], marker="X", color=colors[int(j/2)])
    
    # create the rectangles for the legend
    # in this plot the colors are indexed by column pair (j/2), not by cluster label,
    # so the legend entries 0-6 identify the dimension
    patches = [Patch(color=v, label=k) for k, v in cmap.items()]
    # create centers marker for the legend
    black_x = Line2D([], [], color='k', marker='X', linestyle='None', label='centers', markersize=10)
    # add the legend
    plt.legend(title='Dimension', handles=patches + [black_x], bbox_to_anchor=(1.04, 0.5), loc='center left', borderaxespad=0, fontsize=15)
        
    plt.show()
    
    • As noted for the individual plots, there is a lot of overlap.
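If a single scatter plot colored by cluster is wanted, one common option is to project the features and the centers onto two principal components. This swaps in PCA, which the answer above does not use, so treat it as a sketch under stated assumptions. The function name `plot_clusters_2d` is hypothetical, and `make_blobs` data stands in for the question's transformed `features` array.

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA


def plot_clusters_2d(features, n_clusters=7, random_state=0):
    """Fit KMeans, project points and centers to 2-D with PCA, and draw them."""
    model = KMeans(n_clusters=n_clusters, init='k-means++', n_init=10,
                   random_state=random_state).fit(features)

    # Fit PCA on the data, then apply the same projection to the centers
    pca = PCA(n_components=2).fit(features)
    points_2d = pca.transform(features)
    centers_2d = pca.transform(model.cluster_centers_)

    fig, ax = plt.subplots(figsize=(6, 6))
    # One color per cluster label
    ax.scatter(points_2d[:, 0], points_2d[:, 1], c=model.labels_,
               cmap='tab10', alpha=0.5)
    ax.scatter(centers_2d[:, 0], centers_2d[:, 1], marker='X', color='k',
               s=100, label='centers')
    ax.set_xlabel('principal component 1')
    ax.set_ylabel('principal component 2')
    ax.legend()
    return fig, points_2d, centers_2d


# Placeholder data (assumption): 70 samples with 14 columns, not the real dataset
X, _ = make_blobs(n_samples=70, n_features=14, centers=7, random_state=0)
fig, points_2d, centers_2d = plot_clusters_2d(X)
```

Calling `plt.show()` afterwards displays the figure. The projection discards information, so clusters that are separate in 14 dimensions may still overlap in this view.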

    【Comments】:
