在散点图中为每个类绘制不同的集群标记答案

【问题标题】：Plotting different clusters markers for every class in scatter plot在散点图中为每个类绘制不同的集群标记
【发布时间】：2020-11-03 06:19:35
【问题描述】：

我有一个散点图，我正在绘制 14 个集群，但每 2 个集群属于同一类，它们都使用相同的标记。每 50 行是一个簇，每 100 行是同一类的两个簇。我想要做的是每 2 个集群或 100 行更改标记。

Link for the Data Frame

    import pandas as pd
    import numpy as np
    from matplotlib import pyplot as plt
    from matplotlib.pyplot import figure
    
    y = [0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0, 0,  0,  0,  0,  0,  0,
      0,  0,  0,  0, 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
      0,  0,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,
      1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,
      1,  1,  1,  1,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,
      2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,
      2,  2,  2,  2,  2,  2,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3,
      3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3,
      3,  3,  3,  3,  3,  3,  3,  3,  4,  4,  4,  4,  4,  4,  4,  4,  4,  4,  4,  4,  4,  4,  4,  4,
      4,  4,  4,  4,  4,  4,  4,  4,  4,  4,  4,  4,  4,  4,  4,  4,  4,  4,  4,  4,  4,  4,  4,  4,
      4,  4,  4,  4,  4,  4,  4,  4,  4,  4,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5,
      5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5,
      5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  6,  6,  6,  6,  6,  6,  6,  6,  6,  6,  6,  6,
      6,  6,  6,  6,  6,  6,  6,  6,  6,  6,  6,  6,  6,  6,  6,  6,  6,  6,  6,  6,  6,  6,  6,  6,
      6,  6,  6,  6,  6,  6,  6,  6,  6,  6,  6,  6,  6,  6,  7,  7,  7,  7,  7,  7,  7,  7,  7,  7,
      7,  7,  7,  7,  7,  7,  7,  7,  7,  7,  7,  7,  7,  7,  7,  7,  7,  7,  7,  7,  7,  7,  7,  7,
      7,  7,  7,  7,  7,  7,  7,  7,  7,  7,  7,  7,  7,  7,  7,  7,  8,  8,  8,  8,  8,  8,  8,  8,
      8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,
      8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  9,  9,  9,  9,  9,  9,
      9,  9,  9,  9,  9,  9,  9,  9,  9,  9,  9,  9,  9,  9,  9,  9,  9,  9,  9,  9,  9,  9,  9,  9,
      9,  9,  9,  9,  9,  9,  9,  9,  9,  9,  9,  9,  9,  9,  9,  9,  9,  9,  9,  9, 10, 10, 10, 10,
     10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10,
     10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 11, 11,
     11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11,
     11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11,
     12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12,
     12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12,
     12, 12, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13,
     13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13,
     13, 13, 13, 13]
    X_lda = pd.read_pickle('lda_values')
    X_lda = np.asarray(X_lda)
    
    

markers = ['x', 'o', '1', '.', '2', '>', 'D']
color=['b','r'] 
X_lda_colors=  [ color[i] for i in list(np.array(y)%2) ]
X_lda_markers = [markers[i] for i in list(np.array(y)%2)] 
plt.xlabel('1-eigenvector')
plt.ylabel('2-eigenvector')
plt.scatter(
    X_lda[:,0],
    X_lda[:,1],
    marker = X_lda_markers,
    c=X_lda_colors,
    cmap='rainbow',
    alpha=0.7,
)

这是我可以从包含大量数据的大型代码中获得的最小可重现示例。

这是实际的情节：

这就是我想要实现的。

我得到的错误：

ValueError: Unrecognized marker style ['x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x'...]'

【问题讨论】：

标记中缺少逗号 = ['x', 'o', '1', '.', '2' '>', 'D'], 在 '2' 和 ' 之间>'
我修正了逗号，感谢您的提醒。

标签： python matplotlib cluster-analysis scatter-plot

【解决方案1】：

请试试这个：

import numpy as np
from matplotlib import pyplot as plt

X_lda=np.array([[1,2],[1,1],[3,3],[4,4],[2,4],[3,5],[3,4],[3,2]]) # suppose you want to plot X

y=[0,1,1,1,2,3,4,4] # the cluster of each sample in X_lda 

color=['b','r'] 
markers = ['x', 'o', '1', '.', '2', '>', 'D'] # marker
X_lda_colors=  [ color[i] for i in list(np.array(y)%2) ] 
X_lda_markers= [ markers[i] for i in list(np.array(y)%2) ] 
plt.xlabel('1-eigenvector')
plt.ylabel('2-eigenvector')

for i in range(X_lda.shape[0]):
    plt.scatter( X_lda[i,0],    X_lda[i,1],    c=X_lda_colors[i],
    marker=X_lda_markers[i],    cmap='rainbow',   alpha=0.7,     edgecolors='w')
plt.show()

【讨论】：

在我将 X_lda_markers= [ markers[i] for i in list(np.array(y)%2) ] 更改为 %7 后它工作得很好，谢谢！