【Question title】: How to get a list of all leaves under a node in a dendrogram?
【Posted】: 2019-07-15 14:04:48
【Question description】:

I built a dendrogram with scipy.cluster.hierarchy.dendrogram, using data generated as follows:

import numpy as np

a = np.random.multivariate_normal([10, 0], [[3, 1], [1, 4]], size=[100,])
b = np.random.multivariate_normal([0, 20], [[3, 1], [1, 4]], size=[50,])
c = np.random.multivariate_normal([8, 2], [[3, 1], [1, 4]], size=[80,])
X = np.concatenate((a, b, c))

Compute the linkage matrix:

from scipy.cluster.hierarchy import dendrogram, linkage
Z = linkage(X, 'ward')

Then:

dendrogram(
    Z,
    truncate_mode='lastp',  # show only the last p merged clusters
    p=5,  # number of clusters to show
    show_leaf_counts=False,  # otherwise numbers in brackets are counts
    leaf_rotation=90.,
    leaf_font_size=12.,
    show_contracted=True,  # to get a distribution impression in truncated branches
)

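My data has 230 observations in total, and the truncated dendrogram splits them into p=5 clusters. For each cluster, I would like a list of the row indices of all observations in it. I would also like to know the hierarchy above these 5 clusters.

Thanks!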

【Comments】:

    Tags: python cluster-analysis hierarchical-clustering dendrogram


    【Solution 1】:

    I am new to clustering and dendrograms, so please point out any mistakes.

    import pandas as pd

    # put X in a dataframe
    df = pd.DataFrame({'col1': X[:, 0], 'col2': X[:, 1]})

    # name each observation A0, A1, ... so cluster members can be listed by name
    df['index'] = ['A' + str(i) for i in range(len(X))]
    print(df.shape)
    df.head()
    

    import matplotlib.pyplot as plt
    from scipy.cluster.hierarchy import dendrogram, linkage

    Z = linkage(X, 'ward')

    dendrogram(
        Z,
        truncate_mode='lastp',  # show only the last p merged clusters
        p=5,  # number of clusters to show
        show_leaf_counts=True,  # show the number of observations in brackets
        leaf_rotation=90.,
        leaf_font_size=12.,
        show_contracted=True,  # to get a distribution impression in truncated branches
    )
    plt.show()
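
The leaves of the truncated plot can also be read off directly, without drawing anything. The sketch below is one possible approach, not necessarily what the answer's author had in mind: it calls dendrogram with no_plot=True to get the node ids shown as leaves, then uses scipy.cluster.hierarchy.to_tree to collect the row indices under each of those nodes. The fixed seed and the name leaves_under are assumptions added for illustration; the data only mimics the question's setup.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage, to_tree

# Synthetic data shaped like the question's (the seed is an assumption,
# added only so that repeated runs give the same clusters).
rng = np.random.default_rng(0)
cov = [[3, 1], [1, 4]]
X = np.concatenate((
    rng.multivariate_normal([10, 0], cov, size=100),
    rng.multivariate_normal([0, 20], cov, size=50),
    rng.multivariate_normal([8, 2], cov, size=80),
))
Z = linkage(X, 'ward')

# no_plot=True skips drawing and only returns the layout dictionary.
# R['leaves'] holds the node ids that appear as leaves of the truncated tree.
R = dendrogram(Z, truncate_mode='lastp', p=5, no_plot=True)

# With rd=True, to_tree also returns a list where nodelist[i] is the node with id i
# (ids 0..n-1 are single observations, larger ids are merged clusters).
root, nodelist = to_tree(Z, rd=True)

# pre_order() returns the ids of all original observations under a node.
leaves_under = {
    node_id: sorted(nodelist[node_id].pre_order())
    for node_id in R['leaves']
}

for node_id, rows in leaves_under.items():
    print('node {}: N = {}  {}'.format(node_id, len(rows), rows))
```

Each key is the id of a node in the full tree, so the same lookup, nodelist[node_id].pre_order(), should work for any other node of the dendrogram as well.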
    

    from scipy.cluster.hierarchy import fcluster

    # retrieve elements in each cluster; labels run from 1 to 5
    label = fcluster(Z, 5, criterion='maxclust')

    df_clst = pd.DataFrame()
    df_clst['index'] = df['index']
    df_clst['label'] = label

    # print them
    for i in range(5):
        elements = df_clst[df_clst['label'] == i + 1]['index'].tolist()
        size = len(elements)
        print('\n Cluster {}: N = {}  {}'.format(i + 1, size, elements))
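
The question also asks for the hierarchy above the 5 clusters, which flat labels from fcluster do not show. A minimal sketch, assuming the usual layout of the linkage matrix (row i merges the two nodes in columns 0 and 1 into a new node with id n + i): the last p - 1 rows of Z then describe how the 5 clusters are joined up to the root. The seed and the name top_merges are illustrative assumptions, and the data only mimics the question's setup.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Synthetic data shaped like the question's (the seed is an assumption).
rng = np.random.default_rng(0)
cov = [[3, 1], [1, 4]]
X = np.concatenate((
    rng.multivariate_normal([10, 0], cov, size=100),
    rng.multivariate_normal([0, 20], cov, size=50),
    rng.multivariate_normal([8, 2], cov, size=80),
))
Z = linkage(X, 'ward')

n = X.shape[0]  # number of observations
p = 5           # number of clusters kept in the truncated dendrogram

# Z has n - 1 rows; the last p - 1 of them are the merges above the p clusters.
top_merges = []
for row_idx in range(n - p, n - 1):
    left, right, dist, count = Z[row_idx]
    top_merges.append((n + row_idx, int(left), int(right), float(dist), int(count)))

for parent, left, right, dist, count in top_merges:
    print('node {} = node {} + node {}  (distance {:.2f}, {} observations)'.format(
        parent, left, right, dist, count))
```

Node ids that appear as a child here but never as a parent are the 5 clusters themselves; the last row is the root, which contains all observations.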
    

    【Comments】:
