【Question title】: How to calculate the Silhouette coefficient for k-medoid clustering using the pyclustering lib?
【Posted】: 2019-01-03 12:44:59
【Question description】:
【Question discussion】:
Tags:
python
scikit-learn
cluster-analysis
sklearn-pandas
【Solution 1】:
Since version 0.8.2 this is also possible with pyclustering. Here is an example from the documentation:
from pyclustering.cluster.center_initializer import kmeans_plusplus_initializer
from pyclustering.cluster.kmeans import kmeans
from pyclustering.cluster.silhouette import silhouette
from pyclustering.samples.definitions import SIMPLE_SAMPLES
from pyclustering.utils import read_sample
# Read data 'SampleSimple3' from Simple Sample collection.
sample = read_sample(SIMPLE_SAMPLES.SAMPLE_SIMPLE3)
# Prepare initial centers
centers = kmeans_plusplus_initializer(sample, 4).initialize()
# Perform cluster analysis
kmeans_instance = kmeans(sample, centers)
kmeans_instance.process()
clusters = kmeans_instance.get_clusters()
# Calculate Silhouette scores (get_score() returns one value per point)
score = silhouette(sample, clusters).process().get_score()
For PAM (k-medoids), you need to change the last part:
...
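from pyclustering.cluster.kmedoids import kmedoids
# k-medoids expects initial medoids as indices into sample, not coordinates
medoids = kmeans_plusplus_initializer(sample, 4).initialize(return_index=True)
kmedoids_instance = kmedoids(sample, medoids)
clusters = kmedoids_instance.process().get_clusters()
score = silhouette(sample, clusters).process().get_score()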
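To show what the silhouette calculation does with the `clusters` list, here is a minimal pure-Python sketch. It assumes Euclidean distance and takes `clusters` in the same list-of-index-lists form that `get_clusters()` returns. The helpers `euclidean` and `silhouette_scores` are hypothetical names for illustration, not part of pyclustering. For each point i, a(i) is the mean distance to the other members of its own cluster, b(i) is the smallest mean distance to the members of any other cluster, and s(i) = (b(i) - a(i)) / max(a(i), b(i)). Points in singleton clusters get 0, which is the convention scikit-learn also uses.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def silhouette_scores(sample, clusters):
    """Per-point silhouette values (hypothetical helper, not pyclustering API).

    sample:   list of points, each a list of coordinates
    clusters: list of clusters, each a list of indices into `sample`
              (the same shape that get_clusters() returns)
    Assumes every point is in exactly one cluster and that there are
    at least two non-empty clusters.
    """
    # Map every point index to the id of the cluster that contains it.
    owner = {}
    for cluster_id, members in enumerate(clusters):
        for index in members:
            owner[index] = cluster_id

    scores = [0.0] * len(sample)
    for i, point in enumerate(sample):
        own = clusters[owner[i]]
        if len(own) == 1:
            # Singleton cluster: silhouette stays 0 by convention.
            continue
        # a(i): mean distance to the other members of the same cluster.
        a = sum(euclidean(point, sample[j]) for j in own if j != i) / (len(own) - 1)
        # b(i): lowest mean distance to the members of any other non-empty cluster.
        b = min(
            sum(euclidean(point, sample[j]) for j in other) / len(other)
            for other_id, other in enumerate(clusters)
            if other_id != owner[i] and other
        )
        denom = max(a, b)
        # Guard against a(i) == b(i) == 0 (e.g. duplicate points).
        scores[i] = (b - a) / denom if denom > 0 else 0.0
    return scores


# Usage: two well-separated clusters of two points each.
sample = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]
clusters = [[0, 1], [2, 3]]
scores = silhouette_scores(sample, clusters)   # each value is about 0.90
mean_score = sum(scores) / len(scores)         # single overall coefficient
```

Averaging the per-point values, as in the last line, gives the single overall coefficient that scikit-learn's `silhouette_score` reports.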
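【Solution 2】:
According to the documentation, you can use sklearn.metrics.silhouette_score(X, labels, metric='euclidean', sample_size=None, random_state=None, **kwds). This function returns the mean Silhouette Coefficient over all samples. To get the value for each individual sample, use silhouette_samples. I also recommend looking at this vignette, which contains a good example you can test with.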
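To feed pyclustering results into scikit-learn's `silhouette_score`, the cluster format has to change. `get_clusters()` returns a list of index lists, while scikit-learn expects one label per sample. Below is a small conversion helper sketched under that assumption. `clusters_to_labels` is a hypothetical name and is not a library function.

```python
def clusters_to_labels(clusters, n_samples):
    """Turn pyclustering-style clusters (lists of point indices) into a flat
    label list where labels[i] is the id of the cluster containing point i.
    """
    labels = [-1] * n_samples
    for cluster_id, members in enumerate(clusters):
        for index in members:
            labels[index] = cluster_id
    # Every point should belong to exactly one cluster.
    if -1 in labels:
        raise ValueError("some points are not assigned to any cluster")
    return labels


# Usage with a toy clustering of four points into two clusters.
labels = clusters_to_labels([[0, 2], [1, 3]], 4)  # [0, 1, 0, 1]
```

The returned list can then be passed as `labels`, for example `silhouette_score(sample, clusters_to_labels(clusters, len(sample)))`, or given to `silhouette_samples` to get per-sample values. scikit-learn raises an error unless the number of distinct labels is between 2 and n_samples - 1.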