【Posted】: 2019-04-27 14:08:12
【Question】:
So I have the following DataFrame:
```
id   text
342  text sample
341  another text sample
343  ...
```
And the following code:
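```python
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score

# df, tfidf_vectorizer and n_clusters are defined earlier (omitted for brevity)
# PCA needs dense input; .toarray() returns an ndarray (recent scikit-learn rejects np.matrix from .todense())
X = tfidf_vectorizer.fit_transform(df['text']).toarray()
pca = PCA(n_components=2)
data2D = pca.fit_transform(X)
clusterer = KMeans(n_clusters=n_clusters, random_state=10)
cluster_labels = clusterer.fit_predict(data2D)
silhouette_avg = silhouette_score(data2D, cluster_labels)
print(silhouette_avg)
y_lower = 10
for i in range(n_clusters):
    # here I would like to get the id's of each item per cluster
    # so that I know which list of id's falls into which cluster
    pass
```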
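A minimal sketch of how the placeholder loop could be filled in, assuming that `fit_predict` returns one label per input row in the same order as `df` (which is how scikit-learn estimators behave), so a boolean mask on `cluster_labels` selects the matching `id` values. The six-row DataFrame and `n_clusters = 2` are hypothetical stand-ins for the real data; the 2-component PCA step is kept only to mirror the question.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy data standing in for the real DataFrame (same 'id'/'text' columns)
df = pd.DataFrame({
    "id": [342, 341, 343, 344, 345, 346],
    "text": [
        "cats purr and cats sleep",
        "a cat sleeps all day",
        "kittens and cats purr",
        "stock prices fell today",
        "the market and stock prices rose",
        "investors sold stock today",
    ],
})
n_clusters = 2  # assumed value for the demo

X = TfidfVectorizer().fit_transform(df["text"]).toarray()
data2D = PCA(n_components=2).fit_transform(X)
cluster_labels = KMeans(n_clusters=n_clusters, random_state=10, n_init=10).fit_predict(data2D)

# Labels are aligned with the rows of df, so the mask (cluster_labels == i)
# picks out exactly the rows, and therefore the ids, assigned to cluster i.
ids_per_cluster = {}
for i in range(n_clusters):
    ids_per_cluster[i] = df.loc[cluster_labels == i, "id"].tolist()

print(ids_per_cluster)
```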
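Now, how can I see which id belongs to which cluster? Is that possible? And is my approach correct for "clustering" these text documents?
Please note that I may have skipped some code to keep the question short.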
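On whether the approach is sound: projecting TF-IDF vectors down to 2 PCA components before k-means can throw away much of the structure that separates documents. A common alternative, sketched here under stated assumptions and not as the asker's method, is LSA: apply `TruncatedSVD` directly to the sparse TF-IDF matrix, re-normalise the rows, cluster in that space, and compare several k values by silhouette score, keeping 2-D PCA only for plotting. The toy corpus, `n_components=3` and the k range 2 to 4 are hypothetical choices for this demo; a real corpus would typically use many more components.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import silhouette_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import Normalizer

# Hypothetical toy corpus; replace with the real df
df = pd.DataFrame({
    "id": [342, 341, 343, 344, 345, 346],
    "text": [
        "cats purr and cats sleep",
        "a cat sleeps all day",
        "kittens and cats purr",
        "stock prices fell today",
        "the market and stock prices rose",
        "investors sold stock today",
    ],
})

# TF-IDF stays sparse; TruncatedSVD accepts sparse input, and unit-length rows
# make Euclidean k-means behave much like clustering by cosine similarity.
tfidf = TfidfVectorizer().fit_transform(df["text"])
lsa = make_pipeline(TruncatedSVD(n_components=3, random_state=10), Normalizer(copy=False))
X_lsa = lsa.fit_transform(tfidf)

# Compare a few k values by mean silhouette score (k must stay below n_samples)
scores = {}
for k in range(2, 5):
    labels = KMeans(n_clusters=k, random_state=10, n_init=10).fit_predict(X_lsa)
    scores[k] = silhouette_score(X_lsa, labels)

best_k = max(scores, key=scores.get)
df["cluster"] = KMeans(n_clusters=best_k, random_state=10, n_init=10).fit_predict(X_lsa)
print(scores)
print(df.groupby("cluster")["id"].apply(list))
```

Storing the labels as a `cluster` column keeps each id next to its assignment, and the `groupby` line yields the list of ids per cluster.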
【Discussion】:
Tags: python-3.x k-means pca text-classification unsupervised-learning