【Posted】:2019-11-27 14:05:14
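【Question】:
I have a list of texts on which I have already run TF-IDF vectorization and k-means clustering. How can I access the texts that are closest to each k-means cluster center?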
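from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

text = ['this is text one', 'this is text two', 'this is text three',
        'thats are next', 'that are four', 'that are three',
        'lionel messi is footbal player', 'kobe bryant is basket ball player',
        'rossi is motogp racer']

# Build the TF-IDF matrix (one row per text)
Tfidf_vect = TfidfVectorizer(max_features=5000)
Tfidf_vect.fit(text)
cluster_text = Tfidf_vect.transform(text)

# Cluster the TF-IDF vectors into 3 groups
kmeans = KMeans(n_clusters=3, random_state=0, max_iter=600, n_init=10)
kmeans.fit(cluster_text)
labels = kmeans.labels_            # cluster index assigned to each text
center = kmeans.cluster_centers_   # one row per cluster center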
Expected output:
closest text to the center cluster 1=['this is text two','this is text three']
closest text to the center cluster 2=['that are three','that are four']
closest text to the center cluster 3=['rossi is motogp racer']
Thanks for your help.
【Comments】:
Tags: python-3.x scikit-learn k-means
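One possible approach, offered as a minimal sketch rather than a definitive answer: `KMeans.transform` returns the distance from every sample to every cluster center, so the texts in each cluster can be ranked by their distance to their own center. The helper `closest_texts` and its `top_n` parameter are hypothetical names introduced here for illustration. scikit-learn numbers clusters from 0 and assigns the numbers arbitrarily, so the grouping may not line up with "cluster 1/2/3" in the expected output above.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

text = ['this is text one', 'this is text two', 'this is text three',
        'thats are next', 'that are four', 'that are three',
        'lionel messi is footbal player', 'kobe bryant is basket ball player',
        'rossi is motogp racer']

vectorizer = TfidfVectorizer(max_features=5000)
X = vectorizer.fit_transform(text)

kmeans = KMeans(n_clusters=3, random_state=0, max_iter=600, n_init=10)
kmeans.fit(X)

# distances[i, c] is the Euclidean distance from text i to center c
distances = kmeans.transform(X)


def closest_texts(cluster_id, top_n=2):
    """Return up to top_n texts assigned to cluster_id, nearest to its center first.

    Hypothetical helper for illustration; not part of scikit-learn.
    """
    members = np.where(kmeans.labels_ == cluster_id)[0]
    ranked = members[np.argsort(distances[members, cluster_id])]
    return [text[i] for i in ranked[:top_n]]


closest = {c: closest_texts(c) for c in range(kmeans.n_clusters)}
for c, docs in closest.items():
    print(f"closest text to the center of cluster {c} = {docs}")
```

If only the single nearest text per center is needed, `sklearn.metrics.pairwise_distances_argmin_min(kmeans.cluster_centers_, X)` should give the index of the closest row for each center directly. Note that it searches all texts, not only those assigned to that cluster.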