How to apply clustering on sentence embeddings? Answers

【Question title】: How to apply clustering on sentence embeddings?
【Posted】: 2019-07-24 11:41:38
【Question description】:

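I want to create a summary that captures the main points of an original document. To do this, I generated sentence embeddings with the Universal Sentence Encoder (https://tfhub.dev/google/universal-sentence-encoder/2). Next, I want to apply clustering to these vectors.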

I have already tried the sklearn library:

import numpy as np
from sklearn.cluster import KMeans

n_clusters = np.ceil(len(encoded)**0.5)
kmeans = KMeans(n_clusters=n_clusters)
kmeans = kmeans.fit(encoded)

But I get an error message:

TypeError: 'numpy.float64' object cannot be interpreted as an integer

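【Question comments】:

This may help: stackoverflow.com/a/24003477/3514144
Thanks @AjayPandya, but I now get a different error message: "only size-1 arrays can be converted to Python scalars"
You can use something like kmeans.astype(int). For more information, read this answer: stackoverflow.com/a/36680545/3514144 :)

Tags: cluster-analysis summarization sentence-similarity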

【Solution 1】:

The problem is in this line:

n_clusters = np.ceil(len(encoded)**0.5)

KMeans expects an integer for the number of clusters, but np.ceil returns a numpy.float64. Cast the result to int:

n_clusters = int(np.ceil(len(encoded)**0.5))
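
For context, here is a minimal end-to-end sketch of the fix. The random array is only an assumed stand-in for `encoded` (the real embeddings from the question are not shown), and `n_init` and `random_state` are added here just to make the run reproducible; they are not part of the original code.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical stand-in for the sentence embeddings: 30 random vectors.
# Assumption: the real `encoded` is a 2-D array of shape
# (n_sentences, embedding_dim); 512 is used here as the embedding size.
rng = np.random.default_rng(0)
encoded = rng.normal(size=(30, 512))

# np.ceil returns a numpy.float64, so cast it to a built-in int
# before passing it to KMeans.
n_clusters = int(np.ceil(len(encoded) ** 0.5))

# n_init and random_state are set explicitly only for reproducibility.
kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(encoded)

# One cluster label per sentence, in the same order as `encoded`.
labels = kmeans.labels_
```

With 30 sentences the square root is about 5.48, so the cast gives 6 clusters, and `labels` holds one cluster index per input sentence.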
