Python - Efficiently find n nearest vectors

【Question title】: Python - Efficiently find n nearest vectors
【Posted】: 2018-09-16 18:44:23
【Question】:

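I am trying to write a Python method that efficiently returns the n words closest to a given word, based on their respective embedding vectors. Each vector has 200 dimensions, and there are a few million of them.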

This is what I have so far. It simply computes the cosine similarity between the target word and every other word, which is very, very slow:

# Assumed source of cosine_similarity (scikit-learn); the original post does not show the import
from sklearn.metrics.pairwise import cosine_similarity


def n_nearest_words(word, n, word_vectors):
    """
    Return a list of the n nearest words to param word, based on cosine similarity
    param word_vectors: dict, keys are words and values are vectors
    """
    # get_word_vector() finds the word in the word_vectors dict, using a number of
    # possible capitalizations. Returns None if not found
    word_vector = get_word_vector(word, word_vectors)
    if word_vector is not None:  # a bare `if word_vector:` is ambiguous for NumPy arrays
        word_vector = word_vector.reshape((1, -1))
        sorted_by_sim = sorted(
            word_vectors.keys(),
            # cosine_similarity returns a 1x1 array here; [0, 0] extracts the scalar
            key=lambda other_word: cosine_similarity(word_vector, word_vectors[other_word].reshape((1, -1)))[0, 0],
            reverse=True)
        return sorted_by_sim[1:n + 1] # ignore first item, which should be target word itself
    return []

Does anyone have a better suggestion?

【Question comments】:

Tags: python vector similarity cosine-similarity word-embedding

【Solution 1】:

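Maybe try storing the distance between each pair of words in a dictionary, so that once a pair has been computed you can simply look it up the next time it is needed.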

【Comments】: