Elasticsearch more like this returns too many documents

【Question title】: Elasticsearch more like this returns too many documents
【Posted】: 2017-12-30 12:02:12
【Question description】:

I have documents like this:

{
title:'...',
body: '...'
}

I want to retrieve documents that are more than 90% similar to a specific document. I used this query:

query = {
    "query": {
        "more_like_this" : {
            "fields" : ["title", "body"],
            "like" : "body of another document",
            "min_term_freq" : 1,
            "max_query_terms" : 12
        }
    }
}

How can I change this query so that it checks for 90% similarity with the specified document?

【Question discussion】:

Your question sounds a lot like one of the examples in the documentation: "A more complicated use case consists of mixing texts with documents already existing in the index. In this case, the syntax to specify a document is similar to the one used in the Multi GET API." Link: elastic.co/guide/en/elasticsearch/reference/current/…
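
The syntax that comment refers to can be sketched roughly as follows. This is only a minimal sketch: the helper name `mlt_like_indexed_doc` and the index, type, and id values are placeholders that do not come from the question, and the `_type` key is an assumption that applies to older versions such as 5.x.

```python
def mlt_like_indexed_doc(index, doc_type, doc_id, fields):
    """Build a more_like_this query whose `like` points at an indexed document."""
    return {
        "query": {
            "more_like_this": {
                "fields": fields,
                # Multi GET style reference to a document already in the index.
                "like": [
                    {"_index": index, "_type": doc_type, "_id": doc_id}
                ],
                "min_term_freq": 1,
                "max_query_terms": 12,
            }
        }
    }


# Placeholder index, type, and id (hypothetical values).
query = mlt_like_indexed_doc("my_index", "my_type", "1", ["title", "body"])
```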

Tags: elasticsearch elasticsearch-5

【Solution 1】:

Take a look at the query formation parameter minimum_should_match.

【Discussion】:

【Solution 2】:

You should specify minimum_should_match.

minimum_should_match

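After the disjunctive query has been formed, this parameter controls the number of terms that must match. The syntax is the same as for minimum should match. (Defaults to "30%".)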
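
Applied to the query from the question, this could look like the sketch below. The helper name `build_mlt_query` is made up for illustration. Note that "90%" here means 90% of the selected query terms must match, which is not the same thing as a 90% similarity score; with `max_query_terms` set to 12, the percentage is rounded down to 10 required terms.

```python
def build_mlt_query(like_text, min_match="90%"):
    """Return the question's more_like_this query with minimum_should_match set."""
    return {
        "query": {
            "more_like_this": {
                "fields": ["title", "body"],
                "like": like_text,
                "min_term_freq": 1,
                "max_query_terms": 12,
                # Raise the default of "30%" so that most selected terms must match.
                "minimum_should_match": min_match,
            }
        }
    }


query = build_mlt_query("body of another document")
```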

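It is applied to the query that is formed like this:

The MLT query simply extracts the text from the input document, analyzes it, usually using the same analyzer as the field, then selects the top K terms with the highest tf-idf to form a disjunctive query of these terms.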
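
To make that description concrete, here is a toy illustration in plain Python. It is not how Elasticsearch or Lucene implement it: whitespace splitting stands in for the analyzer, the idf formula is one common smoothed variant chosen for the example, and the names `top_k_terms` and `matches` are hypothetical.

```python
import math
from collections import Counter


def top_k_terms(text, corpus, k):
    """Return the k terms of `text` with the highest tf-idf (toy version)."""
    tf = Counter(text.lower().split())
    doc_tokens = [set(doc.lower().split()) for doc in corpus]
    scores = {}
    for term, freq in tf.items():
        df = sum(1 for tokens in doc_tokens if term in tokens)
        # Smoothed idf, an assumption made for this example only.
        idf = math.log((len(corpus) + 1) / (df + 1)) + 1.0
        scores[term] = freq * idf
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


def matches(doc, terms, min_should_match_pct):
    """Check whether `doc` contains enough of `terms` (percentage rounded down)."""
    required = len(terms) * min_should_match_pct // 100
    tokens = set(doc.lower().split())
    hits = sum(1 for term in terms if term in tokens)
    return hits >= required


corpus = ["the quick brown fox", "the lazy dog", "the quick dog"]
terms = top_k_terms("the quick brown fox", corpus, k=2)
```

In this toy corpus the rare words "brown" and "fox" are selected over the common word "the", and only documents containing enough of the selected terms pass the `matches` check.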

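So if you want matches on the title to count for more, boost the title field: when the title contains most of the terms selected by term frequency/inverse document frequency, the result is more relevant and should rank higher. You could boost the title field by 1.5, for example.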
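
The answer does not say how to apply the boost. One possible way, sketched here as an assumption rather than as the answerer's method, is to run one more_like_this clause per field inside a bool query and set the query-level `boost` on the title clause. The helper name `build_boosted_mlt_query` is hypothetical.

```python
def build_boosted_mlt_query(like_text, title_boost=1.5, min_match="90%"):
    """Combine per-field more_like_this clauses, boosting the title clause."""

    def mlt_clause(field, boost):
        return {
            "more_like_this": {
                "fields": [field],
                "like": like_text,
                "min_term_freq": 1,
                "max_query_terms": 12,
                "minimum_should_match": min_match,
                # Query-level boost; 1.0 leaves the score unchanged.
                "boost": boost,
            }
        }

    return {
        "query": {
            "bool": {
                "should": [
                    mlt_clause("title", title_boost),
                    mlt_clause("body", 1.0),
                ]
            }
        }
    }


query = build_boosted_mlt_query("body of another document")
```

With this layout a document only has to satisfy one of the two clauses, so it is worth checking that this matches the intended notion of similarity.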

See this documentation for reference on the more_like_this query.

【Discussion】: