Title: Elasticsearch won't match on exact string
Posted: 2016-05-16 16:52:52
Question:

I created a category index with a completion suggester, but it does not behave the way I expected.

curl -XPUT http://localhost:9200/categories/category/_mapping -d '{
    "category" : {
        "properties" : {
            "categoryDescription" : {
                "type" : "string"
            },
            "suggest" : {
                "type" : "completion",
                "analyzer" : "simple",
                "search_analyzer" : "simple",
                "payloads" : true
            }
        }
    }
}'

I have a category indexed as "Mexican Grocery Store". When I search for that string, I get zero hits and only a single suggest result:

{
    "query":{
        "fuzzy":{
            "categoryDescription":{
                "value":"mexican grocery store"
            }
        }
    },
    "from":0,
    "size":20,
    "suggest":{
        "category-suggest":{
            "text":"mexican grocery store",
            "completion":{
                "field":"suggest","fuzzy":{"fuzziness":2}
            }
        }
    }
}

{
    "took":19,
    "timed_out":false,
    "_shards":{"total":5,"successful":5,"failed":0},
    "hits":{
        "total":0,"max_score":null,"hits":[]
    },
    "suggest":{
        "category-suggest":[
            {
                "text":"mexican grocery store",
                "offset":0,
                "length":21,
                "options":[
                    {
                        "text":"Mexican Grocery Store",
                        "score":1.0,
                        "payload":{"id":5915028960051200}
                    }
                ]
            }
        ]
    }
}

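Not only does the exact match return zero hits, but when I search for the string "mexican", a bunch of categories containing the word "Medical" are listed before the "Mexican" categories, which makes no sense to me either.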

{
    "query":{
        "fuzzy":{
            "categoryDescription":{
                "value":"mexican"
            }
        }
    },
    "from":0,
    "size":20,
    "suggest":{
        "category-suggest":{
            "text":"mexican",
            "completion":{
                "field":"suggest","fuzzy":{"fuzziness":2}
            }
        }
    }
}

{
    "took":11,
    "timed_out":false,
    "_shards":{
        "total":5,
        "successful":5,
        "failed":0
    },
    "hits":{
        "total":25,
        "max_score":3.8085938,
        "hits":[
            {
                "_index":"categories",
                "_type":"category",
                "_id":"4993638215974912",
                "_score":3.8085938,
                "_source":{
                    "id":4993638215974912,
                    "categoryDescription":"Medical Spa",
                    "suggest":{
                        "input":["Medical Spa"],
                        "output":"Medical Spa",
                        "payload":{"id":4993638215974912}}}},
            {"_index":"categories","_type":"category","_id":"6401013099528192","_score":3.8085938,"_source":{"id":6401013099528192,"categoryDescription":"Medical School","suggest":{"input":["Medical School"],"output":"Medical School","payload":{"id":6401013099528192}}}},
            {"_index":"categories","_type":"category","_id":"4712163239264256","_score":3.4429123,"_source":{"id":4712163239264256,"categoryDescription":"Medical Examiner","suggest":{"input":["Medical Examiner"],"output":"Medical Examiner","payload":{"id":4712163239264256}}}},
            {"_index":"categories","_type":"category","_id":"5978800634462208","_score":3.4429123,"_source":{"id":5978800634462208,"categoryDescription":"Medical Center","suggest":{"input":["Medical Center"],"output":"Medical Center","payload":{"id":5978800634462208}}}},
            {"_index":"categories","_type":"category","_id":"5415850681040896","_score":3.4429123,"_source":{"id":5415850681040896,"categoryDescription":"Medical Clinic","suggest":{"input":["Medical Clinic"],"output":"Medical Clinic","payload":{"id":5415850681040896}}}},
            {"_index":"categories","_type":"category","_id":"4852900727619584","_score":2.75433,"_source":{"id":4852900727619584,"categoryDescription":"Medical Billing Service","suggest":{"input":["Medical Billing Service"],"output":"Medical Billing Service","payload":{"id":4852900727619584}}}},
            {"_index":"categories","_type":"category","_id":"5352079006629888","_score":2.4411354,"_source":{"id":5352079006629888,"categoryDescription":"Mexican Restaurant","suggest":{"input":["Mexican Restaurant"],"output":"Mexican Restaurant","payload":{"id":5352079006629888}}}},
            {"_index":"categories","_type":"category","_id":"5915028960051200","_score":2.143557,"_source":{"id":5915028960051200,"categoryDescription":"Mexican Grocery Store","suggest":{"input":["Mexican Grocery Store","shop"],"output":"Mexican Grocery Store","payload":{"id":5915028960051200}}}},
            {"_index":"categories","_type":"category","_id":"6392217006505984","_score":2.0527549,"_source":{"id":6392217006505984,"categoryDescription":"Latin American Restaurant","suggest":{"input":["Latin American Restaurant"],"output":"Latin American Restaurant","payload":{"id":6392217006505984}}}},
            {"_index":"categories","_type":"category","_id":"5149768867119104","_score":2.0527549,"_source":{"id":5149768867119104,"categoryDescription":"Occupational Medical Physician","suggest":{"input":["Occupational Medical Physician"],"output":"Occupational Medical Physician","payload":{"id":5149768867119104}}}},
            {"_index":"categories","_type":"category","_id":"5157465448513536","_score":2.0527549,"_source":{"id":5157465448513536,"categoryDescription":"Central American Restaurant","suggest":{"input":["Central American Restaurant"],"output":"Central American Restaurant","payload":{"id":5157465448513536}}}},
            {"_index":"categories","_type":"category","_id":"6479078425100288","_score":2.0527549,"_source":{"id":6479078425100288,"categoryDescription":"American Football Field","suggest":{"input":["American Football Field"],"output":"American Football Field","payload":{"id":6479078425100288}}}},
            {"_index":"categories","_type":"category","_id":"4789129053208576","_score":1.9529084,"_source":{"id":4789129053208576,"categoryDescription":"Mexican Goods Store","suggest":{"input":["Mexican Goods Store","shop"],"output":"Mexican Goods Store","payload":{"id":4789129053208576}}}},
            {"_index":"categories","_type":"category","_id":"5275113192685568","_score":1.9138902,"_source":{"id":5275113192685568,"categoryDescription":"Medical Laboratory","suggest":{"input":["Medical Laboratory"],"output":"Medical Laboratory","payload":{"id":5275113192685568}}}},
            {"_index":"categories","_type":"category","_id":"5838063146106880","_score":1.7436681,"_source":{"id":5838063146106880,"categoryDescription":"Medical Group","suggest":{"input":["Medical Group"],"output":"Medical Group","payload":{"id":5838063146106880}}}},
            {"_index":"categories","_type":"category","_id":"4649491076481024","_score":1.7436681,"_source":{"id":4649491076481024,"categoryDescription":"American Restaurant","suggest":{"input":["American Restaurant"],"output":"American Restaurant","payload":{"id":4649491076481024}}}},
            {"_index":"categories","_type":"category","_id":"5458456756617216","_score":1.5311122,"_source":{"id":5458456756617216,"categoryDescription":"Traditional American Restaurant","suggest":{"input":["Traditional American Restaurant"],"output":"Traditional American Restaurant","payload":{"id":5458456756617216}}}},
            {"_index":"categories","_type":"category","_id":"6183309797228544","_score":1.5311122,"_source":{"id":6183309797228544,"categoryDescription":"Public Medical Center","suggest":{"input":["Public Medical Center"],"output":"Public Medical Center","payload":{"id":6183309797228544}}}},
            {"_index":"categories","_type":"category","_id":"6706677332049920","_score":1.5311122,"_source":{"id":6706677332049920,"categoryDescription":"Native American Goods Store","suggest":{"input":["Native American Goods Store","shop"],"output":"Native American Goods Store","payload":{"id":6706677332049920}}}},
            {"_index":"categories","_type":"category","_id":"6119538122817536","_score":1.3949344,"_source":{"id":6119538122817536,"categoryDescription":"Medical Supply Store","suggest":{"input":["Medical Supply Store","shop"],"output":"Medical Supply Store","payload":{"id":6119538122817536}}}}
        ]
    },
    "suggest":{
        "category-suggest":[
            {
                "text":"mexican",
                "offset":0,
                "length":7,
                "options":[
                    {"text":"Medical Billing Service","score":1.0,"payload":{"id":4852900727619584}},
                    {"text":"Medical Center","score":1.0,"payload":{"id":5978800634462208}},
                    {"text":"Medical Clinic","score":1.0,"payload":{"id":5415850681040896}},
                    {"text":"Medical Examiner","score":1.0,"payload":{"id":4712163239264256}},
                    {"text":"Medical Group","score":1.0,"payload":{"id":5838063146106880}}
                ]
            }
        ]
    }
}

Question comments:

    Tags: elasticsearch autosuggest


    Answer 1:

    You indexed the field categoryDescription as a string, so Elasticsearch runs its standard analyzer on your input and turns Mexican Grocery Store into three tokens: [mexican, grocery, store].
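To make the tokenization concrete, here is a minimal Python sketch. It does not call Elasticsearch; the function `approximate_standard_analyzer` is a hypothetical stand-in that only lowercases and splits on non-alphanumeric characters, which is close to what the standard analyzer does for plain English input like this.

```python
import re


def approximate_standard_analyzer(text):
    """Rough stand-in for the standard analyzer: split on non-alphanumerics, lowercase."""
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", text)]


# Terms that end up in the index for this one document.
indexed_terms = approximate_standard_analyzer("Mexican Grocery Store")
print(indexed_terms)

# A term-level query compares its raw input against individual indexed terms.
print("mexican grocery store" in indexed_terms)  # the whole phrase is not a term
print("mexican" in indexed_terms)                # a single word is
```

The whole phrase never appears as one term, which is why a term-level lookup for it comes back empty.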

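    The fuzzy query belongs to the family of term-level queries: it operates on terms and does not run its input through any analyzer. A fuzzy query with the input Mexican Grocery Store tries to match the whole input as a single term, not as three separate terms. It finds nothing because the full phrase does not exist in the index as one term. You could add a sub-field to categoryDescription that is either not analyzed or uses only a lowercase token filter, and then run the fuzzy query against that sub-field to get an "exact match".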
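One way the sub-field idea might look, sketched as Python dicts that serialize to request bodies. The sub-field name `lower` and the analyzer name `lowercase_keyword` are assumptions chosen for illustration, the syntax assumes the pre-5.0 `string` type used in the question, and the `suggest` field from the original mapping is omitted for brevity.

```python
import json

# Index-creation body: a custom analyzer that keeps the whole value as one
# token (keyword tokenizer) and lowercases it, attached to a sub-field.
index_body = {
    "settings": {
        "analysis": {
            "analyzer": {
                "lowercase_keyword": {  # hypothetical analyzer name
                    "type": "custom",
                    "tokenizer": "keyword",
                    "filter": ["lowercase"],
                }
            }
        }
    },
    "mappings": {
        "category": {
            "properties": {
                "categoryDescription": {
                    "type": "string",
                    "fields": {
                        # hypothetical sub-field, queried as categoryDescription.lower
                        "lower": {"type": "string", "analyzer": "lowercase_keyword"}
                    },
                }
            }
        }
    },
}


def exact_fuzzy_query(user_input):
    """Build a fuzzy query against the sub-field; lowercase on the client
    because a term-level query does not analyze its input."""
    return {
        "query": {
            "fuzzy": {"categoryDescription.lower": {"value": user_input.lower()}}
        }
    }


print(json.dumps(index_body, indent=2))
print(json.dumps(exact_fuzzy_query("Mexican Grocery Store"), indent=2))
```

With a mapping along these lines, the sub-field holds the single term `mexican grocery store`, so the full phrase has something to match against.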

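    As for the second part, the fuzzy query does not distinguish between modified matches (where fuzziness was applied) and exact matches. Before the actual search is executed, the fuzzy term is matched internally against the list of all terms in the given field and expanded. In your example, it becomes something like: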

    "bool": {
      "should": [
        {
          "term": {
            "categoryDescription": "medical"
          }
        },
        {
          "term": {
            "categoryDescription": "mexican"
          }
        }
      ]
    }
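The expansion step can be illustrated with a small Python sketch. It assumes plain Levenshtein distance and a maximum of two edits, and the term list is a hypothetical sample taken from the category names in the response above; the real expansion happens inside Lucene against the field's full term dictionary.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings, computed row by row."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def expand_fuzzy(term, field_terms, max_edits=2):
    """Return every indexed term within max_edits of the query term."""
    return sorted(t for t in field_terms if edit_distance(term, t) <= max_edits)


# Hypothetical sample of terms from the categoryDescription field.
field_terms = {"medical", "mexican", "american", "grocery", "store", "spa", "restaurant"}
print(expand_fuzzy("mexican", field_terms))
```

Under these assumptions both `medical` and `american` fall within two edits of `mexican`, which is consistent with the Medical and American categories in the response.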
    

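    From this it should be clear why something like Medical Spa is returned. These hits also score higher than Mexican Grocery Store, so they are returned first. I suspect this is due to term frequency (Medical occurs more often than Mexican), but you should run the query again with explain enabled to see the exact reason for the higher score.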

    If you want to apply a penalty to fuzzy matches, you can wrap the fuzzy and term queries in a bool query:

    {
      "query": {
        "bool": {
          "should": [
            {
              "fuzzy": {
                "categoryDescription": "mexican"
              }
            },
            {
              "term": {
                "categoryDescription": "mexican"
              }
            }
          ]
        }
      }
    }
    

    This halves the score of documents that match only the fuzzy part (due to the bool query's coordination factor).
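As a rough arithmetic sketch of the coordination factor: it is the fraction of should clauses that a document matches, multiplied into the sum of the matching clause scores. The clause scores of 1.0 below are made-up placeholders, and the sketch ignores every other part of the scoring formula (query normalization, boosts, and so on).

```python
def coord(matching_clauses, total_clauses):
    """Coordination factor: fraction of the should clauses that matched."""
    return matching_clauses / total_clauses


def bool_should_score(clause_scores):
    """Score a document against should clauses.

    clause_scores holds one entry per clause, None where the clause did not match.
    """
    matched = [score for score in clause_scores if score is not None]
    return coord(len(matched), len(clause_scores)) * sum(matched)


# Document containing "medical": only the fuzzy clause matches.
fuzzy_only = bool_should_score([1.0, None])
# Document containing "mexican": both the fuzzy and the term clause match.
both = bool_should_score([1.0, 1.0])
print(fuzzy_only, both)
```

In this simplified picture a document that matches both clauses is also rewarded by the extra clause score, so the gap to fuzzy-only matches is larger than the factor of one half alone.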

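    Comments:

    • Wow! What a great explanation. Thank you so much :) The explanation of how the fuzzy query works really helped me understand how all of this fits together. Is there a better index setup to start with that would help me get the behavior I expect?
    • I tried using an nGram analyzer but ran into a problem: I have a category Bar and other categories that contain the word Bar, such as Barn. When the user typed "Bar", the words containing Bar came up first in the list, and again Bar itself, the exact match, was not shown first.
    • nGram has a similar problem; it produces the same terms for Bar and Barn, so it cannot tell the two apart. Here are two suggestions that build on my answer: found.no/play/gist/dba4dec7672146946de9c80fb8656e7c. If you want to know how to achieve what you are after, it is best to open a new question that describes your goal and ask that directly.
    • Okay, cool. Thanks for the fiddle link! I'll implement those changes and see how it goes. If it still isn't what I want, I'll open a new question. Thanks again knutwalker! :)
    • So I implemented the two different types of queries you suggested. Thank you again! A question after implementing them: in query one, why do you append ".lower" to categoryDescription, but not in query two? Query two does not work correctly with it appended, as I found out. Query two is great because it pulls up all the categories with "Mexican" before Medical, just like you said, but for an exact match the hits are empty again. Is there a way to combine the two solutions?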