【Question title】: Is there a way in ElasticSearch to get the shortest (closest) word at top?
【Posted】: 2018-10-12 22:39:09
【Question】:

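I have words like these in my index: "Kem, Kemi, Kemah, Kemer, Kemerburgaz, Kemang, Kembs, Kemnay, Kempley, Kempsey, Kemerovo".

When I search for "Kem", I want "Kemi" to come first, because it is the closest word (Kem + i = Kemi). But the results are not ordered the way I want.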

Index:

{
  "settings": {
    "number_of_shards": 1,
    "analysis": {
      "filter": {
        "autocomplete_filter": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 15
        }
      },
      "analyzer": {
        "autocomplete": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "autocomplete_filter"
          ]
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "name": {
          "fields": {
            "keyword": {
              "type": "keyword"
            }
          },
          "type": "text",
          "similarity": "classic",
          "analyzer": "autocomplete",
          "search_analyzer": "standard"
        },
        "id": {
          "type": "keyword"
        },
        "slug": {
          "type": "keyword"
        },
        "type": {
          "type": "keyword"
        }
      }
    }
  }
}

Query:

{
  "from": 0,
  "size": 10,
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "name": "Kem"
          }
        }
      ],
      "should": [
        {
          "term": {
            "name.keyword": {
              "value": "Kem"
            }
          }
        }
      ]
    }
  }
}

Result:

{
"took" : 6,
"timed_out" : false,
"_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
},
"hits" : {
    "total" : 143,
    "max_score" : 20.795834,
    "hits" : [
    {
        "_index" : "destinations",
        "_type" : "_doc",
        "_id" : "lPL8Y2YBqxTX_xwrZlGc",
        "_score" : 20.795834,
        "_source" : {
        "id" : "c6317201",
        "name" : "Kem",
        "slug" : "yurtdisi/karelya-cumhuriyeti/kem"
        }
    },
    {
        "_index" : "destinations",
        "_type" : "_doc",
        "_id" : "se78Y2YBqxTX_xwrVFIU",
        "_score" : 8.61574,
        "_source" : {
        "id" : "c121023",
        "name" : "Kemah",
        "slug" : "yurtdisi/houston-ve-civari/kemah"
        }
    },
    {
        "_index" : "destinations",
        "_type" : "_doc",
        "_id" : "ze78Y2YBqxTX_xwrVFo5",
        "_score" : 8.61574,
        "_source" : {
        "id" : "c1783",
        "name" : "Kemerovo",
        "slug" : "yurtdisi/kemerovo-oblasti/kemerovo"
        }
    },
    {
        "_index" : "destinations",
        "_type" : "_doc",
        "_id" : "xe78Y2YBqxTX_xwrVFs9",
        "_score" : 8.61574,
        "_source" : {
        "id" : "c1786",
        "name" : "Kemi",
        "slug" : "yurtdisi/rovaniemi/kemi"
        }
    },
    {
        "_index" : "destinations",
        "_type" : "_doc",
        "_id" : "Tu78Y2YBqxTX_xwrVG-X",
        "_score" : 8.61574,
        "_source" : {
        "id" : "c1900",
        "name" : "Kempsey",
        "slug" : "yurtdisi/new-south-wales/kempsey"
        }
    },
    {
        "_index" : "destinations",
        "_type" : "_doc",
        "_id" : "Bu78Y2YBqxTX_xwrVILt",
        "_score" : 8.61574,
        "_source" : {
        "id" : "c3000010982",
        "name" : "Kempley",
        "slug" : "yurtdisi/dymock/kempley"
        }
    },
    {
        "_index" : "destinations",
        "_type" : "_doc",
        "_id" : "B-78Y2YBqxTX_xwrVILt",
        "_score" : 8.61574,
        "_source" : {
        "id" : "c3000010983",
        "name" : "Kemnay",
        "slug" : "yurtdisi/inverurie/kemnay"
        }
    },
    {
        "_index" : "destinations",
        "_type" : "_doc",
        "_id" : "CO78Y2YBqxTX_xwrVIb_",
        "_score" : 8.61574,
        "_source" : {
        "id" : "c3000013079",
        "name" : "Kemerburgaz",
        "slug" : "eyup/kemerburgaz"
        }
    },
    {
        "_index" : "destinations",
        "_type" : "_doc",
        "_id" : "-fL8Y2YBqxTX_xwrZQxf",
        "_score" : 8.61574,
        "_source" : {
        "id" : "c6190744",
        "name" : "Kembs",
        "slug" : "yurtdisi/haut-rhin-bolge/kembs"
        }
    },
    {
        "_index" : "destinations",
        "_type" : "_doc",
        "_id" : "xfL8Y2YBqxTX_xwrZSG-",
        "_score" : 8.61574,
        "_source" : {
        "id" : "c6216986",
        "name" : "Kemang",
        "slug" : "yurtdisi/cakarta/kemang"
        }
    }
    ]
}
}

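Apart from "Kem" itself, they all get the same score, I guess because every one of them contains "Kem". Whether I use "match" or "match_phrase", the result is the same.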

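【Comments】:

  • You should add the actual query you are sending to your question; that makes it easier for people to offer hints and solutions.
  • As far as I know, a match query will not match Kem against Kemerburgaz by default unless you change the default fuzziness, and then it is indeed a fuzzy match: Fuzzy matching should not be used for scoring purposes—only to widen the net of matching terms in case there are misspellings.
  • I have added the index/mapping for you to look at. How is it a fuzzy match? I did not specify any fuzziness.
  • You are using an ngram filter (important information that you originally left out, by the way). It creates n-grams for every word, which means that when you match Kem you are really matching the Kem n-gram, and every term that starts with Kem has that n-gram. There is no extra score, because the matches are equal. The index does not actually know what the original word was; it only knows the edge n-grams of 2 to 15 characters produced for each indexed word. Kem scores highest because it also matches the term clause.
  • Then what should I use? I am using edge-ngram because I am building autocomplete. What would change in a different example? What would give the closest word a higher score?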
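
To make the edge n-gram comment above concrete, here is a minimal Python sketch (plain Python, not Elasticsearch code) of what the `autocomplete` analyzer from the question roughly emits. The helper `edge_ngrams` is hypothetical, and the sketch assumes single-word names, so the `standard` tokenizer is treated as a no-op and only the `lowercase` and `edge_ngram` steps are modelled.

```python
def edge_ngrams(word, min_gram=2, max_gram=15):
    """Return the lowercase prefixes of `word` that are min_gram..max_gram characters long."""
    token = word.lower()
    longest = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, longest + 1)]


names = ["Kem", "Kemi", "Kemah", "Kemer", "Kemerburgaz", "Kemang",
         "Kembs", "Kemnay", "Kempley", "Kempsey", "Kemerovo"]

# Assumption: each name is one token, so the index holds only its prefixes.
index = {name: edge_ngrams(name) for name in names}

# The standard search analyzer lowercases the query text "Kem" to "kem".
query_term = "kem"
matches = [name for name in names if query_term in index[name]]

for name in names:
    print(name, index[name])
print(len(matches), "of", len(names), "names contain the token", repr(query_term))
```

Every name contributes the same `kem` token, so the `match` clause alone has nothing with which to tell "Kemi" apart from "Kemerburgaz"; only the exact `term` clause on `name.keyword` lifts "Kem".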

Tags: elasticsearch


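【Solution 1】:

Judging by your example, you want results with equal scores ordered by length. You can do that with a script-based sort: the request below sorts by _score first, then by the length of name.keyword, then alphabetically by name.keyword.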

POST your_index/_doc/_search
{
  "from": 0,
  "size": 10,
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "name": "Kem"
          }
        }
      ],
      "should": [
        {
          "term": {
            "name.keyword": {
              "value": "Kem"
            }
          }
        }
      ]
    }
  },
  "sort": [
    {
      "_score": {"order": "desc"}
    },
    {
      "_script": {
        "script": "doc['name.keyword'].value.length()",
        "type": "number",
        "order": "asc"
      }
    },
    {
      "name.keyword": {"order": "asc"}
    }
  ]
}
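
The effect of the three sort keys can be sketched in plain Python. This is an emulation under stated assumptions, not Elasticsearch code: `hits` holds only the ten names and scores shown in the question's result, whereas the real request would re-sort all 143 matching documents, and `sort_key` is a hypothetical helper that mirrors the `sort` array.

```python
# (name, _score) pairs copied from the result listing in the question.
hits = [
    ("Kem", 20.795834),
    ("Kemah", 8.61574),
    ("Kemerovo", 8.61574),
    ("Kemi", 8.61574),
    ("Kempsey", 8.61574),
    ("Kempley", 8.61574),
    ("Kemnay", 8.61574),
    ("Kemerburgaz", 8.61574),
    ("Kembs", 8.61574),
    ("Kemang", 8.61574),
]


def sort_key(hit):
    """Mirror the sort array: _score desc, then name length asc, then name asc."""
    name, score = hit
    return (-score, len(name), name)


ranked = [name for name, _ in sorted(hits, key=sort_key)]
print(ranked)
```

Because `_score` stays the first key, the exact match "Kem" remains on top, and the length key only breaks ties among the documents that share a score, which puts "Kemi" second.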

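【Comments】:

  • Judging by your error, you seem to be missing the single quotes. "script" : "doc[name.keyword].value.length()" should be "script" : "doc['name.keyword'].value.length()"
  • The single quotes do not show up in the error, but they are there when I run it. I just copied your text.
  • Maybe it copied some odd characters? Try deleting the quotes and typing them by hand. I can reproduce your error if I remove the single quotes, and it goes away when I add them back.
  • Tim, nothing went wrong when copying. I checked three times. Maybe you have a different ElasticSearch version? I am using 6.4. A similar error with no answer is here:
  • When using curl in a terminal, escape it like doc[\"name.keyword\"].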
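
One likely explanation for the missing quotes, assuming the request body was passed to curl inside a single-quoted shell string (for example with `-d '...'`): the inner single quotes end and reopen the shell string, so they never reach Elasticsearch. The sketch below uses Python's `shlex.split`, which follows POSIX shell quoting rules, to show what curl would receive. The two command lines are shortened, hypothetical stand-ins for the real request.

```python
import json
import shlex

# Body wrapped in single quotes, with single quotes also used inside the script.
unescaped = """curl -d '{"script": "doc['name.keyword'].value.length()"}'"""
# Same body, but the script uses backslash-escaped double quotes instead.
escaped = r"""curl -d '{"script": "doc[\"name.keyword\"].value.length()"}'"""


def body_seen_by_curl(command_line):
    """Return the -d argument after POSIX shell-style quote removal."""
    args = shlex.split(command_line)
    return args[args.index("-d") + 1]


broken_script = json.loads(body_seen_by_curl(unescaped))["script"]
working_script = json.loads(body_seen_by_curl(escaped))["script"]

print(broken_script)   # the quotes around name.keyword are gone
print(working_script)  # the field name is still quoted
```

This matches the thread: the error shows `doc[name.keyword]` even though the quotes were typed, and escaping with `\"` keeps the field name quoted inside the script.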