【Question title】: ElasticSearch: how to manage the score result in an ngram query?
【Posted】: 2021-07-14 17:20:24
【Question】:

My index climate_change contains hundreds of chemical records.

I am using ngram search, and these are the settings I use for the index.

{
  "settings": {
    "index.max_ngram_diff": 30,
    "index": {
      "analysis": {
        "analyzer": {
          "analyzer": {
            "tokenizer": "test_ngram",
            "filter": [
              "lowercase"
            ]
          },
          "search_analyzer": {
            "tokenizer": "test_ngram",
            "filter": [
              "lowercase"
            ]
          }
        },
        "tokenizer": {
          "test_ngram": {
            "type": "edge_ngram",
            "min_gram": 1,
            "max_gram": 30,
            "token_chars": [
              "letter",
              "digit"
            ]
          }
        }
      }
    }
  }
}

My main problem is that when I run a query like this

GET climate_change/_search?size=1000
{
  "query": {
    "match": {
      "description": {
        "query":"oxygen"
      }
    }
  }
}

I see many results with the same score, 7.381186, which seems strange:

     {
        "_index" : "climate_change",
        "_type" : "_doc",
        "_id" : "XXX",
        "_score" : 7.381186,
        "_source" : {
          "recordtype" : "chemicals",
          "description" : "carbon/oxygen"
        }
      },
      {
        "_index" : "climate_change",
        "_type" : "_doc",
        "_id" : "YYY",
        "_score" : 7.381186,
        "_source" : {
          "recordtype" : "chemicals",
          "description" : "oxygen"
        }

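How is this possible? In the example above, since I am using ngram and searching for oxygen in the description field, I would expect the second result to score higher than the first. I also tried setting the tokenizer type to "standard" and "whitespace" in the settings, but it did not help. Could the "/" character in the description be the cause?

Thanks a lot!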

【Comments】:

    Tags: elasticsearch n-gram


    【Solution 1】:

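    You also need to set the analyzer in the mapping of the description field. Defining an analyzer under settings does not apply it to any field; a text field whose mapping names no analyzer falls back to the index default, which is the standard analyzer unless overridden.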
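
To see which analyzer a field actually uses, the `_analyze` API can be called with the field name. The snippet below is a minimal sketch that only builds such a request with the standard library. The helper name `build_analyze_request` and the `http://localhost:9200` address are assumptions for illustration, not part of the question.

```python
import json
import urllib.request


def build_analyze_request(index, field, text, base_url="http://localhost:9200"):
    """Build a request that asks Elasticsearch how `field` would tokenize `text`.

    base_url is an assumed local cluster address; adjust it to your setup.
    """
    body = json.dumps({"field": field, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_analyze_request("climate_change", "description", "carbon/oxygen")

# To send it against a running cluster, uncomment:
# with urllib.request.urlopen(request) as response:
#     tokens = [t["token"] for t in json.load(response)["tokens"]]
#     print(tokens)
```

If the returned tokens are whole words rather than prefixes such as `o`, `ox`, `oxy`, the custom edge_ngram analyzer is not being applied to the field.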

    Here is a working example with the index mapping, index data, search query, and search result.

    Index mapping:

    {
      "settings": {
        "analysis": {
          "analyzer": {
            "my_analyzer": {
              "tokenizer": "test_ngram",
              "filter": [
                "lowercase"
              ]
            },
            "search_analyzer": {
              "tokenizer": "test_ngram",
              "filter": [
                "lowercase"
              ]
            }
          },
          "tokenizer": {
            "test_ngram": {
              "type": "edge_ngram",
              "min_gram": 1,
              "max_gram": 30,
              "token_chars": [
                "letter",
                "digit"
              ]
            }
          }
        }
      },
      "mappings": {
        "properties": {
          "description": {
            "type": "text",
            "analyzer": "my_analyzer"
          }
        }
      }
    }
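
To make the effect of this analyzer concrete, here is a rough Python approximation of what an edge_ngram tokenizer with `token_chars` set to letter and digit, followed by a lowercase filter, would emit. It is an illustrative sketch rather than Elasticsearch's implementation, and `edge_ngram_tokens` is a hypothetical helper name.

```python
def edge_ngram_tokens(text, min_gram=1, max_gram=30):
    """Approximate edge_ngram with token_chars [letter, digit] plus lowercase."""
    tokens = []
    word = []
    # The trailing space is a sentinel that flushes the last word.
    for ch in text + " ":
        if ch.isalpha() or ch.isdigit():
            word.append(ch.lower())
            continue
        # Any other character (such as "/") ends the current word,
        # and the prefixes of that word are emitted as tokens.
        for n in range(min_gram, min(max_gram, len(word)) + 1):
            tokens.append("".join(word[:n]))
        word = []
    return tokens


for description in ("carbon/oxygen", "oxygen"):
    print(description, "->", edge_ngram_tokens(description))
```

Under this approximation the "/" simply acts as a word boundary: "carbon/oxygen" yields the prefixes of "carbon" followed by the prefixes of "oxygen", so it contains every token of "oxygen" but in a field twice as long.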
    

    Index data:

    {
      "recordtype": "chemicals",
      "description": "carbon/oxygen"
    }
    {
      "recordtype": "chemicals",
      "description": "oxygen"
    }
    

    Search query:

    {
      "query": {
        "match": {
          "description": {
            "query":"oxygen"
          }
        }
      }
    }
    

    Search result:

    "hits": [
          {
            "_index": "67180160",
            "_type": "_doc",
            "_id": "2",
            "_score": 0.89246297,
            "_source": {
              "recordtype": "chemicals",
              "description": "oxygen"
            }
          },
          {
            "_index": "67180160",
            "_type": "_doc",
            "_id": "1",
            "_score": 0.6651374,
            "_source": {
              "recordtype": "chemicals",
              "description": "carbon/oxygen"
            }
          }
        ]
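
The ordering above is consistent with BM25 length normalization: both documents match the same query tokens, but "oxygen" is the shorter field. The following is a simplified sketch of the textbook BM25 formula with the default parameters k1 = 1.2 and b = 0.75. The token counts (6 and 12) and the two-document, single-shard index are assumptions, and the sketch is not expected to reproduce the exact scores shown above, which depend on shard-level statistics and the Lucene version.

```python
import math


def bm25_term_score(freq, doc_len, avg_len, doc_count, doc_freq, k1=1.2, b=0.75):
    """Textbook BM25 score of one query term against one document field."""
    idf = math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))
    # Longer-than-average fields get a larger penalty in the denominator.
    length_norm = k1 * (1 - b + b * doc_len / avg_len)
    return idf * freq * (k1 + 1) / (freq + length_norm)


# Assumed field lengths in tokens after edge n-gram analysis.
doc_lengths = {"oxygen": 6, "carbon/oxygen": 12}
avg_len = sum(doc_lengths.values()) / len(doc_lengths)
query_terms = 6  # o, ox, oxy, oxyg, oxyge, oxygen; each occurs once in both documents

# A match query sums the per-term scores of the matching terms.
scores = {
    text: query_terms * bm25_term_score(1, length, avg_len, doc_count=2, doc_freq=2)
    for text, length in doc_lengths.items()
}

for text, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{text}: {score:.4f}")
```

For the real breakdown of a score, adding `"explain": true` to the search request returns the per-term computation Elasticsearch actually performed.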
    

    【Comments】:

    • Perfect answer! Thanks a lot! :)