如何在 ElasticSearch 中自定义分数计算？答案

【问题标题】：How to customize score calculate in ElasticSearch?如何在 ElasticSearch 中自定义分数计算？
【发布时间】：2020-11-22 15:26:06
【问题描述】：

我有以下要求。

    {'bool':
         {'must': [
              {"terms": {"state.keyword": ["Alaska", "Alabama"]}
          ],
          'should': [
              {'match': {'abstract': 'Spill and Overfill Prevention 18 AAC 78.045'}},
              {'match': {'title': 'Spill and Overfill Prevention 18 AAC 78.045'}},
              {'constant_score': {
                  'filter': {
                      'match': {'title': 'Spill and Overfill Prevention 18 AAC 78.045'}
                  }
              }}
          ]}
     }

需要通过title（匹配）来计算分数。

为此，我尝试使用constant_score。

https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-constant-score-query.html

但是，它并没有达到预期的效果。它只是将每个元素的结果恰好增加 1。

这是分析的结果

{'took': 21, 'timed_out': False, '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0}, 'hits': {'total': {'value': 6, 'relation': 'eq'}, 'max_score'
: 4.754379, 'hits': [{'_index': 'articles', '_type': '_doc', '_id': '483703', '_score': 4.754379, '_source':

这里是解释结果

{'_index': 'articles', '_type': '_doc', '_id': '483703', 'matched': True, 'explanation': {'value': 6.6602507, 'description': 'sum of:', 'details': [{'value': 0.150
05009, 'description': 'weight(legal_language:and in 2) [PerFieldSimilarity], result of:', 'details': [{'value': 0.15005009, 'description': 'score(freq=14.0), compu
ted as boost * idf * tf from:', 'details': [{'value': 2.2, 'description': 'boost', 'details': []}, {'value': 0.074107975, 'description': 'idf, computed as log(1 +
(N - n + 0.5) / (n + 0.5)) from:', 'details': [{'value': 6, 'description': 'n, number of documents containing term', 'details': []}, {'value': 6, 'description': 'N
, total number of documents with field', 'details': []}]}, {'value': 0.92034066, 'description': 'tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from
:', 'details': [{'value': 14.0, 'description': 'freq, occurrences of term within document', 'details': []}, {'value': 1.2, 'description': 'k1, term saturation para
meter', 'details': []}, {'value': 0.75, 'description': 'b, length normalization parameter', 'details': []}, {'value': 504.0, 'description': 'dl, length of field (a
pproximate)', 'details': []}, {'value': 497.5, 'description': 'avgdl, average length of field', 'details': []}]}]}]}, {'value': 0.3779109, 'description': 'weight(l
egal_language:18 in 2) [PerFieldSimilarity], result of:', 'details': [{'value': 0.3779109, 'description': 'score(freq=3.0), computed as boost * idf * tf from:', 'd
etails': [{'value': 2.2, 'description': 'boost', 'details': []}, {'value': 0.24116206, 'description': 'idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:',
'details': [{'value': 5, 'description': 'n, number of documents containing term', 'details': []}, {'value': 6, 'description': 'N, total number of documents with fi
eld', 'details': []}]}, {'value': 0.7122915, 'description': 'tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:', 'details': [{'value': 3.0, 'desc
ription': 'freq, occurrences of term within document', 'details': []}, {'value': 1.2, 'description': 'k1, term saturation parameter', 'details': []}, {'value': 0.7
5, 'description': 'b, length normalization parameter', 'details': []}, {'value': 504.0, 'description': 'dl, length of field (approximate)', 'details': []}, {'value
': 497.5, 'description': 'avgdl, average length of field', 'details': []}]}]}]}, {'value': 0.3779109, 'description': 'weight(legal_language:aac in 2) [PerFieldSimi
larity], result of:', 'details': [{'value': 0.3779109, 'description': 'score(freq=3.0), computed as boost * idf * tf from:', 'details': [{'value': 2.2, 'descriptio
n': 'boost', 'details': []}, {'value': 0.24116206, 'description': 'idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:', 'details': [{'value': 5, 'descriptio
n': 'n, number of documents containing term', 'details': []}, {'value': 6, 'description': 'N, total number of documents with field', 'details': []}]}, {'value': 0.
7122915, 'description': 'tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:', 'details': [{'value': 3.0, 'description': 'freq, occurrences of term
within document', 'details': []}, {'value': 1.2, 'description': 'k1, term saturation parameter', 'details': []}, {'value': 0.75, 'description': 'b, length normali
zation parameter', 'details': []}, {'value': 504.0, 'description': 'dl, length of field (approximate)', 'details': []}, {'value': 497.5, 'description': 'avgdl, ave
rage length of field', 'details': []}]}]}]}, {'value': 1.0089812, 'description': 'weight(title:spill in 2) [PerFieldSimilarity], result of:', 'details': [{'value':
1.0089812, 'description': 'score(freq=1.0), computed as boost * idf * tf from:', 'details': [{'value': 2.2, 'description': 'boost', 'details': []}, {'value': 1.02
96195, 'description': 'idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:', 'details': [{'value': 2, 'description': 'n, number of documents containing term'
, 'details': []}, {'value': 6, 'description': 'N, total number of documents with field', 'details': []}]}, {'value': 0.44543427, 'description': 'tf, computed as fr
eq / (freq + k1 * (1 - b + b * dl / avgdl)) from:', 'details': [{'value': 1.0, 'description': 'freq, occurrences of term within document', 'details': []}, {'value'
: 1.2, 'description': 'k1, term saturation parameter', 'details': []}, {'value': 0.75, 'description': 'b, length normalization parameter', 'details': []}, {'value'
: 7.0, 'description': 'dl, length of field', 'details': []}, {'value': 6.6666665, 'description': 'avgdl, average length of field', 'details': []}]}]}]}, {'value':
0.072622515, 'description': 'weight(title:and in 2) [PerFieldSimilarity], result of:', 'details': [{'value': 0.072622515, 'description': 'score(freq=1.0), computed
as boost * idf * tf from:', 'details': [{'value': 2.2, 'description': 'boost', 'details': []}, {'value': 0.074107975, 'description': 'idf, computed as log(1 + (N
- n + 0.5) / (n + 0.5)) from:', 'details': [{'value': 6, 'description': 'n, number of documents containing term', 'details': []}, {'value': 6, 'description': 'N, t
otal number of documents with field', 'details': []}]}, {'value': 0.44543427, 'description': 'tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:',
'details': [{'value': 1.0, 'description': 'freq, occurrences of term within document', 'details': []}, {'value': 1.2, 'description': 'k1, term saturation paramete
r', 'details': []}, {'value': 0.75, 'description': 'b, length normalization parameter', 'details': []}, {'value': 7.0, 'description': 'dl, length of field', 'detai
ls': []}, {'value': 6.6666665, 'description': 'avgdl, average length of field', 'details': []}]}]}]}, {'value': 1.0089812, 'description': 'weight(title:overfill in
2) [PerFieldSimilarity], result of:', 'details': [{'value': 1.0089812, 'description': 'score(freq=1.0), computed as boost * idf * tf from:', 'details': [{'value':
2.2, 'description': 'boost', 'details': []}, {'value': 1.0296195, 'description': 'idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:', 'details': [{'value'
: 2, 'description': 'n, number of documents containing term', 'details': []}, {'value': 6, 'description': 'N, total number of documents with field', 'details': []}
]}, {'value': 0.44543427, 'description': 'tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:', 'details': [{'value': 1.0, 'description': 'freq, oc
currences of term within document', 'details': []}, {'value': 1.2, 'description': 'k1, term saturation parameter', 'details': []}, {'value': 0.75, 'description': '
b, length normalization parameter', 'details': []}, {'value': 7.0, 'description': 'dl, length of field', 'details': []}, {'value': 6.6666665, 'description': 'avgdl
, average length of field', 'details': []}]}]}]}, {'value': 1.0089812, 'description': 'weight(title:prevention in 2) [PerFieldSimilarity], result of:', 'details':
[{'value': 1.0089812, 'description': 'score(freq=1.0), computed as boost * idf * tf from:', 'details': [{'value': 2.2, 'description': 'boost', 'details': []}, {'va
lue': 1.0296195, 'description': 'idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:', 'details': [{'value': 2, 'description': 'n, number of documents contai
ning term', 'details': []}, {'value': 6, 'description': 'N, total number of documents with field', 'details': []}]}, {'value': 0.44543427, 'description': 'tf, comp
uted as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:', 'details': [{'value': 1.0, 'description': 'freq, occurrences of term within document', 'details': []}
, {'value': 1.2, 'description': 'k1, term saturation parameter', 'details': []}, {'value': 0.75, 'description': 'b, length normalization parameter', 'details': []}
, {'value': 7.0, 'description': 'dl, length of field', 'details': []}, {'value': 6.6666665, 'description': 'avgdl, average length of field', 'details': []}]}]}]},
{'value': 0.072622515, 'description': 'weight(title:18 in 2) [PerFieldSimilarity], result of:', 'details': [{'value': 0.072622515, 'description': 'score(freq=1.0),
computed as boost * idf * tf from:', 'details': [{'value': 2.2, 'description': 'boost', 'details': []}, {'value': 0.074107975, 'description': 'idf, computed as lo
g(1 + (N - n + 0.5) / (n + 0.5)) from:', 'details': [{'value': 6, 'description': 'n, number of documents containing term', 'details': []}, {'value': 6, 'descriptio
n': 'N, total number of documents with field', 'details': []}]}, {'value': 0.44543427, 'description': 'tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)
) from:', 'details': [{'value': 1.0, 'description': 'freq, occurrences of term within document', 'details': []}, {'value': 1.2, 'description': 'k1, term saturation
parameter', 'details': []}, {'value': 0.75, 'description': 'b, length normalization parameter', 'details': []}, {'value': 7.0, 'description': 'dl, length of field
', 'details': []}, {'value': 6.6666665, 'description': 'avgdl, average length of field', 'details': []}]}]}]}, {'value': 0.072622515, 'description': 'weight(title:
aac in 2) [PerFieldSimilarity], result of:', 'details': [{'value': 0.072622515, 'description': 'score(freq=1.0), computed as boost * idf * tf from:', 'details': [{
'value': 2.2, 'description': 'boost', 'details': []}, {'value': 0.074107975, 'description': 'idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:', 'details':
[{'value': 6, 'description': 'n, number of documents containing term', 'details': []}, {'value': 6, 'description': 'N, total number of documents with field', 'det
ails': []}]}, {'value': 0.44543427, 'description': 'tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:', 'details': [{'value': 1.0, 'description':
'freq, occurrences of term within document', 'details': []}, {'value': 1.2, 'description': 'k1, term saturation parameter', 'details': []}, {'value': 0.75, 'descr
iption': 'b, length normalization parameter', 'details': []}, {'value': 7.0, 'description': 'dl, length of field', 'details': []}, {'value': 6.6666665, 'descriptio
n': 'avgdl, average length of field', 'details': []}]}]}]}, {'value': 1.5095675, 'description': 'weight(title:78.045 in 2) [PerFieldSimilarity], result of:', 'deta
ils': [{'value': 1.5095675, 'description': 'score(freq=1.0), computed as boost * idf * tf from:', 'details': [{'value': 2.2, 'description': 'boost', 'details': []}
, {'value': 1.5404451, 'description': 'idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:', 'details': [{'value': 1, 'description': 'n, number of documents
containing term', 'details': []}, {'value': 6, 'description': 'N, total number of documents with field', 'details': []}]}, {'value': 0.44543427, 'description': 'tf
, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:', 'details': [{'value': 1.0, 'description': 'freq, occurrences of term within document', 'details
': []}, {'value': 1.2, 'description': 'k1, term saturation parameter', 'details': []}, {'value': 0.75, 'description': 'b, length normalization parameter', 'details
': []}, {'value': 7.0, 'description': 'dl, length of field', 'details': []}, {'value': 6.6666665, 'description': 'avgdl, average length of field', 'details': []}]}
]}]}, {'value': 1.0, 'description': 'ConstantScore(title.keyword:Spill and Overfill Prevention 18 AAC 78.045)', 'details': []}]}}

与script_score

{'query': {
    'function_score': {
        'query': {
            'bool': {
                'should': [ 
                    {'match': {'legal_language': 'inspections and testing 691'}},
                    {'match': {'title': 'inspections and testing 691'}}
                ]
            }
        },
        'script_score': {
            'script': {'source': "doc['title'].value"}
        }
    }
}}

映射

{
  "articles" : {
    "mappings" : {
      "properties" : {
        "abstract" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "categories" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "cfr40_part280" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "citation" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "effective_date" : {
          "type" : "date"
        },
        "id" : {
          "type" : "long"
        },
        "legal_language" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "local_regulation" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "reference_images" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "state" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "tags" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "title" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "unique_id" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    }
  }
}

追溯

> Traceback (most recent call last):   File
> "D:\work_projects\dewey_project\webapp\articles\services\elasticsearch_service.py",
> line 103, in retrieve_articles
>     result = current_app.elasticsearch.search(   File "d:\work_projects\dewey_project\venv\lib\site-packages\elasticsearch\client\utils.py",
> line 84, in _wrapped
>     return func(*args, params=params, **kwargs)   File "d:\work_projects\dewey_project\venv\lib\site-packages\elasticsearch\client\__init__.py",
> line 1547, in search
>     return self.transport.perform_request(   File "d:\work_projects\dewey_project\venv\lib\site-packages\elasticsearch\transport.py",
> line 351, in perform_request
>     status, headers_response, data = connection.perform_request(   File
> "d:\work_projects\dewey_project\venv\lib\site-packages\elasticsearch\connection\http_urllib3.py",
> line 261, in perform_request
>     self._raise_error(response.status, raw_data)   File "d:\work_projects\dewey_project\venv\lib\site-packages\elasticsearch\connection\base.py",
> line 181, in _raise_error
>     raise HTTP_EXCEPTIONS.get(status_code, TransportError)( elasticsearch.exceptions.RequestError: RequestError(400,
> 'search_phase_execution_exception', 'runtime error')

【问题讨论】：

只是澄清一些事情，如果标题匹配，您是否试图为文档提供更多相关性（分数）？
@Sharmiko not..
我不明白你想达到什么目的，如果你想增加其他值，你必须指定 boost 参数。如果这不是您想要做的，如果您能更具体地解释它会有所帮助。
@Sharmiko 我只需要计算分数"match": {"title": text}，不是整个查询（默认）。

标签： python python-3.x elasticsearch

【解决方案1】：

不太清楚您要达到什么目标，但您似乎希望仅根据 title 字段上的匹配项接收文档的 TF/IDF 分数。而且您还想为您的查询添加额外的约束。如果是这样，您应该对bool 查询使用filter 子句。他们不会修改您的分数，但会根据其中的匹配过滤结果。

{
    "bool": {
        "should": [
            {"match": {"title": "Spill and Overfill Prevention 18 AAC 78.045"}}
        ],
        "filter": [
            {"match": {"abstract": "Spill and Overfill Prevention 18 AAC 78.045"}},
            {"terms": {"state.keyword": ["Alaska", "Alabama"]}
        ]
    }
}

这将返回与原始查询稍有不同的结果，因为它需要与查询 Spill and Overfill Prevention 18 AAC 78.045 上的 abstract 字段匹配。如果您想保留原始查询的行为，那么您应该将其作为常量分数查询移动到 should 块内

{
    "query": {
        "bool": {
            "should": [
                {"match": {"title": "Spill and Overfill Prevention 18 AAC 78.045"}},
                {"constant_score": {
                    "filter": {
                        "match": {"legal_language": "Spill and Overfill Prevention 18 AAC 78.045"}
                    }
                }}
            ],
            "filter": [
                {"terms": {"state.keyword": ["Alaska", "Alabama"]]}},
            ],
        }
    }
}

然后从结果分数中减去 1。

【讨论】：

我需要保存所有结果。可以举个第二种方式的例子吗？

【解决方案2】：

如果您需要控制评分过程，可以使用function_score 查询来自定义和替换原始查询_score。可以看function_score查询here。

【讨论】：

我试过这个没有成功。你能帮我查询一下吗？
这个查询怎么样："function_score": { "query": { [ put your query here] }, "script_score": { "script": { "source": "doc['title'].value" } } }
RequestError(400, 'search_phase_execution_exception', 'runtime error')
我认为您的查询格式不正确。确保它看起来像这样： ` { 'query': { 'function_score': { 'query': { 'bool': { [ rest of your query] } }, 'script_score': { 'script': { '来源': "doc['title'].value" } } } } }`
我不知道它为什么抱怨。也许您的数据库有问题。尝试搜索类似的问题。