Consider a multiterm query like the following:

GET /my_index/doc/_search
{
  "query": {
    "match": {
      "text": "quick fox"
    }
  }
}

As soon as a document matches a query, Lucene calculates its score for that query, combining the scores of each matching term. The formula used for scoring is called the practical scoring function. 

score(q,d)  =
            queryNorm(q)
          · coord(q,d)
          · ∑ (
                tf(t in d)
              · idf(t)²
              · t.getBoost()
              · norm(t,d)
            ) (t in q)

score(q,d) is the relevance score of document d for query q.

queryNorm(q) is the query normalization factor (new).

coord(q,d) is the coordination factor (new).

∑ ( ... ) (t in q) is the sum of the weights for each term t in the query q for document d.

tf(t in d) is the term frequency for term t in document d.

idf(t) is the inverse document frequency for term t.

t.getBoost() is the boost that has been applied to the query (new).

norm(t,d) is the field-length norm, combined with the index-time field-level boost, if any (new). Note that index-time field-level boosts are officially not recommended.

You should recognize score, tf, and idf. The queryNorm, coord, t.getBoost, and norm are new.
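To make the structure of the formula concrete, here is a minimal Python sketch that combines the factors in the same order. It is an illustration only, not Lucene's implementation: the function name `practical_score`, the `doc_stats` layout, and all numeric values are assumptions made up for this example, and the individual factors are taken as already computed.

```python
def practical_score(query_terms, doc_stats, query_norm, coord):
    """Combine per-term weights the way the formula above does.

    doc_stats is a hypothetical mapping from each matching term to a dict
    with the precomputed factors 'tf', 'idf', 'boost', and 'norm'.
    """
    total = 0.0
    for term in query_terms:
        stats = doc_stats.get(term)
        if stats is None:
            continue  # the term does not appear in this document
        # tf(t in d) · idf(t)² · t.getBoost() · norm(t,d)
        total += stats["tf"] * stats["idf"] ** 2 * stats["boost"] * stats["norm"]
    # queryNorm(q) · coord(q,d) · sum of the term weights
    return query_norm * coord * total


# Made-up factor values for the two terms of the "quick fox" query.
doc_stats = {
    "quick": {"tf": 1.0, "idf": 2.0, "boost": 1.0, "norm": 0.5},
    "fox": {"tf": 2.0, "idf": 1.0, "boost": 1.0, "norm": 0.5},
}
score = practical_score(["quick", "fox"], doc_stats, query_norm=0.5, coord=1.0)
print(score)  # 0.5 * 1.0 * (2.0 + 1.0) = 1.5
```

Only the sum depends on the individual terms; queryNorm and coord are applied once to the whole result.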

We will talk more about query-time boosting later in this chapter, but first let’s get query normalization, coordination, and index-time field-level boosting out of the way.

Query Normalization Factor

queryNorm = 1 / √sumOfSquaredWeights 

The sumOfSquaredWeights is calculated by adding together the IDF of each term in the query, squared.

The same query normalization factor is applied to every document, and you have no way of changing it. For all intents and purposes, it can be ignored. (Because every document is multiplied by the same factor, it has no effect on the ranking.)
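A small Python sketch of this calculation follows. It takes the description above literally (the sum of the squared IDF values, with query boosts left out), and the IDF values and raw scores are invented for the example.

```python
import math


def query_norm(idfs):
    """queryNorm = 1 / sqrt(sumOfSquaredWeights), ignoring query boosts."""
    sum_of_squared_weights = sum(idf ** 2 for idf in idfs)
    return 1.0 / math.sqrt(sum_of_squared_weights)


idfs = [3.0, 4.0]        # hypothetical IDF values for two query terms
norm = query_norm(idfs)  # 1 / sqrt(9 + 16)

raw_scores = {"doc_a": 2.0, "doc_b": 5.0}  # made-up unnormalized scores
scaled = {doc: s * norm for doc, s in raw_scores.items()}

# Multiplying every score by the same positive constant keeps the order.
ranking_before = sorted(raw_scores, key=raw_scores.get, reverse=True)
ranking_after = sorted(scaled, key=scaled.get, reverse=True)
print(ranking_before == ranking_after)
```

The comparison at the end shows why the factor can be ignored within a single query: it rescales every score but never reorders the results.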

Query Coordination

The coordination factor (coord) is used to reward documents that contain a higher percentage of the query terms. The more query terms that appear in the document, the greater the chances that the document is a good match for the query.

For a three-term query, for example, the coordination factor results in the document that contains all three terms being much more relevant than the document that contains just two of them.
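The effect can be sketched in Python. The rule used here is not spelled out in the text above: it assumes coord is the number of matching query terms divided by the total number of query terms, which is how Lucene's classic TF/IDF similarity defines it. The three-term query and the per-term weight of 1.5 are illustrative values only.

```python
def coord(query_terms, doc_terms):
    """Fraction of distinct query terms that appear in the document."""
    unique_terms = set(query_terms)
    matching = unique_terms & set(doc_terms)
    return len(matching) / len(unique_terms)


query = ["quick", "brown", "fox"]
term_weight = 1.5  # illustrative per-term weight, not a real Lucene value

for doc in (["fox"], ["quick", "fox"], ["quick", "brown", "fox"]):
    # Score without coordination: one weight per matching term.
    raw = term_weight * len(set(query) & set(doc))
    # Score with coordination: scaled by the share of query terms matched.
    print(doc, raw, raw * coord(query, doc))
```

Without coord the scores grow linearly with the number of matching terms; with it, the gap between matching two terms and matching all three becomes noticeably wider.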
