【Question title】: Change the search order based on word index
【Posted】: 2020-05-03 19:37:40
【Question】:

Is there any way to give more weight to terms that appear near the beginning of a document? For example, I have 3 documents.

Medicine XXX
Sulpher This medicine contains sulpher and should be taken only after consultation with your doctor.

Medicine YYY
contains: sulpher Not recommended by most physicians

Medicine ZZZ
This medicine works like sulpher but does not contain sulpher at all.

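Document XXX should be listed first for the search term "Sulpher", because that is the first word in the document. It is fine if YYY is listed at the top too, since it is nearly as good a match as XXX. But ZZZ should always come last. In other words, a term located toward the "left" of a document should get higher priority than a term located toward the "right".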

【Comments on the question】:

    Tags: elasticsearch lucene


    【Solution 1】:

    You can boost by the position of the term in a lowercase-normalized keyword sub-field:

    PUT sulphur
    {
      "settings": {
        "analysis": {
          "normalizer": {
            "keyword_lowercase": {
              "type": "custom",
              "filter": ["lowercase"]
            }
          }
        }
      },
      "mappings": {
        "properties": {
          "text": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "normalizer": "keyword_lowercase"
              }
            }
          }
        }
      }
    }
    
    POST sulphur/_doc
    {"text":"This medicine works like sulpher but does not contain sulpher at all."}
    POST sulphur/_doc
    {"text":"contains: sulpher Not recommended by most physicians"}
    POST sulphur/_doc
    {"text":"Sulpher This medicine contains sulpher and should be taken only after consultation with your doctor."}
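
    The reason for the lowercase normalizer can be seen outside Elasticsearch. The following is a minimal Python sketch, not part of the original answer: `str.lower()` stands in for the `keyword_lowercase` normalizer and `str.find()` for the script's `indexOf`, and the helper name `first_offset` is made up for illustration.

```python
# Illustration only: str.lower() stands in for the "keyword_lowercase"
# normalizer, and str.find() for the indexOf call used in the search script.
docs = [
    "This medicine works like sulpher but does not contain sulpher at all.",
    "contains: sulpher Not recommended by most physicians",
    "Sulpher This medicine contains sulpher and should be taken only after "
    "consultation with your doctor.",
]


def first_offset(text: str, term: str, normalize: bool) -> int:
    """Return the character offset of the first occurrence of term, or -1."""
    haystack = text.lower() if normalize else text
    return haystack.find(term)


# Without normalization the leading "Sulpher" (capital S) is skipped, so the
# third document only matches at its second, later occurrence.
raw_offsets = [first_offset(d, "sulpher", normalize=False) for d in docs]

# With normalization the leading "Sulpher" is found at offset 0.
normalized_offsets = [first_offset(d, "sulpher", normalize=True) for d in docs]

print(raw_offsets)
print(normalized_offsets)
```

    Without the normalizer, the document that starts with "Sulpher" would get the largest offset of the three and therefore the lowest position-based score, which is the opposite of what the question asks for.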
    

    Then run a search that replaces the relevance score with a position-based one:

    GET sulphur/_search
    {
      "query": {
        "bool": {
          "must": [
            {
              "function_score": {
                "query": {
                  "match": {
                    "text": "sulpher"
                  }
                },
                "script_score": {
                  "script": """
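                    // Character offset of the first occurrence in the lowercased keyword field.
                    def pos = doc['text.keyword'].value.indexOf('sulpher');
                    // Earlier occurrences score higher: exp(2 / (pos + 1)).
                    return Math.exp(2.0 / (pos + 1));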
                  """
                },
                "boost_mode": "replace"
              }
            }
          ]
        }
      }
    }
    

    which yields

    [
      {
        "_index":"sulphur",
        "_type":"_doc",
        "_id":"sf5S2nEBW-D5QnrWODvB",
        "_score":7.389056,
        "_source":{
          "text":"Sulpher This medicine contains sulpher and should be taken only after consultation with your doctor."
        }
      },
      {
        "_index":"sulphur",
        "_type":"_doc",
        "_id":"sP5S2nEBW-D5QnrWNjtw",
        "_score":1.1993961,
        "_source":{
          "text":"contains: sulpher Not recommended by most physicians"
        }
      },
      {
        "_index":"sulphur",
        "_type":"_doc",
        "_id":"r_5S2nEBW-D5QnrWNDuw",
        "_score":1.079959,
        "_source":{
          "text":"This medicine works like sulpher but does not contain sulpher at all."
        }
      }
    ]
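
    The scores in this response can be reproduced by hand, because `"boost_mode": "replace"` discards the relevance score of the `match` query and keeps only the script's value. The following is a rough Python sketch of the same formula, not code from the original answer. The labels XXX, YYY and ZZZ come from the question, while the function name `position_score` and the `0.0` fallback for a missing term are assumptions added here.

```python
import math

# The three documents from the question, keyed by their names there.
docs = {
    "XXX": "Sulpher This medicine contains sulpher and should be taken only "
           "after consultation with your doctor.",
    "YYY": "contains: sulpher Not recommended by most physicians",
    "ZZZ": "This medicine works like sulpher but does not contain sulpher at all.",
}


def position_score(text: str, term: str) -> float:
    """Mirror the script: exp(2 / (pos + 1)), pos = first character offset."""
    pos = text.lower().find(term.lower())
    if pos < 0:
        # Assumption: the original script has no such guard. It relies on the
        # match query to return only documents that contain the term.
        return 0.0
    return math.exp(2.0 / (pos + 1))


scores = {name: position_score(text, "sulpher") for name, text in docs.items()}
ranking = sorted(scores, key=scores.get, reverse=True)

for name in ranking:
    print(name, round(scores[name], 6))
```

    The offsets are 0, 10 and 25 characters, so the scores are exp(2), exp(2/11) and exp(2/26), which agree with the `_score` values above up to float rounding. Note that the formula measures character offsets, not word positions, so a long first word pushes later terms further "right".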
    

    【Comments】:

    • Will the script scale if I have a large number of documents?
    • I think so. The indexOf operation is not complex.