【Question title】: Exclude certain tokens from Elasticsearch's lowercase filter
【Posted】: 2020-05-20 23:16:27
【Question description】:

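I want all words to be indexed as lowercase tokens, except for a few. I thought I could do this by combining the condition token filter with the lowercase filter.

I based this on the following page in the documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-condition-tokenfilter.html

I added this filter to exclude the word "WHO":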

{
   "filter":{
      "smart_lowercase_filter":{
         "filter":[
            "lowercase"
         ],
         "type":"condition",
         "script":{
            "source":"token.term != 'WHO'"
         }
      }
   }
}

However, "WHO" is still tokenized as "who". Any idea what I'm doing wrong?

Thanks a lot.

【Comments】:

    Tags: elasticsearch


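    【Answer 1】:

    You need to call the CharSequence.toString() method. Otherwise you are comparing a CharSequence with a String, and the two never compare as equal, so the condition is always true and every token, including "WHO", gets lowercased.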

    {
      "settings": {
        "analysis": {
          "filter": {
            "smart_lowercase_filter": {
              "filter": [
                "lowercase"
              ],
              "type": "condition",
              "script": {
                "source": "token.term.toString() != 'WHO'"
              }
            }
          },
          "analyzer": {
            "my_analyzer": {
              "type": "custom",
              "tokenizer": "whitespace",
              "filter": [
                "smart_lowercase_filter"
              ]
            }
          }
        }
      }
    }
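
To try these settings, one option is to build the two request bodies in a small script and send them with whatever HTTP client you prefer. The sketch below only constructs and prints the bodies. The index name my_index and the sample text are assumptions made for illustration, not part of the original answer. The _analyze endpoint is the standard Elasticsearch API for running text through an analyzer.

```python
import json

# Hypothetical index name, used only for illustration.
INDEX = "my_index"


def build_index_settings():
    """Return index settings with a conditional lowercase filter.

    The lowercase filter is applied only to tokens for which the
    script returns true, i.e. every token except 'WHO'.
    """
    return {
        "settings": {
            "analysis": {
                "filter": {
                    "smart_lowercase_filter": {
                        "type": "condition",
                        "filter": ["lowercase"],
                        "script": {
                            # toString() makes this a String-to-String comparison.
                            "source": "token.term.toString() != 'WHO'"
                        },
                    }
                },
                "analyzer": {
                    "my_analyzer": {
                        "type": "custom",
                        "tokenizer": "whitespace",
                        "filter": ["smart_lowercase_filter"],
                    }
                },
            }
        }
    }


def build_analyze_request(text):
    """Return the body of an _analyze request that uses my_analyzer."""
    return {"analyzer": "my_analyzer", "text": text}


if __name__ == "__main__":
    # Body for: PUT /my_index
    print(f"PUT /{INDEX}")
    print(json.dumps(build_index_settings(), indent=2))
    # Body for: POST /my_index/_analyze (sample text is an assumption)
    print(f"POST /{INDEX}/_analyze")
    print(json.dumps(build_analyze_request("hey WHO are you"), indent=2))
```

The first body creates the index with the analyzer, and the second asks Elasticsearch to analyze a sample sentence with it.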
    

    You will then get this, with the WHO token left in upper case:

    {
      "tokens" : [
        {
          "token" : "hey",
          "start_offset" : 0,
          "end_offset" : 3,
          "type" : "word",
          "position" : 0
        },
        {
          "token" : "WHO",
          "start_offset" : 4,
          "end_offset" : 7,
          "type" : "word",
          "position" : 1
        },
        {
          "token" : "are",
          "start_offset" : 8,
          "end_offset" : 11,
          "type" : "word",
          "position" : 2
        },
        {
          "token" : "you",
          "start_offset" : 12,
          "end_offset" : 15,
          "type" : "word",
          "position" : 3
        }
      ]
    }
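
If you want to check a result like this in a script rather than by eye, a minimal sketch could pull the token strings out of the response and assert on them. The response text below is copied from the output above, trimmed to the fields that matter here. The helper name token_strings is hypothetical.

```python
import json

# The _analyze response shown above, reduced to the token and position fields.
RESPONSE_TEXT = """
{
  "tokens": [
    {"token": "hey", "position": 0},
    {"token": "WHO", "position": 1},
    {"token": "are", "position": 2},
    {"token": "you", "position": 3}
  ]
}
"""


def token_strings(response):
    """Return the emitted token strings, in order, from an _analyze response."""
    return [entry["token"] for entry in response["tokens"]]


tokens = token_strings(json.loads(RESPONSE_TEXT))

# The excluded word keeps its case; nothing was lowercased to "who".
assert "WHO" in tokens
assert "who" not in tokens
print(tokens)  # prints ['hey', 'WHO', 'are', 'you']
```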
    
