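【Question title】: Elasticsearch: search by special character
【Posted】: 2021-10-11 16:38:57
【Question】:

I have a set of phrases such as [remix], [18+], and so on. How can I search by a single character, for example "[", and find all of these variants? My current analyzer configuration is: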

{
  "analysis": {
    "analyzer": {
      "bigram_analyzer": {
        "type": "custom",
        "tokenizer": "keyword",
        "filter": [
          "lowercase",
          "bigram_filter"
        ]
      },
      "full_text_analyzer": {
        "type": "custom",
        "tokenizer": "ngram_tokenizer",
        "filter": [
          "lowercase"
        ]
      }
    },
    "filter": {
      "bigram_filter": {
        "type": "edge_ngram",
        "max_gram": 2
      }
    },
    "tokenizer": {
      "ngram_tokenizer": {
        "type": "ngram",
        "min_gram": 3,
        "max_gram": 3,
        "token_chars": [
          "letter",
          "digit",
          "symbol",
          "punctuation"
        ]
      }
    }
  }
}

The mapping is defined at the Java entity level, using the Spring Boot Data Elasticsearch starter.

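【Comments】:

    Tags: java spring elasticsearch spring-data-elasticsearch


    【Solution 1】:

    If I understand your question correctly, you want an autocomplete analyzer that returns any term starting with [ or with any other character. To do that, you can create a custom analyzer that builds edge n-grams for autocomplete. Here is an example.

    This is the test index: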

    PUT /testing-index-v3
    {
      "settings": {
        "number_of_shards": 1,
        "analysis": {
            "filter": {
                "autocomplete_filter": {
                    "type": "edge_ngram",
                    "min_gram": 1,
                    "max_gram": 15
                }
            },
            "analyzer": {
                "autocomplete": {
                    "type": "custom",
                    "tokenizer": "keyword",
                    "filter": [
                        "lowercase",
                        "autocomplete_filter"
                    ]
                }
            }
        }
      },
      "mappings": {
        "properties": {
          "term": { 
            "type": "text",
            "analyzer": "autocomplete"
            
          }
        }
      }
    }
    
    

    These are the documents to index:

    POST /testing-index-v3/_doc
    {
      "term": "[+18]"
    }
    
    POST testing-index-v3/_doc
    {
      "term": "[remix]"
    }
    
    POST testing-index-v3/_doc
    {
      "term": "test"
    }
    

    And finally, our search:

    GET testing-index-v3/_search
    {
      "query": {
        "match": {
          "term": {
            "query": "[remi",
            "analyzer": "keyword", 
            "fuzziness": 0
          }
        }
      }
    }
    

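    As you can see, I chose the keyword tokenizer for the autocomplete analyzer. I use an edge_ngram filter with min_gram: 1 and max_gram: 15, which means each indexed value is split into prefix tokens like this:

    input-query = i, in, inp, inpu, input, ... and so on, up to a prefix of 15 characters. This is needed only at index time. The query also specifies the keyword analyzer; that analyzer is used at search time, so the query text is kept as a single token and must exactly match one of the indexed prefixes. Here are some example searches and their results: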
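To make the index-time tokenization concrete, here is a minimal Python sketch that approximates what the `autocomplete` analyzer produces for one value. It is not Elasticsearch code: the function name `edge_ngrams` is made up for illustration, and it only mimics the keyword tokenizer, the lowercase filter, and the edge_ngram filter with the settings shown above.

```python
from __future__ import annotations


def edge_ngrams(value: str, min_gram: int = 1, max_gram: int = 15) -> list[str]:
    """Approximate keyword tokenizer + lowercase + edge_ngram (illustrative only)."""
    # The keyword tokenizer keeps the whole value as a single token,
    # so brackets and other special characters are preserved.
    token = value.lower()
    # The edge_ngram filter emits prefixes of length min_gram..max_gram.
    longest = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, longest + 1)]


print(edge_ngrams("[remix]"))
# ['[', '[r', '[re', '[rem', '[remi', '[remix', '[remix]']
```

Because "[" is itself one of the indexed prefixes, a search-time token of "[" can match the document directly.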

    GET testing-index-v3/_search
    {
      "query": {
        "match": {
          "term": {
            "query": "[",
            "analyzer": "keyword", 
            "fuzziness": 0
          }
        }
      }
    }
    
    result:
    
        "hits" : [
              {
                "_index" : "testing-index-v3",
                "_type" : "_doc",
                "_id" : "w5c_IHsBGGZ-oIJIi-6n",
                "_score" : 0.7040055,
                "_source" : {
                  "term" : "[remix]"
                }
              },
              {
                "_index" : "testing-index-v3",
                "_type" : "_doc",
                "_id" : "xJc_IHsBGGZ-oIJIju7m",
                "_score" : 0.7040055,
                "_source" : {
                  "term" : "[+18]"
                }
              }
            ]
    
    GET testing-index-v3/_search
    {
      "query": {
        "match": {
          "term": {
            "query": "[+",
            "analyzer": "keyword", 
            "fuzziness": 0
          }
        }
      }
    }
    
    result:
    
        "hits" : [        
              {
                "_index" : "testing-index-v3",
                "_type" : "_doc",
                "_id" : "xJc_IHsBGGZ-oIJIju7m",
                "_score" : 0.7040055,
                "_source" : {
                  "term" : "[+18]"
                }
              }
            ]
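The two searches above can be reproduced with a small in-memory model. This is only a sketch under simplifying assumptions: `build_index` and `search` are hypothetical helpers, scoring is ignored, and the search side simply looks up the query string unchanged, which is how the keyword analyzer treats it.

```python
from collections import defaultdict


def index_tokens(value, min_gram=1, max_gram=15):
    # Index-time analysis: whole value as one token, lowercased, then prefixes.
    token = value.lower()
    return {token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)}


def build_index(docs):
    # Inverted index: prefix token -> set of documents containing it.
    inverted = defaultdict(set)
    for doc in docs:
        for tok in index_tokens(doc):
            inverted[tok].add(doc)
    return inverted


def search(inverted, query):
    # Search-time analysis with the keyword analyzer: the query stays one
    # token and is NOT lowercased, so it must equal an indexed prefix exactly.
    return sorted(inverted.get(query, set()))


inverted = build_index(["[+18]", "[remix]", "test"])
print(search(inverted, "["))      # ['[+18]', '[remix]']
print(search(inverted, "[+"))     # ['[+18]']
print(search(inverted, "[REMI"))  # []
```

The last call illustrates one consequence of this design: since the indexed prefixes are lowercased but the keyword analyzer leaves the query as typed, an uppercase query would not match unless the query is lowercased first.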
    

    I hope this answer helps. Good luck on your adventures with Elasticsearch!

    【Comments】:
