elasticsearch search by special character: answer

【Question Title】: elasticsearch search by special character
【Posted】: 2021-10-11 16:38:57
【Question Description】:

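I have a set of phrases like the following: [remix], [18+], etc. How can I search by a single character (for example "[") to find all of these variants? Currently I have the following analyzer configuration: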

{
  "analysis": {
    "analyzer": {
      "bigram_analyzer": {
        "type": "custom",
        "tokenizer": "keyword",
        "filter": [
          "lowercase",
          "bigram_filter"
        ]
      },
      "full_text_analyzer": {
        "type": "custom",
        "tokenizer": "ngram_tokenizer",
        "filter": [
          "lowercase"
        ]
      }
    },
    "filter": {
      "bigram_filter": {
        "type": "edge_ngram",
        "max_gram": 2
      }
    },
    "tokenizer": {
      "ngram_tokenizer": {
        "type": "ngram",
        "min_gram": 3,
        "max_gram": 3,
        "token_chars": [
          "letter",
          "digit",
          "symbol",
          "punctuation"
        ]
      }
    }
  }
}
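To illustrate one reason a one-character query such as "[" can come back empty, here is a small Java sketch that mimics full_text_analyzer above: an ngram tokenizer with min_gram = max_gram = 3 followed by lowercase. It is a simplified model, not Elasticsearch code, and it assumes (the question does not show the mapping) that the searched field uses full_text_analyzer at both index and search time and that every character of the input belongs to the configured token_chars classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class TrigramSketch {

    // Simplified model of an "ngram" tokenizer with min_gram = max_gram = 3,
    // followed by the "lowercase" filter. Assumes the whole input is one run
    // of token_chars (letters, digits, symbols, punctuation).
    static List<String> trigrams(String text) {
        List<String> grams = new ArrayList<>();
        for (int start = 0; start + 3 <= text.length(); start++) {
            grams.add(text.substring(start, start + 3).toLowerCase(Locale.ROOT));
        }
        // Input shorter than 3 characters produces no tokens at all.
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(trigrams("[remix]")); // [[re, rem, emi, mix, ix]]
        System.out.println(trigrams("["));       // []
    }
}
```

Under these assumptions, "[" analyzes to zero tokens, so a match query against that field has nothing to look up, no matter what was indexed.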

The mapping is defined at the Java entity level using the Spring Boot Data Elasticsearch starter.

【Question Discussion】:

Tags: java spring elasticsearch spring-data-elasticsearch

【Solution 1】:

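If I understand your question correctly, you want an autocomplete analyzer that returns any term starting with [ or any other character. To do this, you can create a custom analyzer that applies an edge_ngram filter at index time. Here is an example:

Here is the test index: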

PUT /testing-index-v3
{
  "settings": {
    "number_of_shards": 1,
    "analysis": {
        "filter": {
            "autocomplete_filter": {
                "type": "edge_ngram",
                "min_gram": 1,
                "max_gram": 15
            }
        },
        "analyzer": {
            "autocomplete": {
                "type": "custom",
                "tokenizer": "keyword",
                "filter": [
                    "lowercase",
                    "autocomplete_filter"
                ]
            }
        }
    }
  },
  "mappings": {
    "properties": {
      "term": { 
        "type": "text",
        "analyzer": "autocomplete"
        
      }
    }
  }
}

Here are the documents that were indexed:

POST /testing-index-v3/_doc
{
  "term": "[+18]"
}

POST testing-index-v3/_doc
{
  "term": "[remix]"
}

POST testing-index-v3/_doc
{
  "term": "test"
}

And finally, our search:

GET testing-index-v3/_search
{
  "query": {
    "match": {
      "term": {
        "query": "[remi",
        "analyzer": "keyword", 
        "fuzziness": 0
      }
    }
  }
}

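As you can see, the autocomplete analyzer uses the keyword tokenizer, so the whole value is kept as one token. It is followed by an edge_ngram filter with min_gram: 1 and max_gram: 15, which means each indexed value is split into prefix tokens like this:

input = i, in, inp, inpu, input, ... up to 15 tokens (prefixes of 1 to 15 characters). This splitting is only needed at index time. In the query we also specify the keyword analyzer: it is used at search time and keeps the query string as a single, unsplit token, so it must exactly match one of the indexed prefixes. Here are some sample searches and results: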
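The index-time/search-time split described above can be modeled in a few lines of Java. This is a simplified sketch, not Elasticsearch code: it ignores scoring and fuzziness, and treats a hit as "the keyword-analyzed query equals one of the document's edge n-gram prefixes". The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class EdgeNgramSketch {

    // Index time (model of the "autocomplete" analyzer): keyword tokenizer
    // keeps the whole value, lowercase normalizes it, then edge_ngram emits
    // every prefix from minGram up to maxGram characters.
    static List<String> indexTokens(String value, int minGram, int maxGram) {
        String token = value.toLowerCase(java.util.Locale.ROOT);
        List<String> prefixes = new ArrayList<>();
        int upper = Math.min(maxGram, token.length());
        for (int len = minGram; len <= upper; len++) {
            prefixes.add(token.substring(0, len));
        }
        return prefixes;
    }

    // Search time (model of "analyzer": "keyword" in the query): the query is
    // one unmodified token, so a document matches only if that exact token
    // is among its indexed prefixes.
    static List<String> search(List<String> docs, String query) {
        List<String> hits = new ArrayList<>();
        for (String doc : docs) {
            if (indexTokens(doc, 1, 15).contains(query)) {
                hits.add(doc);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> docs = List.of("[+18]", "[remix]", "test");
        System.out.println(indexTokens("[remix]", 1, 15)); // [[, [r, [re, [rem, [remi, [remix, [remix]]
        System.out.println(search(docs, "["));              // [[+18], [remix]]
        System.out.println(search(docs, "[+"));             // [[+18]]
        System.out.println(search(docs, "[remi"));          // [[remix]]
    }
}
```

One consequence visible in the model: the keyword analyzer does not lowercase the query, so an uppercase query such as "[REMI" would not match the lowercased prefixes.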

GET testing-index-v3/_search
{
  "query": {
    "match": {
      "term": {
        "query": "[",
        "analyzer": "keyword", 
        "fuzziness": 0
      }
    }
  }
}

result:

    "hits" : [
          {
            "_index" : "testing-index-v3",
            "_type" : "_doc",
            "_id" : "w5c_IHsBGGZ-oIJIi-6n",
            "_score" : 0.7040055,
            "_source" : {
              "term" : "[remix]"
            }
          },
          {
            "_index" : "testing-index-v3",
            "_type" : "_doc",
            "_id" : "xJc_IHsBGGZ-oIJIju7m",
            "_score" : 0.7040055,
            "_source" : {
              "term" : "[+18]"
            }
          }
        ]

GET testing-index-v3/_search
{
  "query": {
    "match": {
      "term": {
        "query": "[+",
        "analyzer": "keyword", 
        "fuzziness": 0
      }
    }
  }
}

result:

    "hits" : [        
          {
            "_index" : "testing-index-v3",
            "_type" : "_doc",
            "_id" : "xJc_IHsBGGZ-oIJIju7m",
            "_score" : 0.7040055,
            "_source" : {
              "term" : "[+18]"
            }
          }
        ]

I hope this answer helps. Good luck on your Elasticsearch adventures!

【Discussion】: