Elasticsearch - Index Mapping settings for both exact and partial matching (answers)

【Question title】: Elasticsearch - Index Mapping settings for both exact and partial matching
【Posted】: 2016-01-12 04:32:24
【Question】:

I am new to Elasticsearch and am trying to learn how to index with the best mapping settings to achieve the following.

If I have a document like this

{"name":"Galapagos Islands"}

I want to get results for both of the following queries

1) Partial match

{
    "query": {
        "match": {
            "name": "ga"
        }
    }
}

2) Exact match

{
    "query": {
        "term": {
            "name": "Galapagos Islands"
        }
    }
}

With my current settings I can get the partial match to work, but the exact match returns no results. My index settings are below.

{
  "mappings": {
        "islands": {
            "properties": {
                "name":{
                    "type": "string",
                    "index_analyzer": "autocomplete",
                    "search_analyzer": "search_ngram"
                }
            }
        }
    },

  "settings":{
    "analysis":{
      "analyzer":{
        "autocomplete":{
          "type":"custom",
          "tokenizer":"standard",
          "filter":[ "standard", "lowercase", "stop", "kstem", "ngram" ] 
        },
        "search_ngram": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": "lowercase"
        }
      },
      "filter":{
        "ngram":{
          "type":"ngram",
          "min_gram":2,
          "max_gram":15
        }
      }
    }
  }
}

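What is the correct way to do both exact and partial matching on a field?

Update

After recreating the index with the settings suggested in the answer below, my mapping looks like this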

curl -XGET 'localhost:9200/testing/_mappings?pretty'
{
  "testing" : {
    "mappings" : {
      "islands" : {
        "properties" : {
          "name" : {
            "type" : "string",
            "index_analyzer" : "autocomplete",
            "search_analyzer" : "search_ngram",
            "fields" : {
              "raw" : {
                "type" : "string",
                "analyzer" : "my_keyword_lowercase_analyzer"
              }
            }
          }
        }
      }
    }
  }
}

My index settings are as follows

{
  "mappings": {
        "islands": {
            "properties": {
                "name":{
                    "type": "string",
                    "index_analyzer": "autocomplete",
                    "search_analyzer": "search_ngram",
                    "fields": {
                      "raw": {
                          "type": "string",
                          "analyzer": "my_keyword_lowercase_analyzer"
                      }
                    }
                }
            }
        }
    },

  "settings":{
    "analysis":{
      "analyzer":{
        "autocomplete":{
          "type":"custom",
          "tokenizer":"standard",
          "filter":[ "standard", "lowercase", "stop", "kstem", "ngram" ] 
        },
        "search_ngram": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": "lowercase"
        },
        "my_keyword_lowercase_analyzer": {
          "type": "custom",
          "filter": ["lowercase"],
          "tokenizer": "keyword"
        }
      },
      "filter":{
        "ngram":{
          "type":"ngram",
          "min_gram":2,
          "max_gram":15
        }
      }
    }
  }
}

With all of the above in place, when I query like this, I get no hits

curl -XGET 'localhost:9200/testing/islands/_search?pretty' -d '{"query": {"term": {"name.raw" : "Galapagos Islands"}}}'
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : [ ]
  }
}

My document looks like this

curl -XGET 'localhost:9200/testing/islands/1?pretty'
{
  "_index" : "testing",
  "_type" : "islands",
  "_id" : "1",
  "_version" : 1,
  "found" : true,
  "_source":{"name":"Galapagos Islands"}
}

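【Comments】:

Your mapping, test data and queries all work for me. Are you sure you tested correctly?
What do you get if you run GET /index_name/islands/_search { "fielddata_fields": [ "name" ] }?
After your update to the question, this makes a lot of sense :-). My answer is below.
@AndreiStefan Please find the fielddata_fields query output above
My earlier request made sense for your original post. Now that you have updated it with the correct query, I no longer need it. Try my suggestion below.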

Tags: indexing tokenize elasticsearch kibana-4

【Answer 1】:

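Add a sub-field to your name property that is not_analyzed. Alternatively, if you want matching to ignore lowercase/uppercase differences, use the keyword tokenizer with the lowercase filter.

This indexes the whole value "Galapagos Islands" as a single token instead of splitting it into n-grams (unchanged with not_analyzed, lowercased with the lowercase filter). You can then run a term search against it.

For example, an analyzer with the keyword tokenizer and the lowercase filter: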

    "my_keyword_lowercase_analyzer": {
      "type": "custom",
      "filter": [
        "lowercase"
      ],
      "tokenizer": "keyword"
    }

And the mapping:

        "properties": {
            "name":{
                "type": "string",
                "index_analyzer": "autocomplete",
                "search_analyzer": "search_ngram",
                "fields": {
                    "raw": {
                        "type": "string",
                        "analyzer": "my_keyword_lowercase_analyzer"
                    }
                }
            }
        }

The query to use is:

{
    "query": {
        "term": {
            "name.raw": "galapagos islands"
        }
    }
}

So query name.raw (the sub-field) rather than the main name field.
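
To see why the term query misses on name but can hit on name.raw, here is a minimal Python sketch of the two analysis chains. It is not Elasticsearch code: the function names are made up for illustration, the standard tokenizer is approximated with a simple word regex, and the stop and kstem filters are left out. A term query is not analyzed, so it is modeled as a plain lookup in the set of indexed tokens.

```python
import re

MIN_GRAM, MAX_GRAM = 2, 15  # values of the "ngram" filter in the question


def ngrams(token, min_gram=MIN_GRAM, max_gram=MAX_GRAM):
    """All substrings of `token` with a length between min_gram and max_gram."""
    return {
        token[i:i + n]
        for n in range(min_gram, max_gram + 1)
        for i in range(len(token) - n + 1)
    }


def autocomplete_tokens(text):
    """Rough stand-in for the "autocomplete" analyzer (stop and kstem omitted)."""
    words = re.findall(r"\w+", text.lower())  # approximates the standard tokenizer
    tokens = set()
    for word in words:
        tokens |= ngrams(word)
    return tokens


def keyword_lowercase_tokens(text):
    """Stand-in for keyword tokenizer + lowercase filter: one lowercased token."""
    return {text.lower()}


doc = "Galapagos Islands"
name_tokens = autocomplete_tokens(doc)      # roughly what "name" holds
raw_tokens = keyword_lowercase_tokens(doc)  # roughly what "name.raw" holds

# Term queries are modeled as membership tests on the indexed tokens.
print("ga" in name_tokens)                 # True: "ga" is one of the n-grams
print("Galapagos Islands" in name_tokens)  # False: only lowercase fragments exist
print("Galapagos Islands" in raw_tokens)   # False: the stored token is lowercased
print("galapagos islands" in raw_tokens)   # True
```

In this model the name field only ever contains lowercase fragments of single words, so no term query for the full phrase can match it, while name.raw holds exactly one token that must be queried in lowercase.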

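【Comments】:

Hi @Andrei, thanks for the quick reply. I recreated my index with the settings above, but I still cannot search for the term "Galapagos Islands". The query I used is {"query": {"term": {"name" : "Galapagos Islands"}}}
Hi Andrei, I double-checked everything and added it to my question. It doesn't seem to work for me :-(
My bad. My analyzer lowercases the values. You have two options: use "galapagos islands" (lowercase) in the search: "term": { "name.raw": "galapagos islands" }; or change the mapping to something like "fields": { "raw": { "type": "string", "index": "not_analyzed" } } and search with "Galapagos Islands": "term": { "name.raw": "Galapagos Islands" }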
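
The two options from the comment above can be put side by side in a short Python sketch. It only builds request bodies as dictionaries and sends nothing to a cluster. The helper term_query_for and the two constant names are hypothetical, and the lowercased branch assumes the analyzer does nothing beyond lowercasing the value.

```python
import json

# Option 1 (from the answer): lowercase the whole value at index time.
RAW_LOWERCASED = {"type": "string", "analyzer": "my_keyword_lowercase_analyzer"}
# Option 2 (from the comment above): store the value untouched.
RAW_NOT_ANALYZED = {"type": "string", "index": "not_analyzed"}


def term_query_for(raw_mapping, original_value):
    """Build the term query on name.raw that should match `original_value`."""
    if raw_mapping.get("index") == "not_analyzed":
        value = original_value          # indexed verbatim, so query verbatim
    else:
        value = original_value.lower()  # assumes the analyzer only lowercases
    return {"query": {"term": {"name.raw": value}}}


for label, mapping in [("lowercased", RAW_LOWERCASED),
                       ("not_analyzed", RAW_NOT_ANALYZED)]:
    print(label, json.dumps(term_query_for(mapping, "Galapagos Islands")))
```

Which option fits depends on whether exact matches should be case-sensitive: the not_analyzed sub-field only matches the value exactly as written, while the lowercased one matches any casing as long as the client lowercases the query string first.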