Exclude _id and _index fields from Elasticsearch result data

【Question title】: exclude _id and _index field in elasticsearch result data
【Posted】: 2014-07-21 00:23:48
【Question description】:

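If I simply hit the API, every document comes back with 5 fields. I only want two of them (user_id and loc_code), so I listed them in fields. The response still contains data I don't need, such as _shards, hits, timed_out and so on.

I'm making a POST request from the Postman plugin in Chrome with the following query: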

<:9200>/myindex/mytype/_search
{
    "fields" : ["user_id", "loc_code"],
    "query":{"term":{"group_id":"1sd323s"}}
}

// Output

 {
        "took": 17,
        "timed_out": false,
        "_shards": {
            "total": 5,
            "successful": 5,
            "failed": 0
        },
        "hits": {
            "total": 323,
            "max_score": 8.402096,
            "hits": [
                {
                    "_index": "myindex",
                    "_type": "mytype",
                    "_id": "<someid>",
                    "_score": 8.402096,
                    "fields": {
                        "user_id": [
                            "<someuserid>"
                        ],
                        "loc_code": [
                            768
                        ]
                    }
                },
               ...
            ]
        }
    }

But I only want the document fields (the two fields mentioned), and I don't want _id, _index, or _type either. Is there any way to do this?

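【Question discussion】:

Tags: elasticsearch full-text-search

【Solution 1】:

Using filter_path is a solution that may be incomplete but is quite helpful. For example, suppose we have the following in an index: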

PUT foods/_doc/_bulk
{ "index" : { "_id" : "1" } }
{ "name" : "chocolate cake", "calories": "too much" }
{ "index" : { "_id" : "2" } }
{ "name" : "lemon pie", "calories": "a lot!"  }
{ "index" : { "_id" : "3" } }
{ "name" : "pizza", "calories": "oh boy..."  }

A search like this...

GET foods/_search
{
  "query": {
    "match_all": {}
  }
}

...produces a lot of metadata:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "foods",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "name" : "lemon pie",
          "calories" : "a lot!"
        }
      },
      {
        "_index" : "foods",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "name" : "chocolate cake",
          "calories" : "too much"
        }
      },
      {
        "_index" : "foods",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          "name" : "pizza",
          "calories" : "oh boy..."
        }
      }
    ]
  }
}

But if we give the search URL the parameter filter_path=hits.hits._source...

GET foods/_search?filter_path=hits.hits._source
{
  "query": {
    "match_all": {}
  }
}

...it returns only the _source of each hit (although still deeply nested):

{
  "hits" : {
    "hits" : [
      {
        "_source" : {
          "name" : "lemon pie",
          "calories" : "a lot!"
        }
      },
      {
        "_source" : {
          "name" : "chocolate cake",
          "calories" : "too much"
        }
      },
      {
        "_source" : {
          "name" : "pizza",
          "calories" : "oh boy..."
        }
      }
    ]
  }
}
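
Because the filtered response is still nested under hits.hits, the last bit of unwrapping has to happen on the client. Here is a minimal Python sketch, assuming the response body has already been parsed into a dict. The `response` literal is copied from the output above, and `extract_sources` is a hypothetical helper name, not part of any Elasticsearch client:

```python
def extract_sources(response):
    """Return the bare _source documents from a parsed search response."""
    # Be defensive: a filtered response may omit "hits" entirely
    # when nothing matches, so fall back to empty containers.
    hits = response.get("hits", {}).get("hits", [])
    return [hit["_source"] for hit in hits if "_source" in hit]


# Parsed form of the filter_path=hits.hits._source response shown above.
response = {
    "hits": {
        "hits": [
            {"_source": {"name": "lemon pie", "calories": "a lot!"}},
            {"_source": {"name": "chocolate cake", "calories": "too much"}},
            {"_source": {"name": "pizza", "calories": "oh boy..."}},
        ]
    }
}

docs = extract_sources(response)
print(docs)
```

This leaves a flat list of documents with no _index, _type, _id or _score attached.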

You can even filter down to individual fields:

GET foods/_search?filter_path=hits.hits._source.name
{
  "query": {
    "match_all": {}
  }
}

...and you will get this:

{
  "hits" : {
    "hits" : [
      {
        "_source" : {
          "name" : "lemon pie"
        }
      },
      {
        "_source" : {
          "name" : "chocolate cake"
        }
      },
      {
        "_source" : {
          "name" : "pizza"
        }
      }
    ]
  }
}

You can do much more if you want: just check the documentation.
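
As one example of going further, filter_path takes a comma-separated list of paths, so several fields can be kept in a single request. The sketch below only builds the request URL in Python. The base URL `http://localhost:9200` and the helper name `search_url` are assumptions for illustration; substitute your own host:

```python
from urllib.parse import urlencode


def search_url(base_url, index, filter_paths):
    """Build a _search URL that keeps only the given response paths."""
    # Join the paths with commas and keep the commas unescaped
    # so the URL stays readable.
    query = urlencode({"filter_path": ",".join(filter_paths)}, safe=",")
    return f"{base_url}/{index}/_search?{query}"


url = search_url(
    "http://localhost:9200",  # assumed local node, not from the original post
    "foods",
    ["hits.hits._source.name", "hits.hits._source.calories"],
)
print(url)
```

The resulting URL can be used with the same match_all body as in the requests above.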

【Discussion】:

【Solution 2】:

You can use the GET API instead. Try something like this:

/myindex/mytype/<objectId>/_source

In the result you will get only the _source.

See: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-get.html
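
A rough Python sketch of building that URL for a known document ID. The base URL `http://localhost:9200`, the helper name `source_url`, and the sample ID are assumptions for illustration only:

```python
from urllib.parse import quote


def source_url(base_url, index, doc_type, doc_id):
    """Build the URL of a single document's _source endpoint."""
    # Percent-encode each path segment so an ID containing "/" or spaces
    # does not change the shape of the path.
    parts = [quote(str(part), safe="") for part in (index, doc_type, doc_id)]
    return f"{base_url}/{'/'.join(parts)}/_source"


# "abc/123" is a made-up ID chosen to show the escaping.
url = source_url("http://localhost:9200", "myindex", "mytype", "abc/123")
print(url)
```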

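Admittedly, this assumes you know the ID of the document. I'm not sure whether you can exclude the metadata when using the search API.

Maybe: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-source-filtering.html

【Discussion】:

This doesn't really answer the question. I can't imagine why the ES team doesn't allow full control over the output. It really should be possible to exclude output fields such as _index, _type, _id, _score and _sort.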