【Question title】: Exclude _id and _index fields in Elasticsearch result data
【Posted】: 2014-07-21 00:23:48
【Question】:

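If I simply hit the API, each document comes back with 5 fields. I only want two of them (user_id and loc_code), so I listed them in "fields". The response still contains data I don't need, such as _shards, hits, timed_out, and so on.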

I send a POST request from the Postman plugin in Chrome with the following query:

<:9200>/myindex/mytype/_search
{
    "fields" : ["user_id", "loc_code"],
    "query":{"term":{"group_id":"1sd323s"}}
}   

// Output

 {
        "took": 17,
        "timed_out": false,
        "_shards": {
            "total": 5,
            "successful": 5,
            "failed": 0
        },
        "hits": {
            "total": 323,
            "max_score": 8.402096,
            "hits": [
                {
                    "_index": "myindex",
                    "_type": "mytype",
                    "_id": "<someid>",
                    "_score": 8.402096,
                    "fields": {
                        "user_id": [
                            "<someuserid>"
                        ],
                        "loc_code": [
                            768
                        ]
                    }
                },
               ...
            ]
        }
    }

But I only want the document fields (the two mentioned above), and I don't want _id, _index, or _type either. Is there any way to do this?

【Comments】:

    Tags: elasticsearch full-text-search


    【Solution 1】:

    Using filter_path is a possibly incomplete but very helpful solution. For example, suppose we have the following content in an index:

    PUT foods/_doc/_bulk
    { "index" : { "_id" : "1" } }
    { "name" : "chocolate cake", "calories": "too much" }
    { "index" : { "_id" : "2" } }
    { "name" : "lemon pie", "calories": "a lot!"  }
    { "index" : { "_id" : "3" } }
    { "name" : "pizza", "calories": "oh boy..."  }
    

    A search like this...

    GET foods/_search
    {
      "query": {
        "match_all": {}
      }
    }
    

    ...produces a lot of metadata:

    {
      "took" : 1,
      "timed_out" : false,
      "_shards" : {
        "total" : 5,
        "successful" : 5,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : 3,
        "max_score" : 1.0,
        "hits" : [
          {
            "_index" : "foods",
            "_type" : "_doc",
            "_id" : "2",
            "_score" : 1.0,
            "_source" : {
              "name" : "lemon pie",
              "calories" : "a lot!"
            }
          },
          {
            "_index" : "foods",
            "_type" : "_doc",
            "_id" : "1",
            "_score" : 1.0,
            "_source" : {
              "name" : "chocolate cake",
              "calories" : "too much"
            }
          },
          {
            "_index" : "foods",
            "_type" : "_doc",
            "_id" : "3",
            "_score" : 1.0,
            "_source" : {
              "name" : "pizza",
              "calories" : "oh boy..."
            }
          }
        ]
      }
    }
    

    But if we give the search URL the parameter filter_path=hits.hits._source...

    GET foods/_search?filter_path=hits.hits._source
    {
      "query": {
        "match_all": {}
      }
    }
    

    ...it returns only the _source of each hit (although still deeply nested):

    {
      "hits" : {
        "hits" : [
          {
            "_source" : {
              "name" : "lemon pie",
              "calories" : "a lot!"
            }
          },
          {
            "_source" : {
              "name" : "chocolate cake",
              "calories" : "too much"
            }
          },
          {
            "_source" : {
              "name" : "pizza",
              "calories" : "oh boy..."
            }
          }
        ]
      }
    }
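
Because the filtered response is still nested under hits.hits, a client usually flattens it after the call. A minimal sketch in Python, assuming the response has already been parsed into a dict; the helper name `extract_sources` is hypothetical and not part of any Elasticsearch client:

```python
def extract_sources(response):
    """Return the _source dict of every hit, or [] when there are no hits."""
    # Use .get() so a response without a "hits" branch yields an empty list.
    hits = response.get("hits", {}).get("hits", [])
    return [hit["_source"] for hit in hits if "_source" in hit]


# Sample response copied from the filtered output above.
response = {
    "hits": {
        "hits": [
            {"_source": {"name": "lemon pie", "calories": "a lot!"}},
            {"_source": {"name": "chocolate cake", "calories": "too much"}},
            {"_source": {"name": "pizza", "calories": "oh boy..."}},
        ]
    }
}

foods = extract_sources(response)
print(foods[0]["name"])  # lemon pie
```

The defensive `.get()` calls are a design choice so that an empty or heavily filtered response does not raise a KeyError.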
    

    You can even filter individual fields:

    GET foods/_search?filter_path=hits.hits._source.name
    {
      "query": {
        "match_all": {}
      }
    }
    

    ...and you get this:

    {
      "hits" : {
        "hits" : [
          {
            "_source" : {
              "name" : "lemon pie"
            }
          },
          {
            "_source" : {
              "name" : "chocolate cake"
            }
          },
          {
            "_source" : {
              "name" : "pizza"
            }
          }
        ]
      }
    }
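
filter_path also accepts several comma-separated paths, so more than one field can be kept in a single request. A minimal sketch of building such a URL in Python; the base URL `http://localhost:9200` and the helper name `build_search_url` are assumptions for illustration only:

```python
from urllib.parse import urlencode


def build_search_url(base_url, index, source_fields):
    """Build a _search URL whose filter_path keeps only the given _source fields."""
    paths = ",".join(f"hits.hits._source.{name}" for name in source_fields)
    # safe="," keeps the comma separators readable instead of encoding them as %2C.
    query = urlencode({"filter_path": paths}, safe=",")
    return f"{base_url}/{index}/_search?{query}"


# Assumed local node; replace with the real host and port.
url = build_search_url("http://localhost:9200", "foods", ["name", "calories"])
print(url)
```

The resulting URL can be used with any HTTP client in place of the hand-written GET requests above.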
    

    You can do much more if you like: just check the documentation

    【Comments】:

      【Solution 2】:

      You can use the GET API instead. Try something like this:

      /myindex/mytype/<objectId>/_source
      

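      The result will contain only the _source.

      See: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-get.html

      That said, this assumes you know the document's ID. I'm not sure whether you can exclude the metadata when using the search API.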

      Maybe: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-source-filtering.html
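
If the metadata cannot be suppressed on the server, it can also be dropped on the client once the search returns. A minimal sketch in Python, assuming a parsed response shaped like the one in the question, where each hit carries a "fields" object whose values are arrays; the helper name `unwrap_fields` is hypothetical:

```python
def unwrap_fields(response):
    """Return one plain dict per hit, built from the hit's "fields" object."""
    rows = []
    for hit in response.get("hits", {}).get("hits", []):
        row = {}
        for name, values in hit.get("fields", {}).items():
            # Keep a one-element array as a scalar; leave other arrays as lists.
            row[name] = values[0] if len(values) == 1 else values
        rows.append(row)
    return rows


# Trimmed copy of the response shown in the question.
response = {
    "took": 17,
    "timed_out": False,
    "hits": {
        "total": 323,
        "hits": [
            {
                "_index": "myindex",
                "_type": "mytype",
                "_id": "<someid>",
                "_score": 8.402096,
                "fields": {"user_id": ["<someuserid>"], "loc_code": [768]},
            }
        ],
    },
}

print(unwrap_fields(response))  # [{'user_id': '<someuserid>', 'loc_code': 768}]
```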

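      【Comments】:

      • This doesn't really answer the question. I can't imagine why the ES team doesn't allow full control over the output. It really should be possible to exclude output fields such as _index, _type, _id, _score and _sort.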