Multi-field terms aggregation approach: answers

【Question title】: Multi-field terms aggregation approach
【Posted】: 2016-10-26 13:24:51
【Question】:

I have an index containing documents like the following:

[
    {
        "name": "Marco",
        "city_id": 45,
        "city": "Rome"
    },
    {
        "name": "John",
        "city_id": 46,
        "city": "London"
    },
    {
        "name": "Ann",
        "city_id": 47,
        "city": "New York"
    },
    ...
]

And an aggregation:

"aggs": {
    "city": {
        "terms": {
            "field": "city"
        }
    }
}

which gives me a response like this:

{
    "aggregations": {    
        "city": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 694,
            "buckets": [
                {
                    "key": "Rome",
                    "doc_count": 15126
                },
                {
                    "key": "London",
                    "doc_count": 11395
                },
                {
                    "key": "New York",
                    "doc_count": 14836
                },
                ...
          ]
        },
        ...
    }
}

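My problem is that I also need city_id in the aggregation results. I have read that I can't do a multi-field terms aggregation, but I don't need to aggregate by two fields. I only need to return one additional field whose value is the same for every term of the aggregated field (essentially a city/city_id pair). What is the best way to achieve this without losing performance?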

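I could create a field called city_with_id with values such as "Rome;45", "London;46", and so on, and aggregate on that field. That would work for me, because I can simply split the results in my backend and get the IDs I need, but is it the best approach?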
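
The backend-side split described above could look roughly like the following. This is only a sketch: it assumes the combined key uses `;` as the separator, that the buckets have the same shape as in the response shown earlier, and the helper name `split_city_buckets` is hypothetical.

```python
def split_city_buckets(buckets, sep=";"):
    """Turn buckets keyed by "<city><sep><city_id>" into rows with separate fields."""
    rows = []
    for bucket in buckets:
        # rpartition splits on the LAST separator, so a city name that itself
        # contains the separator still keeps the numeric ID intact.
        city, _, city_id = bucket["key"].rpartition(sep)
        rows.append({
            "city": city,
            "city_id": int(city_id),
            "doc_count": bucket["doc_count"],
        })
    return rows


# Hypothetical buckets, as a terms aggregation on city_with_id might return them.
sample = [
    {"key": "Rome;45", "doc_count": 15126},
    {"key": "London;46", "doc_count": 11395},
]
print(split_city_buckets(sample))
```

Splitting on the last separator is a deliberate choice here: the ID is always the trailing part, whereas a city name could in principle contain the separator character.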

【Comments】:

Tags: elasticsearch

【Answer 1】:

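One approach is to use top_hits with source filtering so that only city_id is returned, as in the example below. I don't think this will hurt performance. Before trying the combined city_with_id field approach described in the OP, you can try this on your index and see the impact.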

Example:

    POST <index>/_search
    {
        "size" : 0,
        "aggs": {
            "city": {
                "terms": {
                    "field": "city"
                },
                "aggs" : {
                    "id" : {
                        "top_hits" : {
                            "_source": {
                                "include": [
                                    "city_id"
                                ]
                            },
                            "size" : 1
                        }
                    }
                }
            }
        }
    }

Result:

            {
               "key": "London",
               "doc_count": 2,
               "id": {
                  "hits": {
                     "total": 2,
                     "max_score": 1,
                     "hits": [
                        {
                           "_index": "country",
                           "_type": "city",
                           "_id": "2",
                           "_score": 1,
                           "_source": {
                              "city_id": 46
                           }
                        }
                     ]
                  }
               }
            },
            {
               "key": "New York",
               "doc_count": 1,
               "id": {
                  "hits": {
                     "total": 1,
                     "max_score": 1,
                     "hits": [
                        {
                           "_index": "country",
                           "_type": "city",
                           "_id": "3",
                           "_score": 1,
                           "_source": {
                              "city_id": 47
                           }
                        }
                     ]
                  }
               }
            },
            {
               "key": "Rome",
               "doc_count": 1,
               "id": {
                  "hits": {
                     "total": 1,
                     "max_score": 1,
                     "hits": [
                        {
                           "_index": "country",
                           "_type": "city",
                           "_id": "1",
                           "_score": 1,
                           "_source": {
                              "city_id": 45
                           }
                        }
                     ]
                  }
               }
            }
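
To get from the buckets above to a plain city/city_id mapping, the client has to reach into the nested top_hits result. A minimal sketch, assuming the response has already been parsed into Python dicts and that the sub-aggregation is named `id` as in the example; the function name `city_ids_from_buckets` is hypothetical:

```python
def city_ids_from_buckets(buckets, sub_agg="id"):
    """Map each bucket key (city) to the city_id of its single top hit."""
    mapping = {}
    for bucket in buckets:
        hits = bucket[sub_agg]["hits"]["hits"]
        if hits:  # skip a bucket if its top_hits list is empty
            mapping[bucket["key"]] = hits[0]["_source"]["city_id"]
    return mapping


# Trimmed-down copy of the buckets shown above (only the fields that are read).
sample_buckets = [
    {"key": "London", "doc_count": 2,
     "id": {"hits": {"total": 2, "hits": [{"_id": "2", "_source": {"city_id": 46}}]}}},
    {"key": "New York", "doc_count": 1,
     "id": {"hits": {"total": 1, "hits": [{"_id": "3", "_source": {"city_id": 47}}]}}},
    {"key": "Rome", "doc_count": 1,
     "id": {"hits": {"total": 1, "hits": [{"_id": "1", "_source": {"city_id": 45}}]}}},
]
print(city_ids_from_buckets(sample_buckets))
```

This relies on every document in a bucket sharing the same city_id, which is what the question states; with "size": 1 only one document per bucket is inspected.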

【Comments】:

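It worked! In practice, though, I lost quite a lot of time with this approach, because my example was only illustrative: in the real scenario I have many fields that need nested aggregations, and the result is not acceptable. Anyway, it works and I will accept your answer. Thanks a lot!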