【Question title】: Elasticsearch. Terms aggregation on nested field with duplicated values
【Posted】: 2017-09-17 01:43:44
【Question】:

I have a problem with nested aggregations in Elasticsearch. I have a mapping with a nested field:

POST my_index/my_type/_mapping
{
    "properties": {
        "name": {
            "type": "keyword"
        },
        "nested_fields": {
            "type": "nested",
            "properties": {
                "key": {
                    "type": "keyword"
                },
                "value": {
                    "type": "keyword"
                }
            }
        }
    }
}

Then I add one document to the index:

POST my_index/my_type
{
    "name": "object1",
    "nested_fields": [
        {
            "key": "key1",
            "value": "value1"
        },
        {
            "key": "key1",
            "value": "value2"
        }
    ]
}

As you can see, the nested array holds two items that share the same key field but have different value fields. Then I want to run a query like this:

GET /my_index/my_type/_search
{
    "query": {
        "nested": {
            "path": "nested_fields",
            "query": {
                "bool": {
                    "must": [
                        {
                            "term": {
                                "nested_fields.key": {
                                    "value": "key1"
                                }
                            }
                        },
                        {
                            "terms": {
                                "nested_fields.value": [
                                    "value1",
                                    "value2"
                                ]
                            }
                        }
                    ]
                }
            }
        }
    },
    "aggs": {
        "agg_nested_fields": {
            "nested": {
                "path": "nested_fields"
            },
            "aggs": {
                "agg_nested_fields_key": {
                    "terms": {
                        "field": "nested_fields.key",
                        "size": 10
                    }
                }
            }
        }
    }
}

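As you can see, I want to find all documents that have at least one object in the nested_fields array whose key equals key1 and whose value is one of the provided values (value1 or value2). Then I want to group the resulting documents by nested_fields.key. But this is the response I get: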

{
    "took": 13,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "hits": {
        "total": 1,
        "max_score": 0.87546873,
        "hits": [
            {
                "_index": "my_index",
                "_type": "my_type",
                "_id": "AVuLNXxiryKmA7VEwOfV",
                "_score": 0.87546873,
                "_source": {
                    "name": "object1",
                    "nested_fields": [
                        {
                            "key": "key1",
                            "value": "value1"
                        },
                        {
                            "key": "key1",
                            "value": "value2"
                        }
                    ]
                }
            }
        ]
    },
    "aggregations": {
        "agg_nested_fields": {
            "doc_count": 2,
            "agg_nested_fields_key": {
                "doc_count_error_upper_bound": 0,
                "sum_other_doc_count": 0,
                "buckets": [
                    {
                        "key": "key1",
                        "doc_count": 2
                    }
                ]
            }
        }
    }
}

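As the response shows, I get one hit (which is correct), but the document is counted twice in the aggregation (see doc_count: 2) because it has two items with the key1 value in the nested_fields array. How can I get the correct count in the aggregation?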

【Comments】:

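  • That is the correct count, because each nested element is a document in its own right. So you really do have two nested documents with the key key1, one with value1 and one with value2.
  • Yes, that is what I need to deal with. How can I overcome this?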
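
To make the comment above concrete, here is a minimal sketch in plain Python (no Elasticsearch client involved) that mimics the two ways of counting. The function names count_nested and count_roots are made up for illustration; the data is the single document from the question. It only models the counting logic, not how Elasticsearch stores nested documents internally.

```python
from collections import defaultdict

# The one root document from the question.
docs = [
    {
        "name": "object1",
        "nested_fields": [
            {"key": "key1", "value": "value1"},
            {"key": "key1", "value": "value2"},
        ],
    }
]


def count_nested(docs):
    """Count nested objects per key (the view of a nested + terms aggregation)."""
    counts = defaultdict(int)
    for doc in docs:
        for item in doc["nested_fields"]:
            counts[item["key"]] += 1
    return dict(counts)


def count_roots(docs):
    """Count distinct root documents per key (the view after joining back to the root)."""
    roots = defaultdict(set)
    for root_id, doc in enumerate(docs):
        for item in doc["nested_fields"]:
            roots[item["key"]].add(root_id)
    return {key: len(ids) for key, ids in roots.items()}


print(count_nested(docs))  # {'key1': 2}
print(count_roots(docs))   # {'key1': 1}
```

The first number matches the doc_count: 2 in the response; the second is the per-root count the question is asking for.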

Tags: elasticsearch aggregation elasticsearch-5


【Solution 1】:

You have to use a reverse_nested aggregation inside the nested aggregation so that the counts are computed on the root documents:

{
    "query": {
        "nested": {
            "path": "nested_fields",
            "query": {
                "bool": {
                    "must": [{
                            "term": {
                                "nested_fields.key": {
                                    "value": "key1"
                                }
                            }
                        },
                        {
                            "terms": {
                                "nested_fields.value": [
                                    "value1",
                                    "value2"
                                ]
                            }
                        }
                    ]
                }
            }
        }
    },
    "aggs": {
        "agg_nested_fields": {
            "nested": {
                "path": "nested_fields"
            },
            "aggs": {
                "agg_nested_fields_key": {
                    "terms": {
                        "field": "nested_fields.key",
                        "size": 10
                    },
                    "aggs": {
                        "back_to_root": {
                            "reverse_nested": {}
                        }
                    }
                }
            }
        }
    }
}
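
With this request, each terms bucket should carry a back_to_root sub-aggregation whose doc_count is the number of root documents. The sketch below shows one way to read those counts from a response that has already been parsed into a Python dict. The helper name root_counts is hypothetical, and sample_response is a hand-written fragment that follows the aggregation names used above; it is not output captured from a real cluster.

```python
def root_counts(response):
    """Map each terms bucket key to the doc_count of its back_to_root sub-aggregation."""
    buckets = (
        response["aggregations"]["agg_nested_fields"]["agg_nested_fields_key"]["buckets"]
    )
    return {bucket["key"]: bucket["back_to_root"]["doc_count"] for bucket in buckets}


# Hand-written sample (an assumption, not real server output), shaped like the
# aggregations part of a search response for the single document in the question.
sample_response = {
    "aggregations": {
        "agg_nested_fields": {
            "doc_count": 2,
            "agg_nested_fields_key": {
                "buckets": [
                    {
                        "key": "key1",
                        "doc_count": 2,  # nested objects with this key
                        "back_to_root": {"doc_count": 1},  # root documents with this key
                    }
                ]
            },
        }
    }
}

print(root_counts(sample_response))  # {'key1': 1}
```

The bucket's own doc_count still counts nested objects; only the value under back_to_root reflects root documents.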

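【Discussion】:

  • So you want the count of parent/root documents. OK, I understood that from this part: "As the response shows, I get one hit (which is correct), but the document is counted twice in the aggregation (see doc_count: 2) because it has two items with the key1 value in the nested_fields array. How can I get the correct count in the aggregation?" You have since added more information about what you want to achieve.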