【Posted】:2020-06-14 17:32:21
【Question】:
I have a large number of documents (millions), each containing an array of keyword values:
Mapping:
{
  "my_index": {
    "mappings": {
      "properties": {
        "id": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "keywords": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        }
      }
    }
  }
}
Sample documents:
{
  "id": "abc",
  "keywords": ["cat", "dog", "person"]
}
{
  "id": "def",
  "keywords": ["tree", "person"]
}
{
  "id": "ghi",
  "keywords": ["person", "human"]
}
...
Suppose I request the top 3 keyword buckets and let the rest be counted as "other", like this:
GET /my_index/_search
{
  "size": 0,
  "track_total_hits": true,
  "aggs": {
    "keyword_buckets": {
      "terms": {
        "field": "keywords.keyword",
        "size": 3
      }
    }
  }
}
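To see how the bucket counts can add up to more than the number of documents, here is a minimal Python sketch (a model, not Elasticsearch code) of a terms aggregation on a single shard, run over the three sample documents above. It assumes each document adds 1 to every distinct keyword it contains and that ties in count are broken by key in ascending order; `terms_agg` is a hypothetical helper name.

```python
from collections import Counter

# The three sample documents from the question.
DOCS = [
    {"id": "abc", "keywords": ["cat", "dog", "person"]},
    {"id": "def", "keywords": ["tree", "person"]},
    {"id": "ghi", "keywords": ["person", "human"]},
]


def terms_agg(docs, size):
    """Simplified single-shard model of a terms aggregation (hypothetical).

    Each document adds 1 to the count of every distinct keyword it holds,
    so one multi-valued document contributes to several buckets.
    """
    counts = Counter()
    for doc in docs:
        counts.update(set(doc["keywords"]))
    # Order by count descending, then key ascending (assumed tie-break).
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    buckets = [{"key": k, "doc_count": c} for k, c in ordered[:size]]
    # Everything outside the top `size` keys is summed, per keyword.
    sum_other = sum(c for _, c in ordered[size:])
    return {"sum_other_doc_count": sum_other, "buckets": buckets}


if __name__ == "__main__":
    print(terms_agg(DOCS, size=3))
```

On these three documents the top buckets are person (3), cat (1) and dog (1), with 2 left over; together that is 7, the number of (document, keyword) pairs, not the 3 documents.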
There are 2,232,121 documents, but the buckets I get look like this:
{
  "took": 256,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2232121,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "keyword_buckets": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 6250132,
      "buckets": [
        {
          "key": "person",
          "doc_count": 326552
        },
        {
          "key": "human",
          "doc_count": 326529
        },
        {
          "key": "photograph",
          "doc_count": 222190
        }
      ]
    }
  }
}
I get 6,250,132 documents in the "other" bucket. I expected the top 3 buckets plus "other" to add up to 2,232,121. In SQL terms, I want a DISTINCT count of documents across all buckets.
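The gap between the two figures can be illustrated with a short Python sketch (an illustration, not an Elasticsearch query). The hypothetical `per_bucket_other_count` counts (document, keyword) pairs outside the top keys, which is the quantity a terms aggregation's `sum_other_doc_count` adds up, while the hypothetical `distinct_other_count` counts each document at most once, as a SQL DISTINCT would. The top buckets overlap as well (document "ghi" holds both "person" and "human"), so even their own `doc_count` values are not distinct. The last lines only redo the arithmetic on the figures from the response above.

```python
# Sample documents from the question, keyed by id.
DOCS = {
    "abc": ["cat", "dog", "person"],
    "def": ["tree", "person"],
    "ghi": ["person", "human"],
}

# Top-3 keys reported in the question's response.
TOP_KEYS = ["person", "human", "photograph"]


def per_bucket_other_count(docs, top_keys):
    """Count (document, keyword) pairs whose keyword is not in top_keys.

    A document with two non-top keywords is counted twice, which is how
    sum_other_doc_count can exceed the number of documents.
    """
    top = set(top_keys)
    return sum(len(set(kws) - top) for kws in docs.values())


def distinct_other_count(docs, top_keys):
    """Count documents holding none of top_keys; each document counts once."""
    top = set(top_keys)
    return sum(1 for kws in docs.values() if top.isdisjoint(kws))


# Arithmetic on the figures from the question's response.
TOP3_DOC_COUNTS = [326552, 326529, 222190]
SUM_OTHER_DOC_COUNT = 6250132
TOTAL_DOCS = 2232121
PAIRS = sum(TOP3_DOC_COUNTS) + SUM_OTHER_DOC_COUNT  # (document, keyword) pairs

if __name__ == "__main__":
    print(per_bucket_other_count(DOCS, TOP_KEYS))  # 3
    print(distinct_other_count(DOCS, TOP_KEYS))    # 0
    print(PAIRS, round(PAIRS / TOTAL_DOCS, 2))
```

With the question's figures, 326,552 + 326,529 + 222,190 + 6,250,132 = 7,125,403 pairs across 2,232,121 documents, roughly 3.2 distinct keywords per document, which is consistent with the aggregation counting keyword occurrences rather than distinct documents.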
What query do I need to achieve this?
【Discussion】:
Tags: elasticsearch unique aggregation