Elasticsearch distinct records in order with pagination: answers

【Question title】: Elasticsearch distinct records in order with pagination
【Posted】: 2019-03-26 21:43:36
【Question description】:

How can I get records, in order and with pagination, after aggregating on a terms field? So far I have this:

{
  "query": {
    "bool": {
      "filter": [
        {
          "terms": {
            "user_id.keyword": [
              "user@domain.com"
            ]
          }
        },
        {
          "range": {
            "creation_time": {
              "gte": "2019-02-04T19:00:00.000Z",
              "lte": "2019-05-04T19:00:00.000Z"
            }
          }
        }
      ],
      "should": [
        {
          "wildcard": {
            "operation": "*sol*"
          }
        },
        {
          "wildcard": {
            "object_id": "*sol*"
          }
        },
        {
          "wildcard": {
            "user_id": "*sol*"
          }
        },
        {
          "wildcard": {
            "user_type": "*sol*"
          }
        },
        {
          "wildcard": {
            "client_ip": "*sol*"
          }
        },
        {
          "wildcard": {
            "country": "*sol*"
          }
        },
        {
          "wildcard": {
            "workload": "*sol*"
          }
        }
      ]
    }
  },
  "aggs": {
    "user_ids": {
      "terms": {
        "field": "country.keyword",
        "include": ".*United.*"
      }
    }
  },
  "from": 0,
  "size": 10,
  "sort": [
    {
      "creation_time": {
        "order": "desc"
      }
    }
  ]
}

I looked at this, where someone says it can be achieved with a composite aggregation or with partitions. But I'm not sure how I can actually do that.

I also looked into bucket_sort, but I can't seem to get it to work:

"my_bucket_sort": {
      "bucket_sort": {
        "sort": [
          {
            "user_ids": {
              "order": "desc"
            }
          }
        ],
        "size": 3
      }
    }

I'm a noob at this. Please help me. Thanks.

【Question comments】:

Tags: elasticsearch

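【Solution 1】:

Since the field is a country and probably doesn't have high cardinality, you can set size to a number high enough to return all countries in a single request: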

  "aggs": {
    "user_ids": {
      "terms": {
        "field": "country.keyword",
        "include": ".*United.*",
        "size": 10000
      }
    }
  }

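Alternatively, for a high-cardinality field, you can filter the aggregation first and then use partitions to page through the values: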

{
  "size": 0,
  "aggs": {
    "user_ids": {
      "filter": {
        "wildcard": { "country.keyword": "*United*" }
      },
      "aggs": {
        "countries": {
          "terms": {
            "field": "country.keyword",
            "include": {
              "partition": 0,
              "num_partitions": 20
            },
            "size": 10000
          }
        }
      }
    }
  }
}

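Here you would increment the value of partition with each query you send, from 0 up to 19 (that is, num_partitions - 1).

See the elastic documentation for more details.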
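
The partition loop described above could be sketched roughly as follows in Python. This only builds the request bodies; the function name `partition_requests` is made up for illustration, the filter wrapper from the answer is left out for brevity, and how you send each body depends on the client you use.

```python
def partition_requests(field, num_partitions, size):
    """Yield one search body per partition of a terms aggregation."""
    for partition in range(num_partitions):
        yield {
            "size": 0,  # return no hits, only aggregation buckets
            "aggs": {
                "countries": {
                    "terms": {
                        "field": field,
                        "include": {
                            "partition": partition,  # 0 .. num_partitions - 1
                            "num_partitions": num_partitions,
                        },
                        "size": size,  # upper bound on buckets per partition
                    }
                }
            },
        }


# Same numbers as in the answer: 20 partitions, up to 10000 buckets each.
bodies = list(partition_requests("country.keyword", num_partitions=20, size=10000))
```

Each element of `bodies` would then be sent as a separate search request, and the buckets from all responses combined on the client side.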

【Comments】:

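Thanks for your reply. Is there a way to fix the number of buckets in a partition? Say I want to page through 10 buckets at a time, i.e. the key values in the buckets. Is there a way to do that?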
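The size: 10000 parameter determines that. You need to make sure that size * num_partitions is at least the total number of unique values for the field; otherwise some values will be missed. The linked elastic documentation suggests running a cardinality query first (which can then be used to work out how many partitions you need).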
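Yes, I'm looking into the documentation and the cardinality aggregation too. Can you tell me how to add a cardinality aggregation to the query above?
No, it needs a separate query. You would then parse the response to get the number of unique values, and use that number to pick an appropriate value for num_partitions (assuming size stays constant).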
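
As a rough sketch of that two-step approach in Python: build a separate cardinality request, read the unique-value count from the response, and derive num_partitions from it. The helper names, the aggregation name `unique_values`, the `headroom` factor and the sample response are all assumptions for illustration. The cardinality aggregation returns an approximate count, and terms are not spread perfectly evenly across partitions, so leaving some headroom is probably wise.

```python
import math


def cardinality_body(field):
    """Request body that counts the (approximate) unique values of a field."""
    return {
        "size": 0,
        "aggs": {"unique_values": {"cardinality": {"field": field}}},
    }


def partitions_needed(unique_count, size, headroom=1.0):
    """Smallest num_partitions such that size * num_partitions covers the count.

    headroom > 1.0 adds slack for the approximate count and uneven partitions.
    """
    return max(1, math.ceil(unique_count * headroom / size))


body = cardinality_body("country.keyword")

# Hypothetical response fragment; a real one comes back from the search request.
response = {"aggregations": {"unique_values": {"value": 95}}}
unique_count = response["aggregations"]["unique_values"]["value"]

# Paging 10 buckets at a time, as asked in the first comment.
num_partitions = partitions_needed(unique_count, size=10)
```

The resulting `num_partitions` would then be used in the partitioned terms aggregation, with `size` kept at the same value for every request.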