【Question title】:Group by based on last entry of filter in Elastic Search
【Posted】:2020-12-21 10:25:46
【Question】:

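I have a scenario like the following:

An index contains the items purchased from a store, and each item has an order_id. I need to group by the color of the last item of each order.

Data structure: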

{
    "order_id": 1,
    "product_id": 235233,
    "color": "Blue",
    "purchase_date": "2020-08-21T05:53:43.362Z"
},
{
    "order_id": 1,
    "product_id": 2352662,
    "color": "Black",
    "purchase_date": "2020-08-23T05:53:43.362Z"
},
{
    "order_id": 2,
    "product_id": 855477,
    "color": "Blue",
    "purchase_date": "2020-08-22T05:53:43.362Z"
},
{
    "order_id": 2,
    "product_id": 322352,
    "color": "Red",
    "purchase_date": "2020-08-24T05:53:43.362Z"
},
{
    "order_id": 3,
    "product_id": 3225235,
    "color": "Red",
    "purchase_date": "2020-08-25T05:53:43.362Z"
}

Expected result:

Black: 1 (color of the last product of order_id 1)

Red: 2 (color of the last product of order_id 2 and 3)
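As a reference for what any query should return, the expected result can be reproduced in plain Python. This is only a sketch of the intended semantics, not an Elasticsearch query; the function name `count_last_colors` is made up for illustration, and `product_id` is left out because it does not affect the result.

```python
from collections import Counter

# Sample documents from the question (product_id omitted for brevity).
ITEMS = [
    {"order_id": 1, "color": "Blue",  "purchase_date": "2020-08-21T05:53:43.362Z"},
    {"order_id": 1, "color": "Black", "purchase_date": "2020-08-23T05:53:43.362Z"},
    {"order_id": 2, "color": "Blue",  "purchase_date": "2020-08-22T05:53:43.362Z"},
    {"order_id": 2, "color": "Red",   "purchase_date": "2020-08-24T05:53:43.362Z"},
    {"order_id": 3, "color": "Red",   "purchase_date": "2020-08-25T05:53:43.362Z"},
]


def count_last_colors(items):
    """Count colors, looking only at the most recent item of each order."""
    latest = {}  # order_id -> most recent item seen so far
    for item in items:
        best = latest.get(item["order_id"])
        # UTC timestamps in this fixed ISO-8601 format sort correctly as strings.
        if best is None or item["purchase_date"] > best["purchase_date"]:
            latest[item["order_id"]] = item
    return Counter(item["color"] for item in latest.values())


print(dict(count_last_colors(ITEMS)))  # {'Black': 1, 'Red': 2}
```

Blue never appears in the output because it is not the last item of any order.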

Based on this answer, I can get the last item of each order as a whole document, but what I'm looking for is a way to get the item count per color directly.

POST /items/_search
{
    "size": 0,
    "aggs": {
        "group": {
            "terms": {
                "field": "order_id"
            },
            "aggs": {
                "group_items": {
                    "top_hits": {
                        "size": 1,
                        "sort": [
                            {
                                "purchase_date": {
                                    "order": "desc"
                                }
                            }
                        ]
                    }
                }
            }
        }
    }
}

The following gives me the item count per color across all order items, not just the last item of each order.

GET /items/_search
{
  "size": 0,
  "aggs": {
    "colors": {
      "terms": {
        "field": "color.keyword"
      }
    }
  }
}

【Question comments】:

  • @ESCoder Thanks, I hope it's clear now.

Tags: elasticsearch elasticsearch-aggregation elasticsearch-query


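【Solution 1】:

Another way to solve this is to create and maintain a separate index (latest_by_order) that tracks the latest document of each order. This can be achieved with a transform (see docs).

Such a transform can be created with the following command: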

PUT _transform/latest_by_order
{
  "source": {
    "index": "items"
  },
  "dest": {
    "index": "latest_by_order"
  },
  "latest": {
    "unique_key": ["order_id"],
    "sort": "purchase_date"
  }
}

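Once the transform has been started (POST _transform/latest_by_order/_start) and has populated the destination index, further analysis can run on top of the new (transformed) index. The following request: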

GET latest_by_order/_search
{
  "size": 0,
  "aggs": {
    "count_by_color": {
      "terms": {
        "field": "color.keyword"
      }
    }
  }
}

yields the following response:

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "count_by_color" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "Red",
          "doc_count" : 2
        },
        {
          "key" : "Black",
          "doc_count" : 1
        }
      ]
    }
  }
}
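To make the idea of a continuously maintained "latest per order" index more concrete, here is a toy in-memory model in Python. It is a sketch of the semantics only: the class `LatestByOrder` and its methods are invented for illustration and are not part of any Elasticsearch client, and the extra Green item at the end is a made-up document used to show an update.

```python
class LatestByOrder:
    """Toy stand-in for the destination index: one document per order_id."""

    def __init__(self):
        self.docs = {}  # order_id -> latest document for that order

    def upsert(self, doc):
        """Keep the document only if it is newer than the stored one."""
        current = self.docs.get(doc["order_id"])
        # UTC timestamps in this fixed ISO-8601 format sort correctly as strings.
        if current is None or doc["purchase_date"] > current["purchase_date"]:
            self.docs[doc["order_id"]] = doc

    def count_by_color(self):
        """Equivalent of the terms aggregation on color over the small index."""
        counts = {}
        for doc in self.docs.values():
            counts[doc["color"]] = counts.get(doc["color"], 0) + 1
        return counts


ITEMS = [
    {"order_id": 1, "color": "Blue",  "purchase_date": "2020-08-21T05:53:43.362Z"},
    {"order_id": 1, "color": "Black", "purchase_date": "2020-08-23T05:53:43.362Z"},
    {"order_id": 2, "color": "Blue",  "purchase_date": "2020-08-22T05:53:43.362Z"},
    {"order_id": 2, "color": "Red",   "purchase_date": "2020-08-24T05:53:43.362Z"},
    {"order_id": 3, "color": "Red",   "purchase_date": "2020-08-25T05:53:43.362Z"},
]

index = LatestByOrder()
for item in ITEMS:
    index.upsert(item)
print(index.count_by_color())  # {'Black': 1, 'Red': 2}

# Hypothetical later purchase: order 3 gets a newer Green item.
index.upsert({"order_id": 3, "color": "Green",
              "purchase_date": "2020-08-26T05:53:43.362Z"})
print(index.count_by_color())  # {'Black': 1, 'Red': 1, 'Green': 1}
```

The point of the design is that the expensive "latest per order" step happens once at write time, so the color count becomes a plain terms aggregation over a much smaller index.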

【Comments】:

    【Solution 2】:

    You can group by color and order the buckets by the maximum purchase_date, like so:

    {
      "size": 0,
      "aggs": {
        "group": {
          "terms": {
            "field": "color.keyword",
            "order": {
              "by_latest_purchase": "desc"
            }
          },
          "aggs": {
            "by_latest_purchase": {
              "max": {
                "field": "purchase_date"
              }
            }
          }
        }
      }
    }
    

    But you'll still end up with Blue because it is a color present in your documents, and I don't know whether it can be filtered out.
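Why Blue survives can be seen by mimicking that aggregation in plain Python on the sample data. This is a rough sketch rather than what Elasticsearch does internally, and the function name `colors_by_latest_purchase` is made up for illustration.

```python
from collections import defaultdict

# (order_id, color, purchase_date) tuples taken from the question.
ITEMS = [
    (1, "Blue",  "2020-08-21T05:53:43.362Z"),
    (1, "Black", "2020-08-23T05:53:43.362Z"),
    (2, "Blue",  "2020-08-22T05:53:43.362Z"),
    (2, "Red",   "2020-08-24T05:53:43.362Z"),
    (3, "Red",   "2020-08-25T05:53:43.362Z"),
]


def colors_by_latest_purchase(items):
    """Bucket by color, then sort buckets by their max purchase_date, descending."""
    buckets = defaultdict(lambda: {"doc_count": 0, "latest": ""})
    for _order_id, color, date in items:
        bucket = buckets[color]
        bucket["doc_count"] += 1
        bucket["latest"] = max(bucket["latest"], date)
    rows = [(color, b["doc_count"], b["latest"]) for color, b in buckets.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)


for color, doc_count, latest in colors_by_latest_purchase(ITEMS):
    print(color, doc_count, latest)
# Red 2 2020-08-25T05:53:43.362Z
# Black 1 2020-08-23T05:53:43.362Z
# Blue 2 2020-08-22T05:53:43.362Z
```

The buckets are built from every document, so each doc_count covers all items of that color. The ordering only changes how buckets are sorted; it does not restrict them to the last item per order.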


    When in doubt (or when everything else fails), scripted metric aggregations come to the rescue:

    {
      "size": 0, 
      "aggs": {
        "by_color": {
          "scripted_metric": {
            "init_script": "state.by_order_id = [:]",
            "map_script": """
              def color = doc['color.keyword'].value;
              def date = doc['purchase_date'].value.millis;
              def order_id = doc['order_id'].value;
              
              def current_group = ['color':color, 'date': date];
              
              if (state.by_order_id.containsKey(order_id)) {
                def max_group = state.by_order_id[order_id];
                if (date > max_group.date) {
                  // we've found a new maximum
                  state.by_order_id[order_id] = current_group
                }
              } else {
                state.by_order_id[order_id] = current_group;
              }
            """,
            "combine_script": """
              def colors_vs_count = [:];
              
              for (def group : state.by_order_id.entrySet()) {
                def order_id = group.getKey();
                def color = group.getValue()['color'];
                if (colors_vs_count.containsKey(color)) {
                  colors_vs_count[color]++;
                } else {
                  colors_vs_count[color] = 1;
                }
              }
              
              return colors_vs_count;
            """,
            "reduce_script": "return states"
          }
        }
      }
    }
    

    yielding:

    ...
    "aggregations" : {
      "by_color" : {
        "value" : [
          {
            "Red" : 2,
            "Black" : 1
          }
        ]
      }
    }
    

    Here is a JSON-friendly, minified version of the same request:

    {"size":0,"aggs":{"by_color":{"scripted_metric":{"init_script":"state.by_order_id = [:]","map_script":"          def color = doc['color.keyword'].value;\n          def date = doc['purchase_date'].value.millis;\n          def order_id = doc['order_id'].value;\n          \n          def current_group = ['color':color, 'date': date];\n          \n          if (state.by_order_id.containsKey(order_id)) {\n            def max_group = state.by_order_id[order_id];\n            if (date > max_group.date) {\n              state.by_order_id[order_id] = current_group\n            }\n          } else {\n            state.by_order_id[order_id] = current_group;\n          }","combine_script":"          def colors_vs_count = [:];\n          \n          for (def group : state.by_order_id.entrySet()) {\n            def order_id = group.getKey();\n            def color = group.getValue()['color'];\n            if (colors_vs_count.containsKey(color)) {\n              colors_vs_count[color]++;\n            } else {\n              colors_vs_count[color] = 1;\n            }\n          }\n          \n          return colors_vs_count;","reduce_script":"return states"}}}}
    

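    【Comments】:

    • Thanks, but it seems the script only works correctly on a single shard?
    • I think the by_color>value array will contain as many objects/dictionaries as you have shards. More on this here. If the script fails with multiple shards, let me know.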
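The shard concern raised in these comments can be illustrated with a small Python simulation. It assumes that items of one order may live on different shards (by default documents are routed by their _id, not by order_id); the shard split below and all function names are made up for illustration. Counting colors per shard, as the combine_script above does, can then count one order twice, whereas returning the per-order maxima from each shard and merging them in the reduce phase avoids that.

```python
def latest_per_order(shard_docs):
    """Per-shard phase (map + combine): {order_id: (date, color)} of the newest item."""
    latest = {}
    for order_id, color, date in shard_docs:
        if order_id not in latest or date > latest[order_id][0]:
            latest[order_id] = (date, color)
    return latest


def count_colors(latest):
    """Count how many orders end with each color."""
    counts = {}
    for _date, color in latest.values():
        counts[color] = counts.get(color, 0) + 1
    return counts


def reduce_states(states):
    """Coordinating-node phase: merge the per-shard maxima, then count once."""
    merged = {}
    for state in states:
        for order_id, (date, color) in state.items():
            if order_id not in merged or date > merged[order_id][0]:
                merged[order_id] = (date, color)
    return count_colors(merged)


# Hypothetical split of the sample data: orders 1 and 2 span both shards.
shard_a = [(1, "Blue", "2020-08-21"), (2, "Red", "2020-08-24")]
shard_b = [(1, "Black", "2020-08-23"), (2, "Blue", "2020-08-22"), (3, "Red", "2020-08-25")]

# Counting inside each shard: one dictionary per shard, Blue shows up twice.
print([count_colors(latest_per_order(s)) for s in (shard_a, shard_b)])
# [{'Blue': 1, 'Red': 1}, {'Black': 1, 'Blue': 1, 'Red': 1}]

# Merging per-order maxima across shards before counting.
print(reduce_states([latest_per_order(s) for s in (shard_a, shard_b)]))
# {'Black': 1, 'Red': 2}
```

One way to port this back to the aggregation would be to have combine_script return state.by_order_id and move both the merge and the counting into reduce_script. Routing all items of an order to the same shard would be another option.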