【Question Title】: How to sum TOP N docs with terms in elasticsearch?
【Posted】: 2020-04-05 04:29:59
【Question Description】:

Below are sample documents from Elasticsearch.

        {
            "_index": "social",
            "_type": "social",
            "_id": "1632560884596186633",
            "_score": 1,
            "_source": {
                "created_date": "2017-10-24",
                "reach": 1692,
                "social_id": 200
            }
        },
        {
            "_index": "social",
            "_type": "social",
            "_id": "1626693964184981799",
            "_score": 1,
            "_source": {
                "created_date": "2017-10-25",
                "reach": 1692,
                "social_id": 100
            }
        },
        {
            "_index": "social",
            "_type": "social",
            "_id": "162669396418498170",
            "_score": 1,
            "_source": {
                "created_date": "2017-10-25",
                "reach": 1692,
                "social_id": 50
            }
        },
        {
            "_index": "social",
            "_type": "social",
            "_id": "1626693964184981756",
            "_score": 1,
            "_source": {
                "created_date": "2017-10-25",
                "reach": 1692,
                "social_id": 25
            }
        }

Question: sum the reach of the top 2 documents, based on created date, for each social ID.

What I tried:

{
"size": 0,
"aggs": {
    "reach_bucket": {
        "terms": {
            "size": 200,
            "field": "social_id"
        },
        "aggs": {
            "media_reach_bucket": {
                "terms": {
                    "field": "created_date",
                    "size": 200
                },
                "aggs": {
                    "top_sales_hits": {
                        "top_hits": {
                            "sort": [
                                {
                                    "created_date": {
                                        "order": "desc"
                                    }
                                }
                            ],
                            "_source": {
                                "includes": [
                                    "created_date",
                                    "reach"
                                ]
                            },
                            "size": 2
                        }
                    }
                }
            }
        }
    }
}
} 

Problem:

I can't apply a sub-aggregation (such as a sum) to top_hits.

Any suggestions would be greatly appreciated.

【Comments】:

    Tags: elasticsearch


    【Solution 1】:

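    You probably want to use date_histogram instead of terms, since you are bucketing per day (I assume). More importantly, you should sort the top_hits by reach rather than created_date, because created_date will be the same for every document within a daily bucket.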

    {
      "size": 0,
      "aggs": {
        "reach_bucket": {
          "terms": {
            "size": 200,
            "field": "social_id"
          },
          "aggs": {
            "media_reach_bucket": {
              "date_histogram": {
                "field": "created_date",
                "calendar_interval": "day"
              },
              "aggs": {
                "top_sales_hits": {
                  "top_hits": {
                    "sort": [
                      {
                        "reach": {
                          "order": "desc"
                        }
                      }
                    ],
                    "_source": {
                      "includes": [
                        "reach"
                      ]
                    },
                    "size": 2
                  }
                }
              }
            }
          }
        }
      }
    }
    

    This yields top hits like the following:

    "aggregations" : {
        "reach_bucket" : {
          "doc_count_error_upper_bound" : 0,
          "sum_other_doc_count" : 0,
          "buckets" : [
            {
              "key" : 100,
              "doc_count" : 4,
              "media_reach_bucket" : {
                "buckets" : [
                  {
                    "key_as_string" : "2017-10-24T00:00:00.000Z",
                    "key" : 1508803200000,
                    "doc_count" : 4,
                    "top_sales_hits" : {
                      "hits" : {
                        "total" : {
                          "value" : 4,
                          "relation" : "eq"
                        },
                        "max_score" : null,
                        "hits" : [
                          {
                            "_index" : "kart",
                            "_type" : "_doc",
                            "_id" : "3iLJRnEBZbobBB0NiV8R",
                            "_score" : null,
                            "_source" : {
                              "reach" : 40
                            },
                            "sort" : [
                              40
                            ]
                          },
                          {
                            "_index" : "kart",
                            "_type" : "_doc",
                            "_id" : "3SLJRnEBZbobBB0Nhl-Y",
                            "_score" : null,
                            "_source" : {
                              "reach" : 30
                            },
                            "sort" : [
                              30
                            ]
                          }
                        ]
                      }
                    }
                  }
                ]
              }
            }
          ]
        }
      }
    

    You can then sum the reach values in your post-processing function.
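A minimal sketch of such a post-processing step, assuming the response has already been parsed into a Python dict with the aggregation names used above (reach_bucket, media_reach_bucket, top_sales_hits). The function name sum_top_hits_reach and the trimmed sample data are hypothetical and only for illustration.

```python
from collections import defaultdict


def sum_top_hits_reach(aggregations):
    """Sum `reach` over the top hits of each (social_id, day) bucket.

    `aggregations` is the "aggregations" object of the search response.
    Returns {social_id: {day_key_as_string: summed_reach}}.
    """
    totals = defaultdict(dict)
    for social in aggregations["reach_bucket"]["buckets"]:
        for day in social["media_reach_bucket"]["buckets"]:
            hits = day["top_sales_hits"]["hits"]["hits"]
            totals[social["key"]][day["key_as_string"]] = sum(
                hit["_source"]["reach"] for hit in hits
            )
    return dict(totals)


# Trimmed copy of the response shown above (only the fields the function reads).
sample = {
    "reach_bucket": {
        "buckets": [
            {
                "key": 100,
                "media_reach_bucket": {
                    "buckets": [
                        {
                            "key_as_string": "2017-10-24T00:00:00.000Z",
                            "top_sales_hits": {
                                "hits": {
                                    "hits": [
                                        {"_source": {"reach": 40}},
                                        {"_source": {"reach": 30}},
                                    ]
                                }
                            },
                        }
                    ]
                },
            }
        ]
    }
}

result = sum_top_hits_reach(sample)
print(result)  # {100: {'2017-10-24T00:00:00.000Z': 70}}
```

Because top_hits is limited to "size": 2 in the query, each sum covers at most the two highest-reach documents per day and social ID.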


    I'm not familiar with top-n sums, only with summing documents above a certain threshold. In that case, I'd use filter aggregations.
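For the threshold variant, a rough sketch of what such a request body might look like, written as a Python dict so it can be passed to whatever client you use. The aggregation names above_threshold and reach_sum, the helper build_threshold_sum_query, and the example threshold of 1000 are assumptions made for illustration, not part of the original answer. Note that this sums every document at or above the threshold, which is not the same as a top-2 sum.

```python
import json


def build_threshold_sum_query(min_reach):
    """Build a search body that sums `reach` per social_id,
    counting only documents whose reach is >= min_reach."""
    return {
        "size": 0,
        "aggs": {
            "reach_bucket": {
                "terms": {"field": "social_id", "size": 200},
                "aggs": {
                    # filter aggregation: narrows the bucket to matching docs
                    "above_threshold": {
                        "filter": {"range": {"reach": {"gte": min_reach}}},
                        # sum sub-aggregation runs only on the filtered docs
                        "aggs": {"reach_sum": {"sum": {"field": "reach"}}},
                    }
                },
            }
        },
    }


query = build_threshold_sum_query(1000)  # 1000 is an arbitrary example value
print(json.dumps(query, indent=2))
```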

    【Discussion】:

    • How do I sum the top_hits?