【Question title】: How to sort a nested aggregation field based on a parent document field in Elasticsearch?
【Posted】: 2021-06-01 00:41:16
【Question description】:

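I have an index of stores at different locations. Each store has a nested list of discount coupons.

I now need a query that returns all unique coupons within a radius of x km, sorted by the distance from the given location to the nearest store where each coupon applies.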

Database :: Elasticsearch

Index mapping ::

{
"mappings": {
    "car_stores": {
        "properties": {
            "location": {
                "type": "geo_point"
            },
            "discount_coupons": {
                "type": "nested",
                "properties": {
                    "name": {
                        "type": "keyword"
                    }
                }
            }
        }
    }
}
}

Sample document ::

{
"_index": "stores",
"_type": "car_stores",
"_id": "1258c81d-b6f2-400f-a448-bd728f524b55",
"_score": 1.0,
"_source": {
    "location": {
        "lat": 36.053757,
        "lon": 139.525482
    },
    "discount_coupons": [
        {
            "name": "c1"
        },
        {
            "name": "c2"
        }
    ]
}
}

Old query to get the unique discount coupon names within x km of a given location ::

{
"size": 0,
"query": {
    "bool": {
        "must": {
            "match_all": {}
        },
        "filter": {
            "geo_distance": {
                "distance": "100km",
                "location": {
                    "lat": 40,
                    "lon": -70
                }
            }
        }
    }
},
"aggs": {
    "coupon": {
        "nested": {
            "path": "discount_coupons"
        },
        "aggs": {
            "name": {
                "terms": {
                    "field": "discount_coupons.name",
                    "order": {
                        "_key": "asc"
                    },
                    "size": 200
                }
            }
        }
    }
}
}

Response to the updated query ::

{
"took": 60,
"timed_out": false,
"_shards": {
    "total": 3,
    "successful": 3,
    "skipped": 0,
    "failed": 0
},
"hits": {
    "total": 245328,
    "max_score": 0.0,
    "hits": []
},
"aggregations": {
    "coupon": {
        "doc_count": 657442,
        "name": {
            "doc_count_error_upper_bound": -1,
            "sum_other_doc_count": 641189,
            "buckets": [
                {
                    "key": "local20210211",
                    "doc_count": 1611,
                    "back_to_base": {
                        "doc_count": 1611,
                        "distance_script": {
                            "value": 160.61034409639765
                        }
                    }
                },
                {
                    "key": "local20210117",
                    "doc_count": 1621,
                    "back_to_base": {
                        "doc_count": 1621,
                        "distance_script": {
                            "value": 77.51459886447356
                        }
                    }
                },
                {
                    "key": "local20201220",
                    "doc_count": 1622,
                    "back_to_base": {
                        "doc_count": 1622,
                        "distance_script": {
                            "value": 84.15734462544432
                        }
                    }
                },
                {
                    "key": "kisekae1",
                    "doc_count": 1626,
                    "back_to_base": {
                        "doc_count": 1626,
                        "distance_script": {
                            "value": 88.23770888201268
                        }
                    }
                },
                {
                    "key": "local20210206",
                    "doc_count": 1626,
                    "back_to_base": {
                        "doc_count": 1626,
                        "distance_script": {
                            "value": 86.78376012847237
                        }
                    }
                },
                {
                    "key": "local20210106",
                    "doc_count": 1628,
                    "back_to_base": {
                        "doc_count": 1628,
                        "distance_script": {
                            "value": 384.12156408078397
                        }
                    }
                },
                {
                    "key": "local20210113",
                    "doc_count": 1628,
                    "back_to_base": {
                        "doc_count": 1628,
                        "distance_script": {
                            "value": 153.61681676703674
                        }
                    }
                },
                {
                    "key": "local20",
                    "doc_count": 1629,
                    "back_to_base": {
                        "doc_count": 1629,
                        "distance_script": {
                            "value": 168.74132991524073
                        }
                    }
                },
                {
                    "key": "local20210213",
                    "doc_count": 1630,
                    "back_to_base": {
                        "doc_count": 1630,
                        "distance_script": {
                            "value": 155.8335679860034
                        }
                    }
                },
                {
                    "key": "local20210208",
                    "doc_count": 1632,
                    "back_to_base": {
                        "doc_count": 1632,
                        "distance_script": {
                            "value": 99.58790590445102
                        }
                    }
                }
            ]
        }
    }
}
}

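The query above returns the top 200 discount coupons ordered by count by default, but I want the coupons sorted by distance from the given location, i.e. the nearest applicable coupon should come first.

Is there a way to sort a nested aggregation based on a parent key, or can I solve this use case with a different data model?

Updated query ::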

{
"size": 0,
"query": {
    "bool": {
        "filter": [
            {
                "geo_distance": {
                    "distance": "100km",
                    "location": {
                        "lat": 35.699104,
                        "lon": 139.825211
                    }
                }
            },
            {
                "nested": {
                    "path": "discount_coupons",
                    "query": {
                        "bool": {
                            "filter": {
                                "exists": {
                                    "field": "discount_coupons"
                                }
                            }
                        }
                    }
                }
            }
        ]
    }
},
"aggs": {
    "coupon": {
        "nested": {
            "path": "discount_coupons"
        },
        "aggs": {
            "name": {
                "terms": {
                    "field": "discount_coupons.name",
                    "order": {
                        "back_to_base": "asc"
                    },
                    "size": 10
                },
                "aggs": {
                    "back_to_base": {
                        "reverse_nested": {},
                        "aggs": {
                            "distance_script": {
                                "min": {
                                    "script": {
                                        "source": "doc['location'].arcDistance(35.699104, 139.825211)"
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
}

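【Question comments】:

    Tags: elasticsearch elasticsearch-aggregation elasticsearch-nested


    【Solution 1】:

    Interesting question. You can always order a terms aggregation by the result of a numeric sub-aggregation. The trick here is to escape the nested context through a reverse_nested aggregation and then calculate the distance from the pivot using a script: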

    {
      "size": 0,
      "query": {
        "bool": {
          "must": {
            "match_all": {}
          },
          "filter": {
            "geo_distance": {
              "distance": "100km",
              "location": {
                "lat": 40,
                "lon": -70
              }
            }
          }
        }
      },
      "aggs": {
        "coupon": {
          "nested": {
            "path": "discount_coupons"
          },
          "aggs": {
            "name": {
              "terms": {
                "field": "discount_coupons.name",
                "order": {
                  "back_to_base>distance_script": "asc"
                },
                "size": 200
              },
              "aggs": {
                "back_to_base": {
                  "reverse_nested": {},
                  "aggs": {
                    "distance_script": {
                      "min": {
                        "script": {
                          "source": "doc['location'].arcDistance(40, -70)"
                        }
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
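
For illustration, here is a minimal Python sketch that builds the same request body for an arbitrary pivot, so the geo_distance filter and the distance script cannot drift apart. The function name `build_coupon_query` and its parameters are hypothetical; the field names come from the mapping in the question. Two details are assumptions on my part rather than part of the answer as posted: the order path is written as `back_to_base>distance_script`, because ordering by a single-bucket aggregation such as reverse_nested on its own sorts by its doc_count, and the coordinates are passed through script `params` so the script source stays constant across requests, which should help with script compilation limits.

```python
import json


def build_coupon_query(lat, lon, radius="100km", size=200):
    """Build a search body listing coupon names within `radius`, nearest first.

    Hypothetical helper: it only builds the request dict, it does not send it.
    """
    pivot = {"lat": lat, "lon": lon}
    return {
        "size": 0,
        "query": {
            "bool": {
                "filter": {
                    "geo_distance": {"distance": radius, "location": dict(pivot)}
                }
            }
        },
        "aggs": {
            "coupon": {
                "nested": {"path": "discount_coupons"},
                "aggs": {
                    "name": {
                        "terms": {
                            "field": "discount_coupons.name",
                            # Path to the numeric metric, not just the
                            # single-bucket reverse_nested aggregation.
                            "order": {"back_to_base>distance_script": "asc"},
                            "size": size,
                        },
                        "aggs": {
                            "back_to_base": {
                                # Step back to the parent store document.
                                "reverse_nested": {},
                                "aggs": {
                                    "distance_script": {
                                        "min": {
                                            "script": {
                                                # Constant source; the pivot
                                                # travels in params.
                                                "source": "doc['location'].arcDistance(params.lat, params.lon)",
                                                "params": dict(pivot),
                                            }
                                        }
                                    }
                                },
                            }
                        },
                    }
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_coupon_query(40, -70), indent=2))
```

The printed JSON can be pasted into whatever client you already use to run searches against the index.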
    

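    【Comments】:

    • But I have read that scripts are not recommended for real-time use cases. If my traffic is, say, 500 rps, a script will surely add high latency. Also, if I remember correctly, there is an upper limit on how many times scripts can run within an interval in ES.
    • You're right: scripts should be avoided where possible. On the other hand, I don't think there is any simpler numeric aggregation that can compute the geo distance per bucket on the fly. You could try this agg in a sandbox and go from there.
    • Thanks for the help. Is the sandbox another tool or a database? Could you give me the correct link or the full name of the tool?
    • By "sandbox" I mean a sandboxed execution environment that you control. So first run it on a small, heavily filtered subset of all docs, then loosen the filters, then try it under a light qps load, then a heavier one, and so on. Not all scripts are created equal, and this one may work just fine ;)
    • It just needed "back_to_base>distance_script" passed in the order field and it worked; by default it was picking the doc count.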
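
When trying the aggregation on a small filtered subset as suggested in the comments, it may help to check that the buckets really come back ordered by distance and not by doc count. The following Python sketch does that on a parsed response; the helper names `coupons_with_distance` and `is_sorted_by_distance` are hypothetical, and the sample reuses the first two buckets from the response shown in the question.

```python
def coupons_with_distance(response):
    """Return (coupon_name, min_distance) pairs in the order of the buckets."""
    buckets = response["aggregations"]["coupon"]["name"]["buckets"]
    return [
        (b["key"], b["back_to_base"]["distance_script"]["value"])
        for b in buckets
    ]


def is_sorted_by_distance(response):
    """True if each bucket's distance is >= the distance of the one before it."""
    distances = [d for _, d in coupons_with_distance(response)]
    return all(a <= b for a, b in zip(distances, distances[1:]))


# First two buckets of the response in the question (ordered by doc_count).
sample = {
    "aggregations": {
        "coupon": {
            "name": {
                "buckets": [
                    {
                        "key": "local20210211",
                        "doc_count": 1611,
                        "back_to_base": {
                            "doc_count": 1611,
                            "distance_script": {"value": 160.61034409639765},
                        },
                    },
                    {
                        "key": "local20210117",
                        "doc_count": 1621,
                        "back_to_base": {
                            "doc_count": 1621,
                            "distance_script": {"value": 77.51459886447356},
                        },
                    },
                ]
            }
        }
    }
}

if __name__ == "__main__":
    print(is_sorted_by_distance(sample))  # False: 160.6 comes before 77.5
```

A `False` result on a response like this one suggests the terms aggregation is still being ordered by something other than the distance metric.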