先分组、排序和选择答案

【问题标题】：Group, sort and select first先分组、排序和选择
【发布时间】：2015-10-29 19:36:43
【问题描述】：

假设我有一个这样的集合：

{
  couponId: "abc",
  state: "valid",
  date: "2015-11-01"
}
{
  couponId: "abc",
  state: "expired",
  date: "2015-10-01"
}
{
  couponId: "abc",
  state: "invalid",
  date: "2015-09-01"
}
{
  couponId: "xyz",
  state: "invalid",
  date: "2015-11-01"
}
{
  couponId: "xyz",
  state: "expired",
  date: "2015-10-01"
}
{
  couponId: "xyz",
  state: "expired",
  date: "2015-09-01"
}
...

优惠券可以有效/无效/过期。现在，我想获取优惠券列表，其中每张优惠券都是根据以下逻辑选择的：

如果存在“有效”优惠券，请使用该优惠券
如果存在“过期”优惠券，则使用该优惠券
否则获得“无效”优惠券。

将此逻辑应用于上述列表应该会产生：

    {
      couponId: "abc", /* for "abc" "valid" exists */
      state: "valid",
      date: "2015-11-01"
    },
    {
      couponId: "xyz", /* for "xyz" "valid" does not exist, use the next best "expired" */
      state: "expired",
      date: "2015-11-01"
    }

基本上，有效>过期>无效

曾想过使用聚合操作，尝试模拟SQL groupby+sort+selectFirst，

db.xyz.aggregate([
  {$sort : { couponId: 1, state: -1 } },
  {$group : { _id : "$couponId", document: {$first : "$$ROOT"} }}
])

显然这不起作用，因为“状态”字段应该有一个自定义排序，其中有效>过期>无效。那么，可以在聚合中实现自定义排序吗？

或者有没有更好的方法来做我想做的事情？

【问题讨论】：

标签： mongodb aggregation-framework database nosql

【解决方案1】：

一个更好的方法，虽然不是那么干净/漂亮（对我来说似乎有点 hacky :P），是在你的 中嵌套一些具有上述条件/逻辑的 $cond 运算符表达式>$group 流水线阶段。最好通过运行以下聚合管道来解释：

db.xyz.aggregate([
    {
        "$group": {
            "_id": "$couponId",
            "state": {
                "$first": {
                    "$cond": [
                        { "$eq": ["$state", "valid"]},
                        "$state",
                        {
                           "$cond": [
                                { "$eq": ["$state", "invalid"]},
                                "$state",
                                "expired"
                            ]
                        }
                    ]
                }
            },
            "date": {
                "$first": {
                    "$cond": [
                        { "$eq": ["$state", "valid"]},
                        "$date",
                        {
                           "$cond": [
                                { "$eq": ["$state", "invalid"]},
                                "$date",
                                {
                                   "$cond": [
                                        { "$eq": ["$state", "expired"]},
                                        "$date", 
                                        "1970-01-01"
                                    ]
                                }
                            ]
                        }
                    ]
                }
            }
        }
    }
])

样本输出

/* 0 */
{
    "result" : [ 
        {
            "_id" : "xyz",
            "state" : "invalid",
            "date" : "2015-11-01"
        }, 
        {
            "_id" : "abc",
            "state" : "valid",
            "date" : "2015-11-01"
        }
    ],
    "ok" : 1
}

【讨论】：

谢谢！我只是在查找相同的内容，关于如何在查询期间动态翻译值。我试过这个，虽然这有效，但对性能有重大影响。我的数据库有大约 150 万个文档，实际文档要大得多。没有“翻译”，使用默认排序，花费的时间约为 0.6 秒，使用字段的翻译/翻转，花费的时间是第一次查询的 4 倍。作为最后的手段，我可能需要更改名称或引入另一个字段...
我想您可以通过在 aggregate() 方法中包含解释选项来进行一些优化以重塑管道以提高性能。在开头添加一些$match 步骤以过滤掉不需要的文档，例如{ $match: {$state: { $in: ["expired", "invalid", "valid" ] }} } 等
我真的不明白这个解决方案是如何解决问题的，解决方案脚本和刚才的db.xyz.aggregate([{"$group":{"_id":"$couponId","state":{"$first":"$state"},"date":{"$first":"$date"}}}])有什么区别。