【Question title】: How to find parents based on child fields in mongo using aggregation?
【Posted】: 2017-05-08 13:46:46
【Question description】:

Here is my code:

const _ = require('lodash')
const Box = require('./models/Box')

const boxesToBePicked = await Box.find({ status: 'ready', client: 27 })
const boxesOriginalIds = _(boxesToBePicked).map('original').compact().uniq().value()
const boxesOriginal = boxesOriginalIds.length ? await Box.find({ _id: { $in: boxesOriginalIds } }) : []

const attributes = ['name']

const boxes = [
  ...boxesOriginal,
  ...boxesToBePicked.filter(box => !box.original)
].map(box => _.pick(box, attributes))

Suppose we have the following data in the "boxes" collection:

[
  { _id: 1, name: 'Original Box #1', status: 'pending' },
  { _id: 2, name: 'Nested box', status: 'ready', original: 1 },
  { _id: 3, name: 'Nested box', status: 'ready', original: 1 },
  { _id: 4, name: 'Nested box', status: 'pending', original: 1 },
  { _id: 5, name: 'Original Box #2', status: 'ready' },
  { _id: 6, name: 'Original Box #3', status: 'pending' },
  { _id: 7, name: 'Nested box', status: 'ready', original: 6 },
  { _id: 8, name: 'Original Box #4', status: 'pending' }
]

Workflow

Find all boxes that are ready to be picked:

const boxesToBePicked = await Box.find({ status: 'ready' })

// Returns:

[
  { _id: 2, name: 'Nested box', status: 'ready', original: 1 },
  { _id: 3, name: 'Nested box', status: 'ready', original: 1 },
  { _id: 5, name: 'Original Box #2', status: 'ready' },
  { _id: 7, name: 'Nested box', status: 'ready', original: 6 }
]

Get the IDs of their original (parent) boxes:

const boxesOriginalIds = _(boxesToBePicked).map('original').compact().uniq().value()

// Returns:

[1, 6]

Fetch those boxes by ID:

const boxesOriginal = boxesOriginalIds.length ? await Box.find({ _id: { $in: boxesOriginalIds } }) : []

// Returns

[
  { _id: 1, name: 'Original Box #1', status: 'pending' },
  { _id: 6, name: 'Original Box #3', status: 'pending' }
]

Combine them with the ready boxes that are not themselves nested:

const boxes = [
  ...boxesOriginal,
  ...boxesToBePicked.filter(box => !box.original)
].map(box => _.pick(box, attributes))

// Returns

[
  { name: 'Original Box #1' },
  { name: 'Original Box #3' },
  { name: 'Original Box #2' }
]

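So basically, what we do here is get every original box that has at least one nested box with status "ready", plus every non-nested box whose own status is "ready".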
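
The rule above can be written as a small in-memory function, which is handy as a reference when checking a pipeline against the sample data. This is only a sketch: `topLevelReadyNames` is a hypothetical helper name, it works on a plain array rather than a MongoDB collection, and it sorts by `_id`, so the order differs from the output of the code above.

```javascript
// Sample data from the question, as a plain array (no database involved).
const sampleBoxes = [
  { _id: 1, name: 'Original Box #1', status: 'pending' },
  { _id: 2, name: 'Nested box', status: 'ready', original: 1 },
  { _id: 3, name: 'Nested box', status: 'ready', original: 1 },
  { _id: 4, name: 'Nested box', status: 'pending', original: 1 },
  { _id: 5, name: 'Original Box #2', status: 'ready' },
  { _id: 6, name: 'Original Box #3', status: 'pending' },
  { _id: 7, name: 'Nested box', status: 'ready', original: 6 },
  { _id: 8, name: 'Original Box #4', status: 'pending' }
];

// Hypothetical helper: every ready box contributes its parent's _id when it
// is nested, or its own _id when it is not.
function topLevelReadyNames(boxes) {
  const byId = new Map(boxes.map((box) => [box._id, box]));
  const topLevelIds = new Set(
    boxes
      .filter((box) => box.status === 'ready')
      .map((box) => (box.original != null ? box.original : box._id))
  );
  return [...topLevelIds]
    .filter((id) => byId.has(id)) // ignore references to missing parents
    .sort((a, b) => a - b)
    .map((id) => byId.get(id).name);
}

console.log(topLevelReadyNames(sampleBoxes));
// [ 'Original Box #1', 'Original Box #2', 'Original Box #3' ]
```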

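I think this could be simplified by using an aggregation pipeline and projection. But how?

【Question discussion】:

Tags: javascript node.js mongodb mongoose aggregation-framework

【Solution 1】:

You can try the aggregation below. It uses $lookup to self-join the collection, then a $match stage with $or: one branch combines two conditions with $and to cover the second case, and the other branch covers the first case. A $group stage then removes duplicates, and a $project stage formats the response.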

db.boxes.aggregate([{
    $lookup: {
        from: "boxes",
        localField: "original",
        foreignField: "_id",
        as: "nested_orders"
    }
}, {
    $unwind: {
        path: "$nested_orders",
        preserveNullAndEmptyArrays: true
    }
}, {
    $match: {
        $or: [{
            $and: [{
                "status": "ready"
            }, {
                "nested_orders": {
                    $exists: false,
                }
            }]
        }, {
            "nested_orders.status": "pending"
        }]
    }
}, {
    $group: {
        "_id": null,
        "names": {
            $addToSet: {
                name: "$name",
                nested_name: "$nested_orders.name"
            }
        }
    }
}, {
    $unwind: "$names"
}, {
    $project: {
        "_id": 0,
        "name": {
            $ifNull: ['$names.nested_name', '$names.name']
        }
    }
}]).pretty();

Sample response

{ "name" : "Original Box #1" }
{ "name" : "Original Box #2" }
{ "name" : "Original Box #3" }

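【Discussion】:

I think $lookup as the first operation will be very heavy in terms of performance, because it considers every box in the collection.
That depends on the size of your input collection; you could apply a filter to limit the input that reaches the lookup stage. I would suggest performance-testing the solution against your data. By the way, I updated the answer to match the expected response.
You have the right idea but the wrong order, which makes it overly complicated. You should first accumulate the _ids of the boxes to be picked up, and then use $lookup to rehydrate them with their names.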

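【Solution 2】:

Breaking down the aggregation:

$group creates:

an array ids that collects the original value of every document whose status is ready
an array box_ready that holds every document whose status is ready, with its other fields unchanged (used later)
an array document that contains every original document ($$ROOT)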

{
    $group: {
        _id: null,
        ids: {
            $addToSet: {
                $cond: [
                    { $eq: ["$status", "ready"] },
                    "$original", null
                ]
            }
        },
        box_ready: {
            $addToSet: {
                $cond: [
                    { $eq: ["$status", "ready"] },
                    { _id: "$_id", name: "$name", original: "$original", status: "$status" },
                    null
                ]
            }
        },
        document: { $push: "$$ROOT" }
    }
}

$unwind the document field to flatten the array
```
{
    $unwind: "$document"
}
```

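Use a $redact stage to keep or prune each record depending on whether $document._id matches an entry in the previously created ids array (which holds the original values of the documents whose status is ready)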

{
    $redact: {
        "$cond": {
            "if": {
                "$setIsSubset": [{
                        "$map": {
                            "input": { "$literal": ["A"] },
                            "as": "a",
                            "in": "$document._id"
                        }
                    },
                    "$ids"
                ]
            },
            "then": "$$KEEP",
            "else": "$$PRUNE"
        }
    }
}
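
The `$map` over `{ "$literal": ["A"] }` is only a way to wrap `$document._id` in a one-element array, so that `$setIsSubset` can serve as a membership test against `ids`. A plain JavaScript analogue of that check might look as follows. `isSubset` and `keepDocument` are hypothetical names, and the `ids` value is written out by hand on the assumption that the first `$group` collected `1`, `6` and `null` from the sample data.

```javascript
// Assumed content of the `ids` array after the first $group
// (element order is not significant for a set).
const ids = [1, 6, null];

// In-memory analogue of $setIsSubset: every element of `subset` occurs in `superset`.
const isSubset = (subset, superset) => subset.every((x) => superset.includes(x));

// Analogue of the $redact condition: wrap the _id in a one-element array, then test it.
const keepDocument = (doc) => isSubset([doc._id], ids);

const documents = [
  { _id: 1, name: 'Original Box #1', status: 'pending' },
  { _id: 2, name: 'Nested box', status: 'ready', original: 1 },
  { _id: 5, name: 'Original Box #2', status: 'ready' },
  { _id: 6, name: 'Original Box #3', status: 'pending' }
];

// Documents for which the condition holds correspond to $$KEEP, the rest to $$PRUNE.
const kept = documents.filter(keepDocument).map((doc) => doc._id);
console.log(kept); // [ 1, 6 ]
```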

$group pushes all documents kept by the previous $redact into another array named filtered (we now have 2 arrays that can be merged)
```
{
    $group: {
        _id: null,
        box_ready: { $first: "$box_ready" },
        filtered: { $push: "$document" }
    }
}
```

Use $project with $setUnion to merge the arrays box_ready and filtered

{
    $project: {
        union: {
            $setUnion: ["$box_ready", "$filtered"]
        },
        _id: 0
    }
}

$unwind the resulting array to get the distinct records
```
{
    $unwind: "$union"
}
```
$match only the entries that lack original and are not null (the null values come from the status: ready condition in the first $group)
```
{
    $match: {
        "union.original": {
            "$exists": false
        },
        "union": { $nin: [null] }
    }
}
```
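
To see why the final `$match` carries both conditions, the last three stages can be traced in plain JavaScript. This is a sketch under assumptions: `boxReady` and `filtered` are written out by hand to mirror what the earlier stages would hold for the sample data, and `setUnion` is a hypothetical helper that compares documents by their JSON form, which is only a rough stand-in for how `$setUnion` compares values.

```javascript
// Assumed content of box_ready: one entry per ready box, plus the null
// contributed by the boxes that are not ready.
const boxReady = [
  null,
  { _id: 2, name: 'Nested box', original: 1, status: 'ready' },
  { _id: 3, name: 'Nested box', original: 1, status: 'ready' },
  { _id: 5, name: 'Original Box #2', status: 'ready' },
  { _id: 7, name: 'Nested box', original: 6, status: 'ready' }
];

// Assumed content of filtered: the parents kept by $redact.
const filtered = [
  { _id: 1, name: 'Original Box #1', status: 'pending' },
  { _id: 6, name: 'Original Box #3', status: 'pending' }
];

// Hypothetical helper: union of two arrays, deduplicated by JSON representation.
function setUnion(a, b) {
  const seen = new Map();
  for (const item of [...a, ...b]) seen.set(JSON.stringify(item), item);
  return [...seen.values()];
}

const result = setUnion(boxReady, filtered)
  // "union": { $nin: [null] } -- a null entry has no `original` field either,
  // so it has to be excluded explicitly.
  .filter((item) => item !== null)
  // "union.original": { $exists: false } -- drops the nested boxes.
  .filter((item) => !('original' in item))
  .sort((a, b) => a._id - b._id);

console.log(result.map((item) => item.name));
// [ 'Original Box #1', 'Original Box #2', 'Original Box #3' ]
```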

The whole aggregation query is:

db.collection.aggregate(
    [{
        $group: {
            _id: null,
            ids: {
                $addToSet: {
                    $cond: [
                        { $eq: ["$status", "ready"] },
                        "$original", null
                    ]
                }
            },
            box_ready: {
                $addToSet: {
                    $cond: [
                        { $eq: ["$status", "ready"] },
                        { _id: "$_id", name: "$name", original: "$original", status: "$status" },
                        null
                    ]
                }
            },
            document: { $push: "$$ROOT" }
        }
    }, {
        $unwind: "$document"
    }, {
        $redact: {
            "$cond": {
                "if": {
                    "$setIsSubset": [{
                            "$map": {
                                "input": { "$literal": ["A"] },
                                "as": "a",
                                "in": "$document._id"
                            }
                        },
                        "$ids"
                    ]
                },
                "then": "$$KEEP",
                "else": "$$PRUNE"
            }
        }
    }, {

        $group: {
            _id: null,
            box_ready: { $first: "$box_ready" },
            filtered: { $push: "$document" }
        }

    }, {
        $project: {
            union: {
                $setUnion: ["$box_ready", "$filtered"]
            },
            _id: 0
        }
    }, {
        $unwind: "$union"
    }, {
        $match: {
            "union.original": {
                "$exists": false
            },
            "union": { $nin: [null] }
        }
    }]
)

It gives you:

{ "union" : { "_id" : 1, "name" : "Original Box #1", "status" : "pending" } }
{ "union" : { "_id" : 5, "name" : "Original Box #2", "status" : "ready" } }
{ "union" : { "_id" : 6, "name" : "Original Box #3", "status" : "pending" } }

If you want to select specific fields, add an extra $project stage.
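
For instance, a stage like the one below could be appended to keep only the name. It is a sketch rather than part of the original answer: `nameOnlyStage` is a hypothetical variable name, and `applyNameOnly` merely imitates the stage in memory to show the resulting shape.

```javascript
// Extra stage to append to the pipeline: drop _id and lift union.name to the top level.
const nameOnlyStage = { $project: { _id: 0, name: '$union.name' } };

// In-memory imitation of that stage, for illustration only (not a MongoDB API).
const applyNameOnly = (docs) => docs.map((doc) => ({ name: doc.union.name }));

const sampleOutput = [
  { union: { _id: 1, name: 'Original Box #1', status: 'pending' } },
  { union: { _id: 5, name: 'Original Box #2', status: 'ready' } },
  { union: { _id: 6, name: 'Original Box #3', status: 'pending' } }
];

console.log(applyNameOnly(sampleOutput));
// [
//   { name: 'Original Box #1' },
//   { name: 'Original Box #2' },
//   { name: 'Original Box #3' }
// ]
```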

With mongoose, you should be able to run the aggregation like this:

Box.aggregate([
    //the whole aggregation here
], function(err, result) {

});
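
If promises are preferred over callbacks, the same call can be awaited. A minimal sketch, assuming a Mongoose version in which `Aggregate#exec()` returns a promise when no callback is passed. `runBoxAggregation` is a hypothetical wrapper, and the model is passed in as a parameter so the function does not depend on a particular file layout.

```javascript
// Hypothetical wrapper around Model.aggregate(); `Box` is the Mongoose model
// and `pipeline` is the array of stages shown above.
async function runBoxAggregation(Box, pipeline) {
  // Errors are not swallowed here: a failed aggregation rejects the promise.
  return Box.aggregate(pipeline).exec();
}

// Usage, inside an async function:
//   const boxes = await runBoxAggregation(Box, [/* the whole aggregation here */]);
```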

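【Discussion】:

Thanks, but it looks very complicated. I'm pretty sure this code can be simplified.
I can't believe that something so easy to do in EF/LINQ is this hard in Mongo.

【Solution 3】:

Several answers are close, but this is the most efficient way. It accumulates the "_id" values of the boxes to be picked up, then uses $lookup to "rehydrate" the full details of each (top-level) box.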

db.boxes.aggregate(
    {$group: {
         _id:null, 
         boxes:{$addToSet:{$cond:{
            if:{$eq:["$status","ready"]},
            then:{$ifNull:["$original","$_id"]},
            else:null
         }}}
    }},

    {$lookup: {
          from:"boxes",
          localField:"boxes",
          foreignField:"_id",
          as:"boxes"
    }}
)

Your result, based on the sample data:

{
"_id" : null,
"boxes" : [
    {
        "_id" : 1,
        "name" : "Original Box #1",
        "status" : "pending"
    },
    {
        "_id" : 5,
        "name" : "Original Box #2",
        "status" : "ready"
    },
    {
        "_id" : 6,
        "name" : "Original Box #3",
        "status" : "pending"
    }
] }

Note that $lookup is performed only for the _id values of the boxes to be picked up, which is much more efficient than doing it for all boxes.
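
The two stages can be traced in memory to see which values reach the lookup. This is only an illustration in plain JavaScript on a subset of the sample data, not something MongoDB runs; `idsToPickUp` and `lookedUp` are hypothetical names.

```javascript
const boxes = [
  { _id: 1, name: 'Original Box #1', status: 'pending' },
  { _id: 2, name: 'Nested box', status: 'ready', original: 1 },
  { _id: 5, name: 'Original Box #2', status: 'ready' },
  { _id: 6, name: 'Original Box #3', status: 'pending' },
  { _id: 7, name: 'Nested box', status: 'ready', original: 6 },
  { _id: 8, name: 'Original Box #4', status: 'pending' }
];

// Stage 1 analogue: $group with $addToSet over $cond / $ifNull.
// Ready boxes contribute their parent's _id (or their own), others contribute null.
const idsToPickUp = [...new Set(
  boxes.map((box) => (box.status === 'ready' ? (box.original ?? box._id) : null))
)];

// Stage 2 analogue: $lookup by _id. The null entry matches no document,
// so it simply drops out of the result.
const lookedUp = boxes.filter((box) => idsToPickUp.includes(box._id));

console.log(idsToPickUp); // [ null, 1, 5, 6 ]
console.log(lookedUp.map((box) => box.name));
// [ 'Original Box #1', 'Original Box #2', 'Original Box #3' ]
```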

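If you want the pipeline to be even more efficient, you would need to store more details about the original box (such as its name) in the nested box documents.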
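
As a rough illustration of that idea, suppose each nested box carried a copy of its parent's name. The field name `originalName` is an assumption made up for this sketch and is not part of the question's schema; the pipeline is untested, and grouping by name assumes that box names are unique.

```javascript
// Hypothetical denormalized documents: nested boxes store the parent's name.
const denormalizedBoxes = [
  { _id: 1, name: 'Original Box #1', status: 'pending' },
  { _id: 2, name: 'Nested box', status: 'ready', original: 1, originalName: 'Original Box #1' },
  { _id: 5, name: 'Original Box #2', status: 'ready' },
  { _id: 7, name: 'Nested box', status: 'ready', original: 6, originalName: 'Original Box #3' }
];

// With such data, a pipeline along these lines would need no $lookup.
const pipelineWithoutLookup = [
  { $match: { status: 'ready' } },
  { $group: { _id: { $ifNull: ['$originalName', '$name'] } } },
  { $project: { _id: 0, name: '$_id' } }
];

// In-memory analogue of the pipeline, for illustration only.
const names = [...new Set(
  denormalizedBoxes
    .filter((box) => box.status === 'ready')
    .map((box) => box.originalName ?? box.name)
)];

console.log(names);
// [ 'Original Box #1', 'Original Box #2', 'Original Box #3' ]
```

The trade-off is that the copied name has to be kept in sync whenever the original box is renamed.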

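【Discussion】:

【Solution 4】:

To achieve your goal, you can follow these steps:

First select the records whose status is ready (because you want the parents that have no nested boxes but are ready themselves, as well as the parents that have at least one nested box whose status is ready)

Use $lookup to find the parent box

Then $group to get the unique parent boxes

Then $project the box name

So you can try this query: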

db.getCollection('boxes').aggregate(
        {$match:{"status":'ready'}},
        {$lookup: {from: "boxes", localField: "original", foreignField: "_id", as: "parent"}},
        {$unwind: {path: "$parent",preserveNullAndEmptyArrays: true}},
        {$group:{
                _id:null,
                list:{$addToSet:{"$cond": [ { "$ifNull": ["$parent.name", false] }, {name:"$parent.name"}, {name:"$name"} ]}}
                }
        },
        {$project:{name:"$list.name", _id:0}},
        {$unwind: "$name"}
 )

Or

Get the records whose status is ready

Get the required record IDs

Get the names by record ID

db.getCollection('boxes').aggregate(
        {$match:{"status":'ready'}},
        {$group:{
                _id:null,
                parent:{$addToSet:{"$cond": [ { "$ifNull": ["$original", false] }, "$original", "$_id" ]}}
                }
        },
        {$unwind:"$parent"},
        {$lookup: {from: "boxes", localField: "parent", foreignField: "_id", as: "parent"}},
        {$project: {"name" : { $arrayElemAt: [ "$parent.name", 0 ] }, _id:0}}
 )
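
The final `$project` relies on the fact that, after `$lookup`, `parent` is an array of matching documents, so `"$parent.name"` resolves to an array of names and `$arrayElemAt` picks the first one. A plain JavaScript analogue, with hand-written input that is assumed to mirror the stage's input for the sample data:

```javascript
// Assumed shape of the documents after $lookup: `parent` holds the matched boxes
// (exactly one per document here, because _id is unique).
const afterLookup = [
  { parent: [{ _id: 1, name: 'Original Box #1', status: 'pending' }] },
  { parent: [{ _id: 5, name: 'Original Box #2', status: 'ready' }] },
  { parent: [{ _id: 6, name: 'Original Box #3', status: 'pending' }] }
];

// Analogue of { $arrayElemAt: ["$parent.name", 0] }:
// collect the names of all matched parents, then take the element at index 0.
const projected = afterLookup.map((doc) => ({
  name: doc.parent.map((p) => p.name)[0]
}));

console.log(projected);
// [
//   { name: 'Original Box #1' },
//   { name: 'Original Box #2' },
//   { name: 'Original Box #3' }
// ]
```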

【Discussion】:

【Solution 5】:

Using Mongoose (4.x)

Schema:

const schema = mongoose.Schema({
    _id: Number,
    ....
    status: String,
    original: { type: Number, ref: 'Box'}
});
const Box = mongoose.model('Box', schema);

The actual query:

Box
    .find({ status: 'ready' })
    .populate('original')
    .exec((err, boxes) => {
        if (err) return;
        boxes = boxes.map((b) => b.original ? b.original : b);
        boxes = _.uniqBy(boxes, '_id');
        console.log(boxes);
    });
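
The same query can be written with async/await, which also avoids silently dropping the error. A minimal sketch: `findTopLevelReadyBoxes` is a hypothetical function name, the model is passed in as a parameter, and the `Map` stands in for `_.uniqBy` on the assumption that `_id` is a Number as in the schema above.

```javascript
// Hypothetical helper: returns each ready box's parent (or the box itself
// when it has no parent), without duplicates.
async function findTopLevelReadyBoxes(Box) {
  const boxes = await Box.find({ status: 'ready' }).populate('original').exec();

  // Keep the first occurrence of each _id, preserving order.
  const unique = new Map();
  for (const box of boxes) {
    const topLevel = box.original || box;
    if (!unique.has(topLevel._id)) unique.set(topLevel._id, topLevel);
  }
  return [...unique.values()];
}

// Usage, inside an async function:
//   const boxes = await findTopLevelReadyBoxes(Box);
```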

Documentation on Mongoose#populate: http://mongoosejs.com/docs/populate.html

【Discussion】: