Do multiple $project stages in a MongoDB aggregation affect performance? (Answers)

【Question title】: Do multiple $project stages in a MongoDB aggregation affect performance?
【Posted】: 2018-12-03 09:13:31
【Question description】:

TL;DR

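We add $project stages between the $match and $lookup stages to filter out unnecessary data or to alias fields. Those $project stages make the query easier to read while debugging, but do they affect performance in any way when every collection involved in the query holds a huge number of documents?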

Question in detail

For example, I have two collections, schools and students, as shown below:

Yes, I know the schema design is bad! MongoDB says to put everything in the same collection to avoid relations, but let's continue with this approach for now.

schools collection

{
    "_id": ObjectId("5c04dca4289c601a393d9db8"),
    "name": "First School Name",
    "address": "1 xyz",
    "status": 1,
    // Many more fields
},
{
    "_id": ObjectId("5c04dca4289c601a393d9db9"),
    "name": "Second School Name",
    "address": "2 xyz",
    "status": 1,
    // Many more fields
},
// Many more Schools

students collection

{
    "_id": ObjectId("5c04dcd5289c601a393d9dbb"),
    "name": "One Student Name",
    "school_id": ObjectId("5c04dca4289c601a393d9db8"),
    "address": "1 abc",
    "Gender": "Male",
    // Many more fields
},
{
    "_id": ObjectId("5c04dcd5289c601a393d9dbc"),
    "name": "Second Student Name",
    "school_id": ObjectId("5c04dca4289c601a393d9db9"),
    "address": "1 abc",
    "Gender": "Male",
    // Many more fields
},
// Many more students

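Now, in my query shown below, I have a $project stage after $match and just before $lookup. Is this $project stage necessary? Will it affect performance when all the collections involved in the query hold a huge number of documents?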

db.students.aggregate([
    {
        $match: {
            "Gender": "Male"
        }
    },
    // 1. Below $project stage is not necessary apart from filtering out and aliasing.
    // 2. Will this stage affect performance when there are huge number of documents?
    {
        $project: {
            "_id": 0,
            "student_id": "$_id",
            "student_name": "$name",
            "school_id": 1
        }
    },
    {
        $lookup: {
            from: "schools",
            let: {
                "school_id": "$school_id"
            },
            pipeline: [
                {
                    $match: {
                        "status": 1,
                        $expr: {
                            $eq: ["$_id", "$$school_id"]
                        }
                    }
                },
                {
                    $project: {
                        "_id": 0,
                        "name": 1
                    }
                }
            ],
            as: "school"
        }
    },
    {
        $unwind: "$school"
    }
]);

【Question comments】:

Tags: mongodb mongodb-query aggregation-framework query-performance

【Solution 1】:

Please read: https://docs.mongodb.com/v3.2/core/aggregation-pipeline-optimization/

The part relevant to your specific case is: "The aggregation pipeline can determine if it requires only a subset of the fields in the documents to obtain the results. If so, the pipeline will only use those required fields, reducing the amount of data passing through the pipeline."

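So some optimization already happens behind the scenes. You can try adding the explain option to your aggregation to see exactly what mongo is doing to optimize your pipeline.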
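As a rough sketch of the explain suggestion, the snippet below asks the server for the plan of the first two stages from the question. The helper name explainPipeline is made up for illustration, and the call is guarded so it only runs when a shell `db` object is present; the shape of the explain output depends on the server version, so it is not shown here.

```javascript
// First two stages of the pipeline from the question.
const pipeline = [
    { $match: { Gender: "Male" } },
    { $project: { _id: 0, student_id: "$_id", student_name: "$name", school_id: 1 } }
];

// Hypothetical helper (not part of MongoDB): asks the server to explain
// the pipeline instead of executing it and returning documents.
function explainPipeline(database, stages) {
    return database.students.explain().aggregate(stages);
}

// Only call the server when running inside the mongo shell, where `db` exists.
if (typeof db !== "undefined") {
    printjson(explainPipeline(db, pipeline));
}
```

Running the same call once with and once without the $project stage and comparing the two outputs is one way to check whether the stage changes what the server actually does.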

I think what you are doing should actually help performance, since you are reducing the amount of data flowing through the pipeline.
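To make "reducing the amount of data" concrete, here is a small plain-JavaScript illustration (runnable in Node.js or mongosh) that imitates the question's $project stage on one sample document and compares serialized sizes. It is only an approximation: the `notes` field is an invented stand-in for "many more fields", projectStudent is a hypothetical helper, and JSON length is not the same as BSON size or actual server cost.

```javascript
// Sample student shaped like the documents in the question.
const student = {
    _id: "5c04dcd5289c601a393d9dbb",
    name: "One Student Name",
    school_id: "5c04dca4289c601a393d9db8",
    address: "1 abc",
    Gender: "Male",
    notes: "x".repeat(500) // assumption: stand-in for "Many more fields"
};

// Plain-JS imitation of the $project stage (hypothetical helper, not MongoDB code).
function projectStudent(doc) {
    return {
        student_id: doc._id,
        student_name: doc.name,
        school_id: doc.school_id
    };
}

// Compare the serialized size of the document before and after projection.
const sizeBefore = JSON.stringify(student).length;
const sizeAfter = JSON.stringify(projectStudent(student)).length;
console.log("serialized size before:", sizeBefore, "after:", sizeAfter);
```

Whether this saving shows up in practice depends on how much the server's own field-dependency optimization has already trimmed, which is what the explain output is for.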

【Comments】: