【Posted】: 2018-12-03 09:13:31
【Question】:
TL;DR
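We add a $project stage between the $match and $lookup stages to filter out unnecessary data or to alias fields. Those $project stages improve the readability of the query while debugging, but do they affect performance in any way when every collection involved in the query holds a large number of documents?
Question in detail
For example, I have two collections, schools and students, as shown below:
Yes, I know the schema design is bad! MongoDB says to put everything in the same collection to avoid relations, but let's continue with this approach for now.
schools collection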
{
"_id": ObjectId("5c04dca4289c601a393d9db8"),
"name": "First School Name",
"address": "1 xyz",
"status": 1,
// Many more fields
},
{
"_id": ObjectId("5c04dca4289c601a393d9db9"),
"name": "Second School Name",
"address": "2 xyz",
"status": 1,
// Many more fields
},
// Many more Schools
students collection
{
"_id": ObjectId("5c04dcd5289c601a393d9dbb"),
"name": "One Student Name",
"school_id": ObjectId("5c04dca4289c601a393d9db8"),
"address": "1 abc",
"Gender": "Male",
// Many more fields
},
{
"_id": ObjectId("5c04dcd5289c601a393d9dbc"),
"name": "Second Student Name",
"school_id": ObjectId("5c04dca4289c601a393d9db9"),
"address": "1 abc",
"Gender": "Male",
// Many more fields
},
// Many more students
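Now, in my query shown below, I have a $project stage after $match, right before $lookup.
So is this $project stage necessary?
Will this stage affect performance when all the collections involved in the query hold a huge number of documents?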
db.students.aggregate([
{
$match: {
"Gender": "Male"
}
},
// 1. Below $project stage is not necessary apart from filtering out and aliasing.
// 2. Will this stage affect performance when there are huge number of documents?
{
$project: {
"_id": 0,
"student_id": "$_id",
"student_name": "$name",
"school_id": 1
}
},
{
$lookup: {
from: "schools",
let: {
"school_id": "$school_id"
},
pipeline: [
{
$match: {
"status": 1,
$expr: {
$eq: ["$_id", "$$school_id"]
}
}
},
{
$project: {
"_id": 0,
"name": 1
}
}
],
as: "school"
}
},
{
$unwind: "$school"
}
]);
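One way to measure the effect rather than guess is to run both variants of the pipeline through explain and compare the reported statistics. The following is a minimal sketch, not a definitive benchmark: `buildPipeline` and its `withProject` flag are hypothetical names introduced here for illustration, and the shape of the explain output depends on the server version.

```javascript
// Hypothetical helper: builds the pipeline from the question,
// with or without the intermediate $project stage.
function buildPipeline(withProject) {
    const stages = [
        { $match: { "Gender": "Male" } }
    ];
    if (withProject) {
        stages.push({
            $project: {
                "_id": 0,
                "student_id": "$_id",
                "student_name": "$name",
                "school_id": 1
            }
        });
    }
    stages.push(
        {
            $lookup: {
                from: "schools",
                let: { "school_id": "$school_id" },
                pipeline: [
                    { $match: { "status": 1, $expr: { $eq: ["$_id", "$$school_id"] } } },
                    { $project: { "_id": 0, "name": 1 } }
                ],
                as: "school"
            }
        },
        { $unwind: "$school" }
    );
    return stages;
}

// Only runs inside the mongo shell, where `db` is defined.
if (typeof db !== "undefined") {
    [true, false].forEach(function (withProject) {
        print("with intermediate $project: " + withProject);
        printjson(
            db.students.explain("executionStats").aggregate(buildPipeline(withProject))
        );
    });
}
```

Running this against a copy of the real data and comparing the two explain outputs (execution time and documents examined) should show whether the extra stage matters for this workload.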
【Comments】:
Tags: mongodb mongodb-query aggregation-framework query-performance