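MongoDB Complex Subdocument Query: Answers

【Question Title】: MongoDB Complex Subdocument Query
【Posted】: 2014-10-27 18:41:01
【Question】:

I have a collection of more than 100,000 documents, each containing multiple nested arrays. I need to query on a property located at the lowest level and return only the matching objects at the bottom of the arrays.

Document structure: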

{
    _id: 12345,
    type: "employee",
    people: [
        {
            name: "Rob",
            items: [
                {
                    itemName: "RobsItemOne",
                    value: "$10.00",
                    description: "some description about the item"
                },
                {
                    itemName: "RobsItemTwo",
                    value: "$15.00",
                    description: "some description about the item"
                }
            ]
        }
    ]
}

I have been using the aggregation pipeline, which produces the expected results, but performance is very poor. Here is my query:

db.collection.aggregate([
            {
                $match: {
                    "type": "employee"
                }
            },

            {$unwind: "$people"},
            {$unwind: "$people.items"},
            {$match: {$or: [ //There could be dozens of items included in this $match
                             {"people.items.itemName": "RobsItemOne"},
                             {"people.items.itemName": "RobsItemTwo"}
                           ]
                     }
            },
            {
                $project: {
                    _id: 0,// This is because of the $out
                    systemID: "$_id",
                    type: "$type",
                    item: "$people.items.itemName",
                    value: "$people.items.value"
                }
            },
            {$out: tempCollection} //Would like to avoid this, but was exceeding max document size
        ])

The result is:

[ 
    {
        "type" : "employee",
        "systemID" : 12345,
        "item" : "RobsItemOne",
        "value" : "$10.00"
    }, 
    {
        "type" : "employee",
        "systemID" : 12345,
        "item" : "RobsItemTwo",
        "value" : "$15.00"
    }
]

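What can I do to speed up this query? I have tried using indexes, but according to the Mongo documentation, indexes are ignored past the initial $match.

【Comments】:

Tags: mongodb performance aggregation-framework nosql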

【Solution 1】:

You can also try adding a $match stage to your query right after the $unwind on people.

...{$unwind: "$people"},
{$match:{"people.items.itemName":{$in:["RobsItemOne","RobsItemTwo"]}}},
{$unwind: "$people.items"}, ....

This reduces the number of records that the subsequent $unwind and $match stages have to process.
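To make the effect of the early $match concrete, here is a minimal in-memory sketch in plain JavaScript. It is not MongoDB code: the helper functions only emulate what the two $unwind stages and the $match stages do, and the second person ("Ann") and her items are hypothetical sample data added for illustration.

```javascript
// Hypothetical sample data shaped like the question's documents.
// "Ann" and her items are made up to show documents that get filtered out.
const docs = [
  {
    _id: 12345,
    type: "employee",
    people: [
      { name: "Rob", items: [{ itemName: "RobsItemOne" }, { itemName: "RobsItemTwo" }] },
      { name: "Ann", items: [{ itemName: "AnnsItemOne" }, { itemName: "AnnsItemTwo" }, { itemName: "AnnsItemThree" }] }
    ]
  }
];
const wanted = new Set(["RobsItemOne", "RobsItemTwo"]);

// Emulates {$unwind: "$people"}: one output document per person.
function unwindPeople(input) {
  return input.flatMap(d => d.people.map(p => ({ ...d, people: p })));
}

// Emulates {$unwind: "$people.items"}: one output document per item.
function unwindItems(input) {
  return input.flatMap(d =>
    d.people.items.map(i => ({ ...d, people: { ...d.people, items: i } })));
}

// Emulates the intermediate $match: keep a person if any of their items is wanted.
function matchPeopleWithWantedItem(input) {
  return input.filter(d => d.people.items.some(i => wanted.has(i.itemName)));
}

// Emulates the final $match on a single unwound item.
function matchWantedItem(input) {
  return input.filter(d => wanted.has(d.people.items.itemName));
}

// Documents reaching the final $match, without and with the early filter.
const withoutEarlyMatch = unwindItems(unwindPeople(docs));
const withEarlyMatch = unwindItems(matchPeopleWithWantedItem(unwindPeople(docs)));

// Both variants produce the same final result.
const finalWithout = matchWantedItem(withoutEarlyMatch).map(d => d.people.items.itemName);
const finalWith = matchWantedItem(withEarlyMatch).map(d => d.people.items.itemName);

console.log(withoutEarlyMatch.length, withEarlyMatch.length); // prints: 5 2
```

In this toy data set the second $unwind emits 5 documents without the early filter and only 2 with it, while the final output is identical. The actual saving on a real collection depends on how many people have none of the requested items.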

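Since you have a large number of records, you can use the {allowDiskUse: true} option, which enables writing to temporary files. When set to true, aggregation stages can write data to the _tmp subdirectory of the dbPath directory.

So your final query would look like this: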

db.collection.aggregate([
        {
            $match: {
                "type": "employee"
            }
        },

        {$unwind: "$people"},
        {$match:{"people.items.itemName":{$in:["RobsItemOne","RobsItemTwo"]}}},
        {$unwind: "$people.items"},
        {$match: {$or: [ //There could be dozens of items included in this $match
                         {"people.items.itemName": "RobsItemOne"},
                         {"people.items.itemName": "RobsItemTwo"}
                       ]
                 }
        },
        {
            $project: {
                _id: 0,// This is because of the $out
                systemID: "$_id",
                type: "$type",
                item: "$people.items.itemName",
                value: "$people.items.value"
            }
        }

    ], {allowDiskUse:true})

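【Comments】:

I will give it a try. Was choosing the aggregation pipeline over Map Reduce the right choice here?
Please see this: stackoverflow.com/questions/16310730/…
In the example above, all the documents have unique keys, so the reduce function would not be called for them. Even if you emitted a common key for all documents, the reduce function would have to take a large number of documents as input, and processing would be much slower than the aggregation pipeline, because the pipeline eliminates documents at the $match stage.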

【Solution 2】:

Building on @BatScream's effort, I found a few other things that can be improved. You can give them a try.

// if the final result set is relatively small, this index will be helpful.
db.collection.ensureIndex({type : 1, "people.items.itemName" : 1 });

var itemCriteria = {
    $in : [ "RobsItemOne", "RobsItemTwo" ]
};

db.collection.aggregate([ {
    $match : {
        "type" : "employee",
        "people.items.itemName" : itemCriteria      // add this criteria to narrow source range further
    }
}, {
    $unwind : "$people"
}, {
    $match : {
        "people.items.itemName" : itemCriteria      // narrow data range further
    }
}, {
    $unwind : "$people.items"
}, {
    $match : {
        "people.items.itemName" : itemCriteria      // final match, avoid to use $or operator
    }
}, {
    $project : {
        _id : 0,                                    // This is because of the $out
        systemID : "$_id",
        type : "$type",
        item : "$people.items.itemName",
        value : "$people.items.value"
    }
}, {
    $out: tempCollection                            // optional
} ], {
    allowDiskUse : true
});
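Since the question mentions that there could be dozens of item names, the pipeline above can be generated from a list. The following is a minimal sketch; `buildItemPipeline` and its `outCollection` parameter are hypothetical names introduced for illustration, and the stages simply mirror the query shown above.

```javascript
// Hypothetical helper: builds the pipeline above for any list of item names.
// outCollection is optional; when given, a $out stage is appended.
function buildItemPipeline(itemNames, outCollection) {
  const itemCriteria = { $in: itemNames };
  const pipeline = [
    // Document-level filter; this is the stage that can use the index.
    { $match: { type: "employee", "people.items.itemName": itemCriteria } },
    { $unwind: "$people" },
    // Drop people that have none of the requested items.
    { $match: { "people.items.itemName": itemCriteria } },
    { $unwind: "$people.items" },
    // Keep only the requested items themselves.
    { $match: { "people.items.itemName": itemCriteria } },
    {
      $project: {
        _id: 0,
        systemID: "$_id",
        type: "$type",
        item: "$people.items.itemName",
        value: "$people.items.value"
      }
    }
  ];
  if (outCollection) {
    pipeline.push({ $out: outCollection });
  }
  return pipeline;
}

const pipeline = buildItemPipeline(["RobsItemOne", "RobsItemTwo"]);
console.log(pipeline.length); // prints: 6
```

In the mongo shell the result would be passed to the same call as above, for example `db.collection.aggregate(pipeline, {allowDiskUse: true})`.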

【Comments】: