【Question title】: MongoDB Complex Subdocument Query
【Posted】: 2014-10-27 18:41:01
【Question】:

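I have a collection of more than 100,000 documents, each containing multiple nested arrays. I need to query on a property located at the lowest level and return only the objects at the bottom of the arrays.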

Document structure:

    {
        _id: 12345,
        type: "employee",
        people: [
            {
                name: "Rob",
                items: [
                    {
                        itemName: "RobsItemOne",
                        value: "$10.00",
                        description: "some description about the item"
                    },
                    {
                        itemName: "RobsItemTwo",
                        value: "$15.00",
                        description: "some description about the item"
                    }
                ]
            }
        ]
    }

I have been using the aggregation pipeline, which produces the expected results, but performance is very poor. Here is my query:

    db.collection.aggregate([
        {
            $match: {
                "type": "employee"
            }
        },
        {$unwind: "$people"},
        {$unwind: "$people.items"},
        {
            $match: {
                // There could be dozens of items included in this $match
                $or: [
                    {"people.items.itemName": "RobsItemOne"},
                    {"people.items.itemName": "RobsItemTwo"}
                ]
            }
        },
        {
            $project: {
                _id: 0, // This is because of the $out
                systemID: "$_id",
                type: "$type",
                item: "$people.items.itemName",
                value: "$people.items.value"
            }
        },
        {$out: tempCollection} // Would like to avoid this, but was exceeding max document size
    ])

The result is:

    [
        {
            "type" : "employee",
            "systemID" : 12345,
            "item" : "RobsItemOne",
            "value" : "$10.00"
        },
        {
            "type" : "employee",
            "systemID" : 12345,
            "item" : "RobsItemTwo",
            "value" : "$15.00"
        }
    ]

What can I do to speed up this query? I have tried using indexes, but according to the Mongo documentation, indexes are ignored past the initial $match.

【Comments】:

    Tags: mongodb performance aggregation-framework nosql


    【Answer 1】:

    You can also try adding a $match stage to your query right after the $unwind of people.

    ...{$unwind: "$people"},
    {$match:{"people.items.itemName":{$in:["RobsItemOne","RobsItemTwo"]}}},
    {$unwind: "$people.items"}, ....
    

    This reduces the number of records that the subsequent $unwind and $match stages have to process.
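
To make the effect concrete, here is a minimal sketch in plain JavaScript (no MongoDB required) that counts how many intermediate documents reach the final filter with and without the early $match. The helpers `unwindPeople`, `unwindItems` and `ownsWantedItem` are hypothetical stand-ins for the pipeline stages, and the second person ("Ann") is made-up sample data, not part of the original question.

```javascript
// Hypothetical stand-ins for the pipeline stages; these are NOT MongoDB APIs.
var wanted = ["RobsItemOne", "RobsItemTwo"];

// Mimics {$unwind: "$people"}: one output document per element of people.
function unwindPeople(docs) {
  var out = [];
  docs.forEach(function (doc) {
    doc.people.forEach(function (person) {
      out.push({ _id: doc._id, type: doc.type, people: person });
    });
  });
  return out;
}

// Mimics {$unwind: "$people.items"}: one output document per item.
function unwindItems(docs) {
  var out = [];
  docs.forEach(function (doc) {
    doc.people.items.forEach(function (item) {
      out.push({ _id: doc._id, type: doc.type, people: { name: doc.people.name, items: item } });
    });
  });
  return out;
}

// Mimics {$match: {"people.items.itemName": {$in: wanted}}} between the two unwinds.
function ownsWantedItem(doc) {
  return doc.people.items.some(function (item) {
    return wanted.indexOf(item.itemName) !== -1;
  });
}

// Made-up sample: Rob owns the wanted items, Ann owns three unrelated ones.
var docs = [{
  _id: 12345,
  type: "employee",
  people: [
    { name: "Rob", items: [{ itemName: "RobsItemOne" }, { itemName: "RobsItemTwo" }] },
    { name: "Ann", items: [{ itemName: "AnnsItemOne" }, { itemName: "AnnsItemTwo" }, { itemName: "AnnsItemThree" }] }
  ]
}];

// Original order: unwind both levels, then filter.
var lateInput = unwindItems(unwindPeople(docs));

// With the early $match: people without a wanted item are dropped before the second unwind.
var earlyInput = unwindItems(unwindPeople(docs).filter(ownsWantedItem));

console.log(lateInput.length);  // 5 documents reach the final $match
console.log(earlyInput.length); // 2 documents reach the final $match
```

The saving grows with the number of people per document and items per person; how much it helps on a real collection depends on the data, so it is worth measuring rather than assuming.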

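    Since you have a large number of records, you can use the {allowDiskUse: true} option, which allows writing to temporary files. When set to true, aggregation stages can write data to the _tmp subdirectory of the dbPath directory.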

    So your final query would look like this:

    db.collection.aggregate([
            {
                $match: {
                    "type": "employee"
                }
            },
    
            {$unwind: "$people"},
            {$match:{"people.items.itemName":{$in:["RobsItemOne","RobsItemTwo"]}}},
            {$unwind: "$people.items"},
            {$match: {$or: [ //There could be dozens of items included in this $match
                             {"people.items.itemName": "RobsItemOne"},
                             {"people.items.itemName": "RobsItemTwo"}
                           ]
                     }
            },
            {
                $project: {
                    _id: 0,// This is because of the $out
                    systemID: "$_id",
                    type: "$type",
                    item: "$people.items.itemName",
                    value: "$people.items.value"
                }
            }
    
        ], {allowDiskUse:true})
    

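    【Comments】:

    • I'll give it a try. Was choosing the aggregation pipeline over Map Reduce the right call here?
    • In the example above, every document has a unique key, so the reduce function would not be called for all documents. Even if you emitted a common key for all documents, the reduce function would have to take a large number of documents as input, and processing would be much slower than the aggregation pipeline, because the pipeline eliminates documents at the $match stage.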
    【Answer 2】:

    Building on @BatScream's work, I found a few more things that could be improved. You can give them a try.

    // if the final result set is relatively small, this index will be helpful.
    db.collection.ensureIndex({type : 1, "people.items.itemName" : 1 });
    
    var itemCriteria = {
        $in : [ "RobsItemOne", "RobsItemTwo" ]
    };
    
    db.collection.aggregate([ {
        $match : {
            "type" : "employee",
            "people.items.itemName" : itemCriteria      // add this criterion to narrow the source range up front
        }
    }, {
        $unwind : "$people"
    }, {
        $match : {
            "people.items.itemName" : itemCriteria      // narrow data range further
        }
    }, {
        $unwind : "$people.items"
    }, {
        $match : {
            "people.items.itemName" : itemCriteria      // final match; $in avoids the $or operator
        }
    }, {
        $project : {
            _id : 0,                                    // This is because of the $out
            systemID : "$_id",
            type : "$type",
            item : "$people.items.itemName",
            value : "$people.items.value"
        }
    }, {
        $out: tempCollection                            // optional
    } ], {
        allowDiskUse : true
    });
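
Since the question mentions that there could be dozens of item names, the pipeline above could also be generated from a list. The following is only a sketch: `buildItemPipeline` is a hypothetical helper, not a MongoDB function, and it leaves out the optional $out stage. It returns plain objects, so it can be inspected without a database.

```javascript
// Hypothetical helper (not part of MongoDB): builds the stages as plain objects.
// Assumes itemNames is an array of strings to match against people.items.itemName.
function buildItemPipeline(type, itemNames) {
  var itemCriteria = { $in: itemNames };
  return [
    // Initial match: the only stage that can use the compound index.
    { $match: { type: type, "people.items.itemName": itemCriteria } },
    { $unwind: "$people" },
    // Drop people who own none of the requested items.
    { $match: { "people.items.itemName": itemCriteria } },
    { $unwind: "$people.items" },
    // Keep only the requested items themselves.
    { $match: { "people.items.itemName": itemCriteria } },
    {
      $project: {
        _id: 0,
        systemID: "$_id",
        type: "$type",
        item: "$people.items.itemName",
        value: "$people.items.value"
      }
    }
  ];
}

var pipeline = buildItemPipeline("employee", ["RobsItemOne", "RobsItemTwo"]);
console.log(JSON.stringify(pipeline, null, 2));

// In the mongo shell, the result could then be passed on as:
// db.collection.aggregate(pipeline, { allowDiskUse: true });
```

Sharing one `itemCriteria` object across the three $match stages keeps them in sync when the list of item names changes.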
    
