MongoDB: Slow query, even with index

【Question title】: MongoDB: Slow query, even with index
【Posted】: 2015-05-19 15:22:39
【Question】:

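I have a web page that uses MongoDB to store and retrieve various measurements. Suddenly, at some point, the page became so slow that it was unusable. It turned out that my database was the culprit.

I searched and could not find any solution to my problem. I apologize, as I am pretty new to MongoDB and am pulling my hair out at the moment.

The MongoDB version I am using is 2.4.6, running on Ubuntu Server 12.04 in a VM with 20GB RAM. No replica sets or sharding are configured.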

First, I set the profiling level to 2, which showed me the slowest query:

> db.system.profile.find().sort({"millis":-1}).limit(1).pretty()
{
        "op" : "query",
        "ns" : "station.measurement",
        "query" : {
                "$query" : {
                        "e" : {
                                "$gte" : 0
                        },
                        "id" : "180"
                },
                "$orderby" : {
                        "t" : -1
                }
        },
        "ntoreturn" : 1,
        "ntoskip" : 0,
        "nscanned" : 3295221,
        "keyUpdates" : 0,
        "numYield" : 6,
        "lockStats" : {
                "timeLockedMicros" : {
                        "r" : NumberLong(12184722),
                        "w" : NumberLong(0)
                },
                "timeAcquiringMicros" : {
                        "r" : NumberLong(5636351),
                        "w" : NumberLong(5)
                }
        },
        "nreturned" : 0,
        "responseLength" : 20,
        "millis" : 6549,
        "ts" : ISODate("2015-03-16T08:57:07.772Z"),
        "client" : "127.0.0.1",
        "allUsers" : [ ],
        "user" : ""
}

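I ran that specific query with .explain(). It looks like it uses an index, as it should, but it takes far too long. I also ran the same query on another, considerably weaker server, and it returned results like a champ within a second.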

> db.measurement.find({"id":"180", "e":{$gte:0}}).sort({"t":-1}).explain()
{
        "cursor" : "BtreeCursor id_1_t_-1_e_1",
        "isMultiKey" : false,
        "n" : 0,
        "nscannedObjects" : 0,
        "nscanned" : 660385,
        "nscannedObjectsAllPlans" : 1981098,
        "nscannedAllPlans" : 3301849,
        "scanAndOrder" : false,
        "indexOnly" : false,
        "nYields" : 7,
        "nChunkSkips" : 0,
        "millis" : 7243,
        "indexBounds" : {
                "id" : [
                        [
                                "180",
                                "180"
                        ]
                ],
                "t" : [
                        [
                                {
                                        "$maxElement" : 1
                                },
                                {
                                        "$minElement" : 1
                                }
                        ]
                ],
                "e" : [
                        [
                                0,
                                1.7976931348623157e+308
                        ]
                ]
        },
        "server" : "station:27017"
}

Next, I looked at the indexes on the measurement collection, and they look fine to me:

> db.measurement.getIndexes()
[
        {
                "v" : 1,
                "key" : {
                        "_id" : 1
                },
                "ns" : "station.measurement",
                "name" : "_id_"
        },
        {
                "v" : 1,
                "key" : {
                        "t" : 1
                },
                "ns" : "station.measurement",
                "name" : "t_1"
        },
        {
                "v" : 1,
                "key" : {
                        "id" : 1,
                        "d" : 1,
                        "_id" : -1
                },
                "ns" : "station.measurement",
                "name" : "id_1_d_1__id_-1"
        },
        {
                "v" : 1,
                "key" : {
                        "id" : 1,
                        "t" : -1,
                        "e" : 1
                },
                "ns" : "station.measurement",
                "name" : "id_1_t_-1_e_1"
        },
        {
                "v" : 1,
                "key" : {
                        "id" : 1,
                        "t" : -1,
                        "e" : -1
                },
                "ns" : "station.measurement",
                "name" : "id_1_t_-1_e_-1"
        }
]

Here is the rest of the information about my collection:

> db.measurement.stats()
{
        "ns" : "station.measurement",
        "count" : 157835456,
        "size" : 22377799512,
        "avgObjSize" : 141.77929395027692,
        "storageSize" : 26476834672,
        "numExtents" : 33,
        "nindexes" : 5,
        "lastExtentSize" : 2146426864,
        "paddingFactor" : 1.0000000000028617,
        "systemFlags" : 0,
        "userFlags" : 0,
        "totalIndexSize" : 30996614096,
        "indexSizes" : {
                "_id_" : 6104250656,
                "t_1" : 3971369360,
                "id_1_d_1__id_-1" : 8397896640,
                "id_1_t_-1_e_1" : 6261548720,
                "id_1_t_-1_e_-1" : 6261548720
        },
        "ok" : 1
}
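
As a rough sanity check on these numbers, the index sizes can be summed and compared with the 20GB of RAM mentioned earlier. The sketch below is plain JavaScript. Treating "20GB" as 20 * 1024^3 bytes is an assumption, and the variable names are illustrative. An index footprint larger than RAM does not prove anything by itself, since only the working set has to stay in memory, but it is worth ruling out.

```javascript
// Index sizes in bytes, copied from the db.measurement.stats() output above.
const indexSizes = {
  "_id_": 6104250656,
  "t_1": 3971369360,
  "id_1_d_1__id_-1": 8397896640,
  "id_1_t_-1_e_1": 6261548720,
  "id_1_t_-1_e_-1": 6261548720
};

// Assumption: "20GB RAM" is read as 20 GiB.
const ramBytes = 20 * 1024 * 1024 * 1024;

// Sum of all index sizes; this should equal totalIndexSize from stats().
const totalIndexBytes = Object.values(indexSizes).reduce((sum, n) => sum + n, 0);

// True when the indexes alone could not all be held in memory at once.
const indexesExceedRam = totalIndexBytes > ramBytes;

console.log("total index size (bytes):", totalIndexBytes);
console.log("RAM (bytes):", ramBytes);
console.log("indexes larger than RAM:", indexesExceedRam);
```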

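I have tried adding new indexes, repairing the whole database, and reindexing. What am I doing wrong? I would really appreciate any help, as I am desperately running out of ideas.

UPDATE 1:

Following Neil Lunn's suggestion I added two indexes, and some queries are now a lot faster: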

{
                "v" : 1,
                "key" : {
                        "id" : 1,
                        "e" : 1,
                        "t" : -1
                },
                "ns" : "station.measurement",
                "name" : "id_1_e_1_t_-1",
                "background" : true
        },
        {
                "v" : 1,
                "key" : {
                        "id" : 1,
                        "e" : -1,
                        "t" : -1
                },
                "ns" : "station.measurement",
                "name" : "id_1_e_-1_t_-1",
                "background" : true
        }

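The results I got are interesting (not sure whether they are relevant).

The next two queries differ only in "id". Note that each query uses a different index. Why? Should I drop the old indexes?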

> db.measurement.find({"id":"119", "e":{$gte:0}}).sort({"t":-1}).explain()
{
        "cursor" : "BtreeCursor id_1_t_-1_e_1",
        "isMultiKey" : false,
        "n" : 840747,
        "nscannedObjects" : 840747,
        "nscanned" : 1047044,
        "nscannedObjectsAllPlans" : 1056722,
        "nscannedAllPlans" : 1311344,
        "scanAndOrder" : false,
        "indexOnly" : false,
        "nYields" : 4,
        "nChunkSkips" : 0,
        "millis" : 3730,
        "indexBounds" : {
                "id" : [
                        [
                                "119",
                                "119"
                        ]
                ],
                "t" : [
                        [
                                {
                                        "$maxElement" : 1
                                },
                                {
                                        "$minElement" : 1
                                }
                        ]
                ],
                "e" : [
                        [
                                0,
                                1.7976931348623157e+308
                        ]
                ]
        },
        "server" : "station:27017"
}

> db.measurement.find({"id":"180", "e":{$gte:0}}).sort({"t":-1}).explain()
{
        "cursor" : "BtreeCursor id_1_e_1_t_-1",
        "isMultiKey" : false,
        "n" : 0,
        "nscannedObjects" : 0,
        "nscanned" : 0,
        "nscannedObjectsAllPlans" : 0,
        "nscannedAllPlans" : 45,
        "scanAndOrder" : true,
        "indexOnly" : false,
        "nYields" : 0,
        "nChunkSkips" : 0,
        "millis" : 0,
        "indexBounds" : {
                "id" : [
                        [
                                "180",
                                "180"
                        ]
                ],
                "e" : [
                        [
                                0,
                                1.7976931348623157e+308
                        ]
                ],
                "t" : [
                        [
                                {
                                        "$maxElement" : 1
                                },
                                {
                                        "$minElement" : 1
                                }
                        ]
                ]
        },
        "server" : "station:27017"
}

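Could the problem be somewhere else? What could cause this sudden "sluggishness"? I have several other collections where queries have suddenly become slower as well.

Oh, and one more thing. On the other server I have, the indexes are the same as they were here before I added the new ones. Yes, the collection there is a bit smaller, but it is several times faster.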

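【Comments】:

Well formatted as a question. It does seem to me that you are scanning more "t" entries than "e". As an experiment, try changing the index and query order to place the ordered emphasis on "e" before "t". The documented changes suggest that this should not alter the result, but it would be interesting to see whether it makes a difference for you.
Interesting suggestion, thanks! I will add the new indexes (id_1_e_1_t_-1 and id_1_e_-1_t_-1) and let you know the results, as building the indexes will take some time.
Hey, I successfully built the indexes and tested them with some queries. I have updated my question with the results :)
So the results seem to be a positive improvement. Do you understand why? Or do we need to go a bit further?
I don't understand why a different index is chosen for queries that differ only in the value of "id". I must be missing something here... please tell me more. I really appreciate your help! Could it be that I somehow corrupted my database?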

Tags: mongodb mongodb-query

【Solution 1】:

So the point here is the choice of index and the ordering of keys in the query.

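If you look at your earlier output from .explain(), you will see that there is a "min/max" range on the "t" element of your expression, which means "t" is not bounded at all. By "moving that to the end" of the evaluation, you let the other filtering elements, which matter more to the overall expression, do their work first: the smaller set of possible matches on "e" becomes the main factor, before scanning through basically "everything" in "t".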

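This is a bit of DBA territory, but in the NoSQL world I do believe it becomes the programmer's problem.

You essentially need to build the "shortest match path" along the chosen keys in order to get the most efficient scan. That is why the results after the change execute so much faster.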
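
The "shortest match path" idea can be sketched with a toy model in plain JavaScript. This is not MongoDB code and not how the server is implemented: `planFor`, the field-order arrays, and the sample data are all made up for illustration. It only counts how many index keys a scan would have to touch for the query shape in the question (equality on "id", range on "e", sort on "t") under the two key orders.

```javascript
// Toy model only: it mimics how many index keys a scan touches.
// It is not MongoDB internals, and every name here is hypothetical.

// Query shape from the question: id == wantedId, e >= minE, sort by t.
function planFor(docs, indexFields, wantedId, minE) {
  const sameId = docs.filter(d => d.id === wantedId);
  const matching = sameId.filter(d => d.e >= minE);

  if (indexFields[1] === "e") {
    // Key order {id, e, t}: the range on "e" bounds the scan directly,
    // but keys come back ordered by e, so sorting by t is an extra step.
    return { keysExamined: matching.length, returned: matching.length, scanAndOrder: true };
  }
  // Key order {id, t, e}: "t" is unbounded, so every key for this id is
  // visited and "e" is checked key by key; results are already in t order.
  return { keysExamined: sameId.length, returned: matching.length, scanAndOrder: false };
}

// Made-up data: id "180" has no document with e >= 0, id "119" mostly does.
const docs = [];
for (let t = 0; t < 1000; t++) {
  docs.push({ id: "180", t: t, e: -1 });
  docs.push({ id: "119", t: t, e: t < 800 ? 1 : -1 });
}

const oldIndex = ["id", "t", "e"]; // like id_1_t_-1_e_1
const newIndex = ["id", "e", "t"]; // like id_1_e_1_t_-1

console.log("180 old:", planFor(docs, oldIndex, "180", 0));
console.log("180 new:", planFor(docs, newIndex, "180", 0));
console.log("119 old:", planFor(docs, oldIndex, "119", 0));
console.log("119 new:", planFor(docs, newIndex, "119", 0));
```

In this model the scan for id "180" shrinks from every key for that id to none once "e" comes before "t", which mirrors the drop in nscanned in the explain() output. For id "119" the e-first order saves little and would need an in-memory sort, which is one plausible reason the t-first index was still chosen for that query.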

【Comments】: