【Question title】: How to index my collection to use a compound multikey index
【Posted】: 2012-10-08 13:02:08
【Question】:

This is the kind of document I want to query:

{
"_id":ObjectId("5062d30522dfae0e11000000"),
"id_resource" : "147",
"moment_created" : ISODate("2012-03-22T16:29:21Z"),
"moment_updated" : ISODate("2012-03-22T16:29:21Z"),
"users_involved" : [
    {
        "id_user" : "113928869",
        "state" : "answered",
        "id_folder" : "0",
        "is_deleted" : "0"
    },
    {
        "id_user" : "121624627",
        "state" : "new",
        "id_folder" : "0",
        "is_deleted" : "0"
    }
],
"posts" : [
    {
        "id_author" : "113928869",
        "post" : "hiohhio",
        "moment_created" : ISODate("2012-03-22T16:29:21Z")
    }
    ]
}

This is how I created my index:

db.message.ensureIndex({id_resource:1, users_involved : 1});

This is the query I run against my collection:

db.message.find({id_resource : "143", "users_involved" : {$elemMatch : {id_user : "101226353", state : "answered"}}});

But when I run explain() on it, I get this output:

{
    "clusteredType" : "ParallelSort",
    "cursor" : "BasicCursor",
    "n" : 11,
    "nChunkSkips" : 0,
    "nYields" : 8624,
    "nscanned" : 1461277,
    "nscannedAllPlans" : 1461277,
    "nscannedObjects" : 1461277,
    "nscannedObjectsAllPlans" : 1461277,
    "millisShardTotal" : 1878,
    "millisShardAvg" : 939,
    "numQueries" : 2,
    "numShards" : 2,
    "millis" : 1646

}

getIndexes() returns:

[
    {
            "v" : 1,
            "key" : {
                    "_id" : 1
            },
            "ns" : "messaging.message",
            "name" : "_id_"
    },
    {
            "v" : 1,
            "key" : {
                    "id_resource" : 1,
                    "users_involved" : 1
            },
            "ns" : "messaging.message",
            "name" : "id_resource_1_users_involved_1"
    }

]

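Unfortunately, I don't understand why my query is not using the index id_resource_1_users_involved_1. Can anyone explain why my index is not being used, or how I have to build the index to support the query I want to run?

Thanks for your time and help.

Update

Sorry, that was a typo on my part. Here is the actual explain output for the query: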

{
    "clusteredType" : "ParallelSort",
    "cursor" : "BtreeCursor id_resource_1_users_involved_1",
    "n" : 5,
    "nChunkSkips" : 0,
    "nYields" : 2,
    "nscanned" : 46868,
    "nscannedAllPlans" : 93736,
    "nscannedObjects" : 46868,
    "nscannedObjectsAllPlans" : 93736,
    "millisShardTotal" : 281,
    "millisShardAvg" : 140,
    "numQueries" : 2,
    "numShards" : 2,
    "millis" : 220

}

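So the query is using my index, but it is still slow and nscanned is still large. Does that mean the whole index is not being used? I will have to check whether nscanned matches the number of messages for resource x.

With the compound index suggested by JohnnyHK, it gets much faster: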

db.message.ensureIndex({id_resource:1, 'users_involved.id_user':1, 'users_involved.state':1});

Explain output:

{
    "clusteredType" : "ParallelSort",
    "cursor" : "BtreeCursor id_resource_1_users_involved.id_user_1_users_involved.state_1",
    "n" : 5,
    "nChunkSkips" : 0,
    "nYields" : 0,
    "nscanned" : 7,
    "nscannedAllPlans" : 7,
    "nscannedObjects" : 7,
    "nscannedObjectsAllPlans" : 7,
    "millisShardTotal" : 0,
    "millisShardAvg" : 0,
    "numQueries" : 2,
    "numShards" : 2,
    "millis" : 1
}

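So if I want to query the users_involved array, do I have to build a separate index for each query?

Also, @JohnnyHK, matching on the whole array element as you mentioned: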

db.message.find({id_resource : "197", "users_involved" : {$elemMatch : {id_user : "128825371", state : "answered", id_folder:"0", is_deleted:"0"}}}).hint("id_resource_1_users_involved_1")

brings no improvement. Explain output:

{
    "clusteredType" : "ParallelSort",
    "cursor" : "BtreeCursor id_resource_1_users_involved_1",
    "n" : 5,
    "nChunkSkips" : 0,
    "nYields" : 1,
    "nscanned" : 46868,
    "nscannedAllPlans" : 46868,
    "nscannedObjects" : 46868,
    "nscannedObjectsAllPlans" : 46868,
    "millisShardTotal" : 222,
    "millisShardAvg" : 111,
    "numQueries" : 2,
    "numShards" : 2,
    "millis" : 174

}

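Or am I still doing something wrong?

*I also removed the shard information from the explain output. If that information could be important, just say so.

【Question comments】:

Tags: mongodb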

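【Answer 1】:

Because your compound index contains the whole users_involved array, each array element is indexed as one complete embedded document, so the index can only be used when the query matches complete embedded document elements of the array. See here.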
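
To make the point above concrete, here is a minimal sketch in plain JavaScript of which keys each index would hold for the sample document. It is a simplified model, not MongoDB's actual implementation: the helper `indexKeys` and its `arrayField` parameter are hypothetical, and the model assumes at most one array field, with one index key per array element.

```javascript
// Simplified model of multikey key generation (NOT MongoDB's real code).
// Assumption: at most one array field; each array element yields one key.
function indexKeys(doc, fields, arrayField) {
  const elements = Array.isArray(doc[arrayField]) ? doc[arrayField] : [undefined];
  const prefix = arrayField + ".";
  return elements.map((element) =>
    fields.map((field) => {
      // Indexing the array itself stores each whole embedded document as one value.
      if (field === arrayField) return element;
      // A dotted path stores just that sub-field of the element.
      if (field.startsWith(prefix)) {
        return element === undefined ? undefined : element[field.slice(prefix.length)];
      }
      // Any other field is a plain top-level value.
      return doc[field];
    })
  );
}

// Trimmed copy of the document from the question.
const doc = {
  id_resource: "147",
  users_involved: [
    { id_user: "113928869", state: "answered", id_folder: "0", is_deleted: "0" },
    { id_user: "121624627", state: "new", id_folder: "0", is_deleted: "0" },
  ],
};

const wholeArrayKeys = indexKeys(doc, ["id_resource", "users_involved"], "users_involved");
const dottedKeys = indexKeys(
  doc,
  ["id_resource", "users_involved.id_user", "users_involved.state"],
  "users_involved"
);

console.log(JSON.stringify(wholeArrayKeys)); // second component is a whole sub-document
console.log(JSON.stringify(dottedKeys));     // second and third components are scalars
```

In the first index the second key component is an entire embedded document, so a condition on `id_user` or `state` alone cannot narrow the index range; only `id_resource` can. That would be consistent with the large nscanned in the question. An exact-match query on a whole element would have to supply the complete embedded document, with its fields in the stored order, instead of `$elemMatch`.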

I think you would be better served by a compound index that includes only the fields of users_involved that you actually intend to search on. So either:

db.message.ensureIndex({id_resource:1, 'users_involved.id_user' : 1});

or

db.message.ensureIndex({id_resource:1, 'users_involved.id_user' : 1, 'users_involved.state' : 1});

【Comments】:

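@braunbaer You don't have to build a separate index for each query, but you do need to make sure the indexes you define are selective enough that any given query only has to scan a small number of documents. See the docs on compound keys.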
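OK, so every new case (query) will be a new index. case "resource x player y folder z" -> index resource_player_folder; case "resource x player y state z" -> index resource_player_state; case "resource x player x folder z state xz" -> index resource_player_state_folder. The idea was to have one index on "users_involved" instead of these separate indexes for almost identical queries. But maybe I just need to read the docs a few more times to really understand them. Thanks for your time and help.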
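
The trade-off discussed in these comments can be sketched with a small, simplified model of compound index prefixes. The helper `boundedPrefix` is hypothetical, and the model only covers equality predicates: it ignores range conditions, sorting, and the extra rules MongoDB applies when combining bounds on multikey fields.

```javascript
// Simplified model: an equality query can bound the leading index fields
// up to (not including) the first index field it does not constrain.
function boundedPrefix(indexFields, queryFields) {
  const prefix = [];
  for (const field of indexFields) {
    if (!queryFields.includes(field)) break;
    prefix.push(field);
  }
  return prefix;
}

// The three-field index from the answer.
const index = ["id_resource", "users_involved.id_user", "users_involved.state"];

// The three query shapes named in the comment above.
const cases = {
  resource_player_folder: ["id_resource", "users_involved.id_user", "users_involved.id_folder"],
  resource_player_state: ["id_resource", "users_involved.id_user", "users_involved.state"],
  resource_player_state_folder: [
    "id_resource",
    "users_involved.id_user",
    "users_involved.state",
    "users_involved.id_folder",
  ],
};

const bounded = {};
for (const [name, fields] of Object.entries(cases)) {
  bounded[name] = boundedPrefix(index, fields);
  console.log(name + " -> " + bounded[name].join(", "));
}
```

Under this model, all three cases can at least use the `id_resource` plus `users_involved.id_user` prefix of the single index. Any remaining condition, such as `id_folder`, would be checked against the fetched documents. Whether that is selective enough depends on how many array entries share the same resource and user, which explain() can show for real data.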