[Question title]: How to index my collection to use a compound multikey index
[Posted]: 2012-10-08 13:02:08
[Question]:

This is the document I want to query:

{
"_id":ObjectId("5062d30522dfae0e11000000"),
"id_resource" : "147",
"moment_created" : ISODate("2012-03-22T16:29:21Z"),
"moment_updated" : ISODate("2012-03-22T16:29:21Z"),
"users_involved" : [
    {
        "id_user" : "113928869",
        "state" : "answered",
        "id_folder" : "0",
        "is_deleted" : "0"
    },
    {
        "id_user" : "121624627",
        "state" : "new",
        "id_folder" : "0",
        "is_deleted" : "0"
    }
],
"posts" : [
    {
        "id_author" : "113928869",
        "post" : "hiohhio",
        "moment_created" : ISODate("2012-03-22T16:29:21Z")
    }
    ]
}

This is how I tried to create my index:

db.message.ensureIndex({id_resource:1, users_involved : 1});

This is the query I run against my collection:

db.message.find({id_resource : "143", "users_involved" : {$elemMatch : {id_user : "101226353", state : "answered"}}});

But when I run explain on it, I get this output:

{
    "clusteredType" : "ParallelSort",
    "cursor" : "BasicCursor",
    "n" : 11,
    "nChunkSkips" : 0,
    "nYields" : 8624,
    "nscanned" : 1461277,
    "nscannedAllPlans" : 1461277,
    "nscannedObjects" : 1461277,
    "nscannedObjectsAllPlans" : 1461277,
    "millisShardTotal" : 1878,
    "millisShardAvg" : 939,
    "numQueries" : 2,
    "numShards" : 2,
    "millis" : 1646

}

getIndexes returns:

[
    {
            "v" : 1,
            "key" : {
                    "_id" : 1
            },
            "ns" : "messaging.message",
            "name" : "_id_"
    },
    {
            "v" : 1,
            "key" : {
                    "id_resource" : 1,
                    "users_involved" : 1
            },
            "ns" : "messaging.message",
            "name" : "id_resource_1_users_involved_1"
    }

]

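Unfortunately, I do not understand why my query is not using the index id_resource_1_users_involved_1. Can anyone explain why my index is not being used, or how I have to build my index to support the query I want to run?

Thanks for your time and help.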

Update

Sorry, I made a typo. Here is the actual explain output for the query:

{
    "clusteredType" : "ParallelSort",
    "cursor" : "BtreeCursor id_resource_1_users_involved_1",
    "n" : 5,
    "nChunkSkips" : 0,
    "nYields" : 2,
    "nscanned" : 46868,
    "nscannedAllPlans" : 93736,
    "nscannedObjects" : 46868,
    "nscannedObjectsAllPlans" : 93736,
    "millisShardTotal" : 281,
    "millisShardAvg" : 140,
    "numQueries" : 2,
    "numShards" : 2,
    "millis" : 220

}

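So the query is using my index, but it is still slow and nscanned is large. Does that mean the whole index is not being used? I will have to check whether nscanned matches the number of messages for resource x.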

With the compound index from JohnnyHK, it became much faster:

db.message.ensureIndex({id_resource:1, 'users_involved.id_user':1, 'users_involved.state':1});

Explain:

{
    "clusteredType" : "ParallelSort",
    "cursor" : "BtreeCursor id_resource_1_users_involved.id_user_1_users_involved.state_1",
    "n" : 5,
    "nChunkSkips" : 0,
    "nYields" : 0,
    "nscanned" : 7,
    "nscannedAllPlans" : 7,
    "nscannedObjects" : 7,
    "nscannedObjectsAllPlans" : 7,
    "millisShardTotal" : 0,
    "millisShardAvg" : 0,
    "numQueries" : 2,
    "numShards" : 2,
    "millis" : 1
}

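So if I want to query the users_involved array, do I have to build a separate index for each query?

Also, @JohnnyHK, querying on the whole array element as you mentioned:

db.message.find({id_resource : "197", "users_involved" : {$elemMatch : {id_user : "128825371", state : "answered", id_folder:"0", is_deleted:"0"}}}).hint("id_resource_1_users_involved_1")

brings no improvement. Explain: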

{
    "clusteredType" : "ParallelSort",
    "cursor" : "BtreeCursor id_resource_1_users_involved_1",
    "n" : 5,
    "nChunkSkips" : 0,
    "nYields" : 1,
    "nscanned" : 46868,
    "nscannedAllPlans" : 46868,
    "nscannedObjects" : 46868,
    "nscannedObjectsAllPlans" : 46868,
    "millisShardTotal" : 222,
    "millisShardAvg" : 111,
    "numQueries" : 2,
    "numShards" : 2,
    "millis" : 174

}

Or am I still doing something wrong?

*I also removed the shard details from the explain output. If that information matters, just say so.

[Question comments]:

    Tags: mongodb


    [Solution 1]:

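    Because your compound index includes the entire users_involved array, the index can only be used when you match against complete embedded-document elements of the array. See here.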
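To make the point above concrete, here is a small plain-JavaScript sketch of how the entries of an index on `{id_resource: 1, users_involved: 1}` can be pictured. It is a simplified model, not MongoDB's actual implementation: `wholeElementKeys` is a hypothetical helper, and `JSON.stringify` merely stands in for comparing whole embedded documents. It also suggests why the `$elemMatch` query in the question gains little from this index: `$elemMatch` tests fields inside an element rather than comparing the element as a whole.

```javascript
// Simplified model of multikey index entries; NOT MongoDB's real implementation.
const sampleDoc = {
  id_resource: "147",
  users_involved: [
    { id_user: "113928869", state: "answered", id_folder: "0", is_deleted: "0" },
    { id_user: "121624627", state: "new", id_folder: "0", is_deleted: "0" },
  ],
};

// Index {id_resource: 1, users_involved: 1}: one entry per array element,
// where the second key part is the WHOLE embedded document.
// (wholeElementKeys is a hypothetical helper used only for illustration.)
function wholeElementKeys(d) {
  return d.users_involved.map((el) => [d.id_resource, JSON.stringify(el)]);
}

const wholeKeys = wholeElementKeys(sampleDoc);

// A complete element (same fields, same order) can be turned into a key
// and compared directly against the index entries.
const exactElement = JSON.stringify({
  id_user: "113928869", state: "answered", id_folder: "0", is_deleted: "0",
});
const exactHit = wholeKeys.some(([r, el]) => r === "147" && el === exactElement);

// A partial condition (id_user + state only) does not equal any stored key,
// so only the id_resource prefix can narrow the search.
const partialElement = JSON.stringify({ id_user: "113928869", state: "answered" });
const partialHit = wholeKeys.some(([r, el]) => r === "147" && el === partialElement);

console.log(exactHit, partialHit); // true false
```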

    I think you would be better served by a compound index that includes only the fields of users_involved that you plan to search on. So either:

    db.message.ensureIndex({id_resource:1, 'users_involved.id_user' : 1});

    or:

    db.message.ensureIndex({id_resource:1, 'users_involved.id_user' : 1, 'users_involved.state' : 1});
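As a rough illustration of why the dotted-path indexes above narrow the scan so much more, the following plain-JavaScript sketch builds one index entry per array element from the listed sub-fields and counts how many entries share a given key prefix. The sample data and the helpers `buildIndex` and `scan` are made up for this example; real index bounds are computed by the query planner and may differ in detail.

```javascript
// Simplified model; sample data and helper names are hypothetical.
const messages = [
  { id_resource: "143", users_involved: [
      { id_user: "101226353", state: "answered" },
      { id_user: "555", state: "new" } ] },
  { id_resource: "143", users_involved: [
      { id_user: "101226353", state: "new" },
      { id_user: "777", state: "answered" } ] },
  { id_resource: "143", users_involved: [
      { id_user: "888", state: "answered" } ] },
  { id_resource: "999", users_involved: [
      { id_user: "101226353", state: "answered" } ] },
];

// One index entry per array element, built from the listed sub-fields.
function buildIndex(docs, paths) {
  const entries = [];
  docs.forEach((d, pos) => {
    for (const el of d.users_involved) {
      entries.push({ key: [d.id_resource, ...paths.map((p) => el[p])], pos });
    }
  });
  return entries;
}

// Count the entries whose key starts with the given values.
function scan(entries, prefix) {
  return entries.filter((e) => prefix.every((v, i) => e.key[i] === v)).length;
}

const byUser = buildIndex(messages, ["id_user"]);
const byUserState = buildIndex(messages, ["id_user", "state"]);

const onlyResource = scan(byUser, ["143"]);                                   // 5
const resourceAndUser = scan(byUser, ["143", "101226353"]);                   // 2
const allThree = scan(byUserState, ["143", "101226353", "answered"]);         // 1

console.log(onlyResource, resourceAndUser, allThree); // 5 2 1
```

The more of the queried fields the index key contains, the fewer entries have to be visited, which matches the drop in nscanned from 46868 to 7 reported in the question.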
    

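    [Comments]:

    • @braunbaer You do not have to build a separate index for every query, but you do need to make sure the indexes you define are selective enough that only a small number of documents has to be scanned for any given query. See the docs on compound keys.
    • OK, so every new case (query) would be a new index. Case "resource x player y folder z" -> index resource_player_folder; case "resource x player y state z" -> index resource_player_state; case "resource x player x folder z state xz" -> index resource_player_state_folder. The idea was to have one index on "users_involved" instead of these separate indexes for nearly identical queries. But maybe I just need to read the docs a few more times to really understand them. Thanks for your time and help.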
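On the question of one index per query: a compound index can generally serve any query that constrains a leading prefix of its fields, so several of the cases listed above can share one index. The sketch below models only that prefix rule for equality conditions. `usablePrefix` is a hypothetical helper, not a MongoDB API, and the real planner considers more than this.

```javascript
// Hypothetical helper: returns the leading index fields that a query constrains.
// Models equality conditions only; not how MongoDB's planner is implemented.
function usablePrefix(indexFields, queryFields) {
  let n = 0;
  while (n < indexFields.length && queryFields.includes(indexFields[n])) n++;
  return indexFields.slice(0, n);
}

const indexFields = ["id_resource", "users_involved.id_user", "users_involved.state"];

// "resource x player y folder z": the index narrows by resource and user,
// and the folder condition is then checked on the remaining candidates.
const folderCase = usablePrefix(indexFields, [
  "id_resource", "users_involved.id_user", "users_involved.id_folder",
]);

// "resource x player y state z": all three index fields are usable.
const stateCase = usablePrefix(indexFields, [
  "id_resource", "users_involved.id_user", "users_involved.state",
]);

// A query on state alone skips the leading fields, so no prefix applies.
const stateOnly = usablePrefix(indexFields, ["users_involved.state"]);

console.log(folderCase.length, stateCase.length, stateOnly.length); // 2 3 0
```

Whether the two-field prefix is selective enough for the folder case depends on the data; checking nscanned in explain, as done in the question, is the way to find out.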