[Title]: Slow query behaviour using $exists with mongodb on fields with an index
[Posted]: 2017-02-21 21:28:45
[Question]:

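I have been doing some live data investigation on a mongo 3.2.9 installation. The main goal was to find out details about records whose documents are missing data. However, the query I am running times out in robomongo and compass.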

I have a collection (foo) with over 3 million records. I am searching for all records that have a barId, and this is the query I send to mongo:

db.foo.find({barId:{$exists:true}}).explain(true)

This is the execution plan from the mongo shell (the query times out in robomongo and compass):

MongoDB Enterprise > db.foo.find({barId:{$exists:true}}).explain(true)
{
  "queryPlanner" : {
    "plannerVersion" : 1,
    "namespace" : "myDatabase01.foo",
    "indexFilterSet" : false,
    "parsedQuery" : {
      "barId" : {
        "$exists" : true
      }
    },
    "winningPlan" : {
      "stage" : "FETCH",
      "filter" : {
        "barId" : {
          "$exists" : true
        }
      },
      "inputStage" : {
        "stage" : "IXSCAN",
        "keyPattern" : {
          "barId" : 1
        },
        "indexName" : "barId_1",
        "isMultiKey" : false,
        "isUnique" : false,
        "isSparse" : false,
        "isPartial" : false,
        "indexVersion" : 1,
        "direction" : "forward",
        "indexBounds" : {
          "barId" : [
            "[MinKey, MaxKey]"
          ]
        }
      }
    },
    "rejectedPlans" : [ ]
  },
  "executionStats" : {
    "executionSuccess" : true,
    "nReturned" : 2,
    "executionTimeMillis" : 154716,
    "totalKeysExamined" : 3361040,
    "totalDocsExamined" : 3361040,
    "executionStages" : {
      "stage" : "FETCH",
      "filter" : {
        "barId" : {
          "$exists" : true
        }
      },
      "nReturned" : 2,
      "executionTimeMillisEstimate" : 152060,
      "works" : 3361041,
      "advanced" : 2,
      "needTime" : 3361038,
      "needYield" : 0,
      "saveState" : 27619,
      "restoreState" : 27619,
      "isEOF" : 1,
      "invalidates" : 0,
      "docsExamined" : 3361040,
      "alreadyHasObj" : 0,
      "inputStage" : {
        "stage" : "IXSCAN",
        "nReturned" : 3361040,
        "executionTimeMillisEstimate" : 1260,
        "works" : 3361041,
        "advanced" : 3361040,
        "needTime" : 0,
        "needYield" : 0,
        "saveState" : 27619,
        "restoreState" : 27619,
        "isEOF" : 1,
        "invalidates" : 0,
        "keyPattern" : {
          "barId" : 1
        },
        "indexName" : "barId_1",
        "isMultiKey" : false,
        "isUnique" : false,
        "isSparse" : false,
        "isPartial" : false,
        "indexVersion" : 1,
        "direction" : "forward",
        "indexBounds" : {
          "barId" : [
            "[MinKey, MaxKey]"
          ]
        },
        "keysExamined" : 3361040,
        "dupsTested" : 0,
        "dupsDropped" : 0,
        "seenInvalidated" : 0
      }
    },
    "allPlansExecution" : [ ]
  },
  "serverInfo" : {
    "host" : "myLinuxMachine",
    "port" : 8080,
    "version" : "3.2.9",
    "gitVersion" : "22ec9e93b40c85fc7cae7d56e7d6a02fd811088c"
  },
  "ok" : 1
}

It looks like it uses my barId_1 index, yet it scans all 3 million records to return just 2.
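
One likely reading of this plan: a regular (non-sparse) index holds one key per document, and a document without the field is indexed under null, the same key an explicit null value would get. The index alone therefore cannot answer $exists, so the scan covers [MinKey, MaxKey] and every document is fetched and filtered. The following is a minimal sketch of that behaviour in plain JavaScript. It is a toy model, not MongoDB code, and every function name and the sample data are hypothetical.

```javascript
// Toy model (assumption: not MongoDB code) of a regular, non-sparse
// single-field index: every document gets exactly one key, and a
// document without the field is indexed under null.
function buildFullIndex(docs, field) {
  return docs.map((doc, pos) => ({
    key: field in doc ? doc[field] : null, // missing field -> null key
    pos,                                   // position of the document
  }));
}

// Emulates IXSCAN over [MinKey, MaxKey] followed by FETCH with an
// {$exists: true} filter: every key is visited and every document fetched,
// because a null key does not say whether the field is present.
function findExistsViaFullIndex(docs, index, field) {
  let keysExamined = 0;
  let docsExamined = 0;
  const results = [];
  for (const entry of index) {
    keysExamined += 1;
    const doc = docs[entry.pos]; // FETCH stage
    docsExamined += 1;
    if (field in doc) results.push(doc); // filter applied after the fetch
  }
  return { nReturned: results.length, keysExamined, docsExamined };
}

// 1000 hypothetical documents, only two of which carry barId.
const docs = [];
for (let i = 0; i < 1000; i++) docs.push({ _id: i });
docs[10].barId = "a";
docs[20].barId = "b";

const fullIndex = buildFullIndex(docs, "barId");
const stats = findExistsViaFullIndex(docs, fullIndex, "barId");
console.log(stats);
```

In this sketch the counters come out as 1000 keys and 1000 documents examined for 2 results, the same shape as totalKeysExamined and totalDocsExamined in the plan above.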

I ran a similar query, but instead of testing for the existence of the field it looks for ids greater than "0" (which matches all of them):

MongoDB Enterprise > db.foo.find({barId:{$gt:"0"}}).explain(true)
{
  "queryPlanner" : {
    "plannerVersion" : 1,
    "namespace" : "myDatabase01.foo",
    "indexFilterSet" : false,
    "parsedQuery" : {
      "barId" : {
        "$gt" : "0"
      }
    },
    "winningPlan" : {
      "stage" : "FETCH",
      "inputStage" : {
        "stage" : "IXSCAN",
        "keyPattern" : {
          "barId" : 1
        },
        "indexName" : "barId_1",
        "isMultiKey" : false,
        "isUnique" : false,
        "isSparse" : false,
        "isPartial" : false,
        "indexVersion" : 1,
        "direction" : "forward",
        "indexBounds" : {
          "barId" : [
            "(\"0\", {})"
          ]
        }
      }
    },
    "rejectedPlans" : [ ]
  },
  "executionStats" : {
    "executionSuccess" : true,
    "nReturned" : 2,
    "executionTimeMillis" : 54,
    "totalKeysExamined" : 2,
    "totalDocsExamined" : 2,
    "executionStages" : {
      "stage" : "FETCH",
      "nReturned" : 2,
      "executionTimeMillisEstimate" : 10,
      "works" : 3,
      "advanced" : 2,
      "needTime" : 0,
      "needYield" : 0,
      "saveState" : 0,
      "restoreState" : 0,
      "isEOF" : 1,
      "invalidates" : 0,
      "docsExamined" : 2,
      "alreadyHasObj" : 0,
      "inputStage" : {
        "stage" : "IXSCAN",
        "nReturned" : 2,
        "executionTimeMillisEstimate" : 10,
        "works" : 3,
        "advanced" : 2,
        "needTime" : 0,
        "needYield" : 0,
        "saveState" : 0,
        "restoreState" : 0,
        "isEOF" : 1,
        "invalidates" : 0,
        "keyPattern" : {
          "barId" : 1
        },
        "indexName" : "barId_1",
        "isMultiKey" : false,
        "isUnique" : false,
        "isSparse" : false,
        "isPartial" : false,
        "indexVersion" : 1,
        "direction" : "forward",
        "indexBounds" : {
          "barId" : [
            "(\"0\", {})"
          ]
        },
        "keysExamined" : 2,
        "dupsTested" : 0,
        "dupsDropped" : 0,
        "seenInvalidated" : 0
      }
    },
    "allPlansExecution" : [ ]
  },
  "serverInfo" : {
    "host" : "myLinuxMachine",
    "port" : 8080,
    "version" : "3.2.9",
    "gitVersion" : "22ec9e93b40c85fc7cae7d56e7d6a02fd811088c"
  },
  "ok" : 1
}

This again does an index scan on barId_1. It scans 2 records and returns 2.

For completeness, here are the 2 records; the other 3 million are very similar in size and composition.

MongoDB Enterprise > db.foo.find({barId:{$gt:"0"}})
{ 
  "_id" : "00002f5d-ee4a-4996-bb27-b54ea84df777", "createdDate" : ISODate("2016-11-16T02:26:48.500Z"), "createdBy" : "Exporter", "lastModifiedDate" : ISODate("2016-11-16T02:26:48.500Z"), "lastModifiedBy" : "Exporter", "rolePlayed" : "LA", "roleType" : "T", "oId" : [ "d7316944-62ed-48dc-8ee4-e3bad8c58b10" ], "barId" : "e45b3160-bbb4-24e5-82b3-ad0c28329555", "cId" : "dcc29053-7a1f-439e-9536-fb4e44ff8a51", "timestamp" : "2017-02-20T16:23:15.795Z" 
}
{ 
  "_id" : "00002f5d-ee4a-4996-bb27-b54ea84df888", "createdDate" : ISODate("2016-11-16T02:26:48.500Z"), "createdBy" : "Exporter", "lastModifiedDate" : ISODate("2016-11-16T02:26:48.500Z"), "lastModifiedBy" : "Exporter", "rolePlayed" : "LA", "roleType" : "T", "oId" : [ "d7316944-62ed-48dc-8ee4-e3bad8c58b10" ], "barId" : "e45b3160-bbb4-24e5-82b3-ad0c28329555", "cId" : "dcc29053-7a1f-439e-9536-fb4e44ff8a51", "timestamp" : "2017-02-20T16:23:15.795Z" 
}

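Naturally I did some googling and found that there used to be problems with indexes and the $exists clause, but many of the threads I read say this has been fixed. Has it? I also found the following hack, which can be used instead of the $exists clause to force "correct" use of the index when checking whether a field exists.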

MongoDB Enterprise > db.foo.find({barId:{$ne:null}}).explain(true)
{
  "queryPlanner" : {
    "plannerVersion" : 1,
    "namespace" : "myDatabase01.foo",
    "indexFilterSet" : false,
    "parsedQuery" : {
      "$not" : {
        "barId" : {
          "$eq" : null
        }
      }
    },
    "winningPlan" : {
      "stage" : "FETCH",
      "filter" : {
        "$not" : {
          "barId" : {
            "$eq" : null
          }
        }
      },
      "inputStage" : {
        "stage" : "IXSCAN",
        "keyPattern" : {
          "barId" : 1
        },
        "indexName" : "barId_1",
        "isMultiKey" : false,
        "isUnique" : false,
        "isSparse" : false,
        "isPartial" : false,
        "indexVersion" : 1,
        "direction" : "forward",
        "indexBounds" : {
          "barId" : [
            "[MinKey, null)",
            "(null, MaxKey]"
          ]
        }
      }
    },
    "rejectedPlans" : [ ]
  },
  "executionStats" : {
    "executionSuccess" : true,
    "nReturned" : 2,
    "executionTimeMillis" : 57,
    "totalKeysExamined" : 3,
    "totalDocsExamined" : 2,
    "executionStages" : {
      "stage" : "FETCH",
      "filter" : {
        "$not" : {
          "barId" : {
            "$eq" : null
          }
        }
      },
      "nReturned" : 2,
      "executionTimeMillisEstimate" : 10,
      "works" : 4,
      "advanced" : 2,
      "needTime" : 1,
      "needYield" : 0,
      "saveState" : 0,
      "restoreState" : 0,
      "isEOF" : 1,
      "invalidates" : 0,
      "docsExamined" : 2,
      "alreadyHasObj" : 0,
      "inputStage" : {
        "stage" : "IXSCAN",
        "nReturned" : 2,
        "executionTimeMillisEstimate" : 10,
        "works" : 4,
        "advanced" : 2,
        "needTime" : 1,
        "needYield" : 0,
        "saveState" : 0,
        "restoreState" : 0,
        "isEOF" : 1,
        "invalidates" : 0,
        "keyPattern" : {
          "barId" : 1
        },
        "indexName" : "barId_1",
        "isMultiKey" : false,
        "isUnique" : false,
        "isSparse" : false,
        "isPartial" : false,
        "indexVersion" : 1,
        "direction" : "forward",
        "indexBounds" : {
          "barId" : [
            "[MinKey, null)",
            "(null, MaxKey]"
          ]
        },
        "keysExamined" : 3,
        "dupsTested" : 0,
        "dupsDropped" : 0,
        "seenInvalidated" : 0
      }
    },
    "allPlansExecution" : [ ]
  },
  "serverInfo" : {
    "host" : "myLinuxMachine",
    "port" : 8080,
    "version" : "3.2.9",
    "gitVersion" : "22ec9e93b40c85fc7cae7d56e7d6a02fd811088c"
  },
  "ok" : 1
}

This works: only 2 documents were examined and only 2 were returned.
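
One caveat with the hack: $ne:null and $exists:true only return the same documents when no document stores an explicit null in the field. The sketch below illustrates the difference with plain JavaScript stand-ins for the two predicates. The helper names and sample documents are hypothetical, and the helpers only cover a top-level, non-array field.

```javascript
// Plain-JavaScript stand-ins (hypothetical helpers, not the MongoDB API)
// for the two predicates on a top-level, non-array field.
const matchesExists = (doc, field) => field in doc; // like {$exists: true}
const matchesNeNull = (doc, field) =>
  doc[field] !== null && doc[field] !== undefined;  // like {$ne: null}

const samples = [
  { _id: 1, barId: "e45b3160" }, // field present with a value
  { _id: 2, barId: null },       // field present, explicitly null
  { _id: 3 },                    // field missing
];

const existsIds = samples
  .filter((d) => matchesExists(d, "barId"))
  .map((d) => d._id);
const neNullIds = samples
  .filter((d) => matchesNeNull(d, "barId"))
  .map((d) => d._id);

console.log("exists:true ->", existsIds);
console.log("ne:null     ->", neNullIds);
```

Here the $exists stand-in selects documents 1 and 2, while the $ne:null stand-in selects only document 1. If barId is never stored as null, the two are interchangeable for this purpose.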

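My question is this: should I be using $exists in my queries? Is it fit for use in a live production application? If the answer is no, why does the $exists clause exist in the first place?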

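It is always possible that the mongo installation is at fault, or that the index is wrong in some way. Any light shed on this would be much appreciated, but for now I am sticking with the $ne:null hack.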

[Comments on the question]:

    Tags: mongodb


    [Solution 1]:

    You should use a partial index (preferred) or a sparse index on the barId field:

    db.foo.createIndex(
       { barId: 1 },
       { partialFilterExpression: { barId: { $exists: true } } }
    )
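
A rough way to picture why this helps: with the filter above, only documents that actually contain barId get an index entry, so a scan of the index touches just those entries. A sparse index (the `sparse: true` option of createIndex) likewise skips documents that lack the field. The sketch below models the partial index in plain JavaScript. It is a toy model under that assumption, not MongoDB code, and the function names and sample data are hypothetical.

```javascript
// Toy model (assumption: not MongoDB code) of a partial index whose
// filter is { barId: { $exists: true } }: only documents that carry
// the field get an index entry.
function buildPartialIndex(docs, field) {
  const entries = [];
  docs.forEach((doc, pos) => {
    if (field in doc) entries.push({ key: doc[field], pos });
  });
  return entries;
}

// Every entry in the partial index already satisfies the filter,
// so each key examined leads to a document that is returned.
function findExistsViaPartialIndex(docs, index) {
  let keysExamined = 0;
  const results = [];
  for (const entry of index) {
    keysExamined += 1;
    results.push(docs[entry.pos]);
  }
  return { nReturned: results.length, keysExamined };
}

// 1000 hypothetical documents, only two of which carry barId.
const collection = [];
for (let i = 0; i < 1000; i++) collection.push({ _id: i });
collection[10].barId = "a";
collection[20].barId = "b";

const partialIndex = buildPartialIndex(collection, "barId");
const partialStats = findExistsViaPartialIndex(collection, partialIndex);
console.log(partialIndex.length, partialStats);
```

In this model the index holds 2 entries for 1000 documents, and the query examines 2 keys to return 2 documents. On a real deployment, running explain on the query after creating the index is the way to confirm what the planner actually does.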
    

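    [Discussion]:

    • Thanks, this works nicely. Adding the suggested index cut the time for the barId:{$exists:true} query by a factor of 10. I am really curious about the reason for the difference between the indexes. Why aren't all indexes created like this?
    • @Damo, according to the mongo docs, sparse indexes cause problems with sorting. A sparse index will therefore not be used for a sort operation unless you explicitly hint the driver to use that index.
    • What happens if the $exists filter covers multiple fields (combined with and)? Should an index be created for each field?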