【Question title】: Get inserted Ids for Bulk.Insert() - Mongoskin
【Posted】: 2016-10-29 02:51:13
【Question description】:

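I am using mongoskin in my Node.js application to insert data into MongoDB. I need to insert an array of documents into the database and send the ids of the inserted records back to the client. The insert works, but I cannot find the ids of the inserted records anywhere in the result object. I need help locating the insertedIds in the result. I use the code below for the bulk insert.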

db.collection('myCollection', function (err, collection) {
    var bulk = collection.initializeUnorderedBulkOp();
    for (var i = 0; i < dataArray.length; i++) {
        bulk.insert(dataArray[i]);
    }

    bulk.execute(function (err, result) {
      //TODO: return the Ids of inserted records to the client
      //Client will use these Ids to perform subsequent calls to the nodejs service
    });
});

My result is an object of type BatchWriteResult.

【Question discussion】:

Tags: mongodb bulkinsert mongoskin

【Solution 1】:

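I would suggest using another bulk API method, upsert(), which lets you get the _id values of the inserted documents from your BatchWriteResult() object by calling its getUpsertedIds() method. The result object has the same format as described in the documentation for BulkWriteResult.

An update operation with the Bulk.find.upsert() option performs an insert when no document matches the Bulk.find() condition. If the update document does not specify an _id field, MongoDB adds one, so you can retrieve the ids of the inserted documents from your BatchWriteResult().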
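To make the upsert() route concrete: each entry returned by getUpsertedIds() is assumed here to carry the position of the operation (`index`) and the new `_id`, and only operations that actually inserted a document show up in it. The sketch below uses a hypothetical helper, `mapUpsertedIds` (not part of mongoskin or the driver), that turns those entries into a lookup from each document's position in the full `dataArray` to its new id. The batch offset is added because `index` is assumed to count from the start of each separate bulk operation. It runs against a fake result, not a real database.

```javascript
// Hypothetical helper (not part of mongoskin or the driver): map the
// { index, _id } entries from getUpsertedIds() back to positions in the
// full dataArray. `index` is assumed to be relative to the bulk operation
// it came from, so each batch's starting offset is added to it.
function mapUpsertedIds(upserted, batchOffset) {
    var idsByPosition = {};
    upserted.forEach(function (entry) {
        idsByPosition[batchOffset + entry.index] = entry._id;
    });
    return idsByPosition;
}

// Fake entries for the second batch (offset 1000): operation 1 matched an
// existing document, so it has no upserted entry and no new id
var fakeUpserted = [{ index: 0, _id: 'a1' }, { index: 2, _id: 'a3' }];
console.log(mapUpsertedIds(fakeUpserted, 1000));
```

Documents that matched an existing record are simply absent from the lookup, which is why the upsert() route only reports ids for newly created documents.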

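Also, the way you are queuing up the bulk insert operations is generally not recommended, because the whole queue is built up in memory. Rather than relying only on the driver's default way of limiting the batches of 1000 at a time and on the complete batch staying under 16MB, you want more control over managing the queue and memory resources. One way to do this is to loop over the data array with forEach() and a counter that limits each batch to 1000 operations.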
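Splitting the array into fixed-size batches is the core of that counter logic. A minimal sketch of just the split, using a hypothetical `chunk` helper that is not part of mongoskin or the driver:

```javascript
// Hypothetical helper: split an array into consecutive batches of at most
// `size` items, so each batch can be queued into its own bulk operation.
function chunk(items, size) {
    if (!(size > 0) || Math.floor(size) !== size) {
        throw new RangeError('size must be a positive integer');
    }
    var batches = [];
    for (var start = 0; start < items.length; start += size) {
        batches.push(items.slice(start, start + size));
    }
    return batches;
}

// 2500 documents -> batches of 1000, 1000 and 500
var sample = [];
for (var i = 0; i < 2500; i++) { sample.push({ n: i }); }
console.log(chunk(sample, 1000).map(function (b) { return b.length; })); // [ 1000, 1000, 500 ]
```

Each slice could then go into its own `initializeUnorderedBulkOp()`. Starting the next batch only from the previous `execute()` callback would keep at most one batch in flight, at the cost of some throughput.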

The following code shows the approach described above:

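// Pull the _id values out of a BatchWriteResult: getUpsertedIds() returns
// entries of the form { index: <position in the bulk op>, _id: <new id> }
function getInsertedIds(result) {
    return result.getUpsertedIds().map(function (upserted) {
        return upserted._id;
    });
}

db.collection('myCollection', function (err, collection) {
    if (err) { return console.error(err); }

    var bulk = collection.initializeUnorderedBulkOp(),
        idsPerBatch = [],   // idsPerBatch[n] holds the ids upserted by batch n
        batchNo = 0,
        pending = 0,
        counter = 0;

    // Execute one batch; once every batch has reported back, log all ids in batch order
    function executeBatch(batch, n) {
        pending++;
        batch.execute(function (err, result) {
            pending--;
            if (err) { console.error(err); }
            idsPerBatch[n] = err ? [] : getInsertedIds(result);
            if (pending === 0) {
                var insertedIds = [].concat.apply([], idsPerBatch);
                console.log(insertedIds);
            }
        });
    }

    dataArray.forEach(function (data) {
        // replaceOne() because `data` is a plain document, not one with update operators
        bulk.find(data).upsert().replaceOne(data);
        counter++;

        if (counter % 1000 === 0) {
            executeBatch(bulk, batchNo++);
            // Reset synchronously so the next operations go into a fresh batch,
            // not into the one that is already executing
            bulk = collection.initializeUnorderedBulkOp();
        }
    });

    // Execute the remaining operations that were cut off in the loop
    // (counter not a round multiple of 1000)
    if (counter % 1000 !== 0) {
        executeBatch(bulk, batchNo++);
    }
});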
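If the driver in use exposes `getInsertedIds()` on the bulk result (one of the comments below notes that newer Node.js drivers do for plain `Bulk.insert()` operations), a small helper can read ids from whichever method is present. This is a hypothetical sketch: `extractIds` is not a library function, both methods are assumed to return entries shaped like `{ index, _id }`, and it is exercised with fake result objects rather than a live `BatchWriteResult`.

```javascript
// Hypothetical helper: collect the _id values from a bulk write result.
// Assumption: getInsertedIds() (plain Bulk.insert(), newer drivers) and
// getUpsertedIds() (upsert() workaround) both return entries shaped like
// { index, _id }; whichever methods exist on `result` are used.
function extractIds(result) {
    var entries = [];
    if (typeof result.getInsertedIds === 'function') {
        entries = entries.concat(result.getInsertedIds());
    }
    if (typeof result.getUpsertedIds === 'function') {
        entries = entries.concat(result.getUpsertedIds());
    }
    // Sort by position in the bulk operation, then keep only the ids
    return entries
        .sort(function (a, b) { return a.index - b.index; })
        .map(function (entry) { return entry._id; });
}

// Fake result object standing in for an older BatchWriteResult
var oldStyle = {
    getUpsertedIds: function () { return [{ index: 1, _id: 'u2' }, { index: 0, _id: 'u1' }]; }
};
console.log(extractIds(oldStyle)); // [ 'u1', 'u2' ]
```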

【Discussion】:

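So upsert returns the ids of newly created documents where insert does not? If so, that's very clever. I like it!
If you use the latest Node.js driver, you can call getInsertedIds() with just Bulk.insert(), but since your result is a BatchWriteResult, the workaround I know of is to go the upsert() route and use the getUpsertedIds() method.
@chridam: Thanks for the explanation and the code. This works, but only on the first run. If I try to insert the same documents again, they are updated instead of inserted. My requirement is that dataArray is inserted every time, whether or not the data already exists. I tried to force inserts in two ways: 1. bulk.find({}).upsert().updateOne(data); - this worked. 2. bulk.find({food:'bar'}).upsert().updateOne(data) - I passed an object that will never exist in my collection. This is very slow; the array has 20000 records.
When the insert API is called, my service's job is simply to insert the array sent in the request. If the same array is sent twice, the service should blindly insert it again. For example, suppose the client calls the insert API with an array of 10 records and I insert them. Some time later, the client calls the API again with the same 10 records (of course without an _id field), and I should insert them again. After the 2 calls there should be 20 records. The code you provided leaves only 10 records after 2 calls.
@chridam Can we also get the ids of updated documents, not just the inserted ones? I am using pymongo.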
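The comments describe a requirement the upsert() route cannot meet: the same array must be inserted again every time it is sent. A different technique, not the answer's method, is to assign the `_id` values on the client before queuing plain `bulk.insert()` calls, so every call creates new documents and the ids are known without reading them back from the `BatchWriteResult`. In this sketch `assignIds` is hypothetical, and the `makeId` factory is an assumption standing in for something like `new ObjectID()` from the `mongodb` package.

```javascript
// Hypothetical sketch: copy each incoming document and give the copy a fresh
// _id before queuing it with bulk.insert(). Because every copy gets a new _id,
// sending the same array twice inserts it twice, and the ids are known up front.
// `makeId` is an assumed id factory, e.g. function () { return new ObjectID(); }
// with ObjectID from the mongodb driver.
function assignIds(docs, makeId) {
    return docs.map(function (doc) {
        var copy = {};
        Object.keys(doc).forEach(function (key) { copy[key] = doc[key]; });
        copy._id = makeId(); // overrides any _id the client may have sent
        return copy;
    });
}

// Usage with the question's code (needs a live database, so shown as comments):
//   var docs = assignIds(dataArray, function () { return new ObjectID(); });
//   docs.forEach(function (doc) { bulk.insert(doc); });
//   bulk.execute(function (err, result) {
//       var ids = docs.map(function (doc) { return doc._id; }); // send these to the client
//   });
```

Copying each document, rather than writing `_id` into the caller's objects, keeps the same input array reusable for a second request.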