【Question Title】: Upsert one or more documents, but only update if the existing document meets a condition
【Posted】: 2019-03-26 19:04:38
【Question Description】:

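After reading through a bunch of the C++ driver documentation and examples (e.g. 1, 2), I couldn't piece together a way to accomplish my goal with the C++ driver.

I have a collection of documents with the following structure: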

{
   _id : int64_t // Supplied by me manually
   url : string
   status : int
   date : int
}

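I want to insert a new document. However, if a document with the same _id already exists (which means its url is the same, since my _id is a hash of the url), I want to update it as follows. Let existing_doc be the document with the same _id that is already in the database, and let new_doc be the document I am submitting to MongoDB:

  1. Update the date field of existing_doc only if existing_doc[status] is x (some integer constant).
  2. Update the status field of existing_doc only if new_doc[status] is y (some other constant).

Bonus points if this can be done as a bulk operation (a bunch of different new_docs), but any hints on how to implement this logic would be much appreciated.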

【Question Discussion】:

    Tags: mongodb mongodb-query mongo-cxx-driver


    【Solution 1】:

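    I couldn't find a simple way to do this with MongoDB's field update operators or in other answers.

    But one possibility is to perform a bulk_write with two operations:

    1. An upsert operation that updates existing_doc[date] when existing_doc[status] is x, or inserts the new document (condition 1)
    2. An update operation that updates existing_doc[status] when new_doc[status] is y (condition 2)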

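    For condition 1, we can run a single upsert operation whose query matches on id and status=x:

    1. If a document is retrieved, date is updated (the $set operator), because a match means existing_doc[status] is x
    2. If no document is retrieved, an insert is attempted ($setOnInsert supplies status and url, $set supplies date), and one of two things can happen:
      • The document is inserted successfully -> it is a new document
      • A duplicate key error is thrown -> a document with that id already exists but its status is not x, so nothing needs to be updated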

    Condition 2 is simpler, because the input data already tells us whether the update is needed or can be skipped.

    The following code uses bulk_write to perform the operations described above:

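    #include <iostream>
    #include <string>
    
    #include <bsoncxx/builder/basic/document.hpp>
    #include <bsoncxx/builder/basic/kvp.hpp>
    #include <bsoncxx/builder/stream/document.hpp>
    #include <bsoncxx/json.hpp>
    
    #include <mongocxx/client.hpp>
    #include <mongocxx/exception/bulk_write_exception.hpp>
    #include <mongocxx/instance.hpp>
    #include <mongocxx/model/update_one.hpp>
    #include <mongocxx/options/bulk_write.hpp>
    #include <mongocxx/uri.hpp>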
    
    using bsoncxx::builder::basic::kvp;
    using bsoncxx::builder::basic::make_document;
    using bsoncxx::builder::stream::document;
    using bsoncxx::builder::stream::finalize;
    
    // Status constants (the question's x and y)
    const int kStatus_X = 1234;
    const int kStatus_Y = 6789;
    
    // Helper method to retrieve the document (in json format) if exists
    std::string retrieveJsonDocById(mongocxx::collection& coll, const std::string& id)
    {
        bsoncxx::stdx::optional<bsoncxx::document::value> maybe_result =
            coll.find_one(document{} << "_id" << id << finalize);
    
        if (maybe_result)   { return bsoncxx::to_json(*maybe_result); }
        else                { return "Nothing retrieved for id: " + id; }
    }
    
    
    // Inserts a new document {id,url,status,date}, or updates the existing one
    void upsertUrl(mongocxx::collection& coll,
                   std::string id, std::string url, int status, int date)
    {
        std::cout << ">> Before insert/update: " << retrieveJsonDocById(coll, id) << std::endl;
    
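        // Unordered bulk write: the status update below still runs even if the upsert
        // fails with a duplicate key error. Note that MongoDB does not guarantee the
        // execution order of an unordered bulk write.
        mongocxx::options::bulk_write bulkWriteOption;
        bulkWriteOption.ordered(false);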
        auto bulk = coll.create_bulk_write(bulkWriteOption);
    
        // If document exists and has status='x', update the date field
        // If document exists but status!='x', nothing will be inserted (duplicate key thrown)
        // If document is new, perform insert
        mongocxx::model::update_one upsert_op{
            make_document(kvp("_id", id), kvp("status", kStatus_X)),
            make_document(
                    kvp("$set", make_document(kvp("date", date))),
                    kvp("$setOnInsert", make_document(kvp("status", status), kvp("url", url))))
        };
        upsert_op.upsert(true);
        bulk.append(upsert_op);
    
        // If new_doc[status] is 'y', attempt to perform status update
        if (status == kStatus_Y) {
            mongocxx::model::update_one update_op{
                make_document(kvp("_id", id)),
                make_document(kvp("$set", make_document(kvp("status", status))))
            };
            bulk.append(update_op);
        }
    
        try {
            auto result = bulk.execute();
        }
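        catch (const mongocxx::bulk_write_exception& e) {
            if (e.code().value() == 11000) {
                // Expected when the _id exists but its status != x: the upsert is skipped
                std::cout << "Duplicate key error expected when id exists but the status!=x: ";
                std::cout << std::endl << e.what() << std::endl;
            } else {
                throw;  // any other write error is a real failure, so don't swallow it
            }
        }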
        std::cout << ">> After insert/update:  " << retrieveJsonDocById(coll, id) << std::endl << std::endl;
    }
    

    The following test scenarios:

    int main(int, char**) {
    
        std::cout << "Starting program, x=" << kStatus_X << ", y=" << kStatus_Y << std::endl;
    
        mongocxx::instance instance{};
        mongocxx::client client{ mongocxx::uri{} };
    
        mongocxx::database db = client["stack"];
        mongocxx::collection coll = db["urls"];
    
        std::cout << "Inserting Doc #1 (status=x):" << std::endl;
        upsertUrl(coll, "1", "1_url.com", kStatus_X, 101010);
        std::cout << "Inserting Doc #2 (status=x):" << std::endl;
        upsertUrl(coll, "2", "2_url.com", kStatus_X, 202020);
        std::cout << "Inserting Doc #3 (status!=x):" << std::endl;
        upsertUrl(coll, "3", "3_url.com", 3, 303030);
        std::cout << "Inserting Doc #4 (status!=x):" << std::endl;
        upsertUrl(coll, "4", "4_url.com", 4, 404040);
    
        std::cout << "Inserting again Doc #1 (existing.status=x, new.status=y) -> should update the date and status:" << std::endl;
        upsertUrl(coll, "1", "1_url.com", kStatus_Y, 505050);
        std::cout << "Inserting again Doc #2 (existing.status=x, new.status!=y) -> should update date:" << std::endl;
        upsertUrl(coll, "2", "2_url.com", 6, 606060);
        std::cout << "Inserting again Doc #3 (existing.status!=x, new.status=y) -> should update status:" << std::endl;
        upsertUrl(coll, "3", "3_url.com", kStatus_Y, 707070);
        std::cout << "Inserting again Doc #4 (existing.status!=x, new.status!=y) -> should update nothing:" << std::endl;
        upsertUrl(coll, "4", "4_url.com", 8, 808080);
    
        std::cout << "End program" << std::endl;
    }
    

    produce the following output:

    Starting program, x=1234, y=6789
    Inserting Doc #1 (status=x):
    >> Before insert/update: Nothing retrieved for id: 1
    >> After insert/update:  { "_id" : "1", "status" : 1234, "date" : 101010, "url" : "1_url.com" }
    
    Inserting Doc #2 (status=x):
    >> Before insert/update: Nothing retrieved for id: 2
    >> After insert/update:  { "_id" : "2", "status" : 1234, "date" : 202020, "url" : "2_url.com" }
    
    Inserting Doc #3 (status!=x):
    >> Before insert/update: Nothing retrieved for id: 3
    >> After insert/update:  { "_id" : "3", "status" : 3, "date" : 303030, "url" : "3_url.com" }
    
    Inserting Doc #4 (status!=x):
    >> Before insert/update: Nothing retrieved for id: 4
    >> After insert/update:  { "_id" : "4", "status" : 4, "date" : 404040, "url" : "4_url.com" }
    
    Inserting again Doc #1 (existing.status=x, new.status=y) -> should update the date and status:
    >> Before insert/update: { "_id" : "1", "status" : 1234, "date" : 101010, "url" : "1_url.com" }
    >> After insert/update:  { "_id" : "1", "status" : 6789, "date" : 505050, "url" : "1_url.com" }
    
    Inserting again Doc #2 (existing.status=x, new.status!=y) -> should update date:
    >> Before insert/update: { "_id" : "2", "status" : 1234, "date" : 202020, "url" : "2_url.com" }
    >> After insert/update:  { "_id" : "2", "status" : 1234, "date" : 606060, "url" : "2_url.com" }
    
    Inserting again Doc #3 (existing.status!=x, new.status=y) -> should update status:
    >> Before insert/update: { "_id" : "3", "status" : 3, "date" : 303030, "url" : "3_url.com" }
    Duplicate key error expected when id exists but the status!=x:
    E11000 duplicate key error collection: stack.urls index: _id_ dup key: { : "3" }: generic server error
    >> After insert/update:  { "_id" : "3", "status" : 6789, "date" : 303030, "url" : "3_url.com" }
    
    Inserting again Doc #4 (existing.status!=x, new.status!=y) -> should update nothing:
    >> Before insert/update: { "_id" : "4", "status" : 4, "date" : 404040, "url" : "4_url.com" }
    Duplicate key error expected when id exists but the status!=x:
    E11000 duplicate key error collection: stack.urls index: _id_ dup key: { : "4" }: generic server error
    >> After insert/update:  { "_id" : "4", "status" : 4, "date" : 404040, "url" : "4_url.com" }
    
    End program
    
