【Question title】: MongoDB avoid duplicates using $addToSet in aggregation pipeline
【Posted】: 2017-01-31 05:31:59
【Question description】:

I have the following aggregation pipeline:

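// Stages are passed as a single array, the documented form for aggregate().
db.getCollection('yourCollection').aggregate([
    {
        // One output document per element of "dates"; "idx" holds its position.
        $unwind: {
            path: "$dates",
            includeArrayIndex: "idx"
        }
    },
    {
        // Pick the element at the same position from each parallel array.
        $project: {
            _id: 0,
            dates: 1,
            numbers: { $arrayElemAt: ["$numbers", "$idx"] },
            goals: { $arrayElemAt: ["$goals", "$idx"] },
            durations: { $arrayElemAt: ["$durations", "$idx"] }
        }
    }
])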

It runs against the following data (sample documents):

{
    "_id" : ObjectId("52d017d4b60fb046cdaf4851"),
    "dates" : [
        1399518702000,
        1399126333000,
        1399209192000,
        1399027545000
    ],
    "dress_number" : "4",
    "name" : "J. Evans",
    "numbers" : [
        "5982",
        "5983",
        "5984",
        "5985"
    ],
    "goals": [
        "1",
        "0",
        "4",
        "2"
    ],
   "durations": [
       "78",
       "45",
       "90",
       "90"
   ]
}

{
    "_id" : ObjectId("57e250c1b60fb0213d06737c"),
    "dates" : [
        "1399027545000",
        "1399101432000",
        "1399026850000",
        "1399904504000"
    ],
    "dress_number" : "6",
    "name" : "K. Mitnick",
    "numbers" : [
        "0982",
        "0981",
        "0958",
        "0982"
    ],
    "durations" : [
        98,
        110,
        66,
        92
    ],
    "goals" : [
        "2",
        "3",
        "0",
        "1"
    ]
}

The query works well, but it returns duplicate records, so I tried to use the $addToSet operator to avoid the duplicates:

db.getCollection('yourCollection').aggregate([
    {
        $match: {
            "number": number
        }
    },
    {
        $unwind: {
            path: "$dates",
            includeArrayIndex: "idx"
        }
    },
    {
        // Only _id and dates are emitted by this stage.
        $group: {
            _id: '$_id',
            dates: { $addToSet: '$dates' }
        }
    },
    {
        $project: {
            _id: 0,
            dates: 1,
            numbers: { $arrayElemAt: ["$numbers", "$idx"] },
            goals: { $arrayElemAt: ["$goals", "$idx"] },
            durations: { $arrayElemAt: ["$durations", "$idx"] }
        }
    }
])

But I only get the dates (the other fields are null):

{ dates: 
     [ '1399026850000',
       '1399101432000',
       '1399027545000',
       '1399904504000',
       '1399024474000',
       '1399126333000' ],
    numbers: null,
    goals: null,
    durations: null },
  { dates: 
     [ '1399027545000',
       '1399024474000',
       '1399518702000',
       '1399126333000',
       '1399209192000',
       '1399356651000' ],
    numbers: null,
    goals: null,
    conversation_durations: null },
  { dates: 
     [ '1399026850000',
       '1399101432000',
       '1399027545000',
       '1399904504000',
       '1399024474000' ],
    numbers: null,
    goals: null,
    durations: null } 

Does anyone know where the problem is?

【Comments】:

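  • When you $group, you essentially drop every field that is not part of the group. After that, you cannot project them back. If all you want is to remove duplicates from the array, your best option is to do it in your JavaScript / client code, or to use map-reduce. See here: stackoverflow.com/questions/9862255/… You can also modify the $group pipeline stage to add the other fields to it (see chridam's answer).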
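
The client-side route suggested in the comment above could look roughly like the sketch below. It is plain JavaScript, not MongoDB code, and it rests on assumptions that the thread does not confirm: that the four arrays are parallel (position i in each describes the same record) and that the first occurrence of a date should win. The helper name `dedupeByDate` and the sample document are made up for illustration.

```javascript
// Hypothetical helper: zip the parallel arrays of one document into records,
// then keep only the first record seen for each date.
function dedupeByDate(doc) {
  const seen = new Set();
  const records = [];
  doc.dates.forEach((date, idx) => {
    const key = String(date); // the sample data mixes numbers and strings
    if (seen.has(key)) return; // skip a date that was already emitted
    seen.add(key);
    records.push({
      dates: date,
      numbers: doc.numbers[idx],
      goals: doc.goals[idx],
      durations: doc.durations[idx],
    });
  });
  return records;
}

// Made-up document with one repeated date (not taken from the question's data).
const sampleDoc = {
  dates: ["1399027545000", "1399101432000", "1399027545000"],
  numbers: ["0982", "0981", "0982"],
  goals: ["2", "3", "2"],
  durations: [98, 110, 98],
};

const uniqueRecords = dedupeByDate(sampleDoc);
console.log(uniqueRecords);
```

Because each record is built before the duplicate is dropped, the values taken from the other arrays stay paired with the date they originally sat next to.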

Tags: mongodb mongodb-query aggregation-framework mongodb-aggregation


【Solution 1】:

You need to carry the other fields through the $group stage with the $first operator, like this:

db.getCollection('yourCollection').aggregate([
    { "$unwind": "$dates" },
    {
        "$group": {
            "_id": "$_id",
            "dates": { "$addToSet": "$dates" },
            "numbers": { "$first": "$numbers" },
            "goals": { "$first": "$goals" },
            "durations": { "$first": "$durations" }
        }
    },
    { "$unwind": {
            "path": "$dates",
            "includeArrayIndex": "idx"
    } },
    {
        "$project": {
            "_id": 0,
            "dates": 1,
            "numbers": { "$arrayElemAt": ["$numbers", "$idx"] },
            "goals": { "$arrayElemAt": ["$goals", "$idx"] },
            "durations": { "$arrayElemAt": ["$durations", "$idx"] }
        }
    }
])
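
To see why the question's pipeline produced null while the version above does not, here is a toy model of the grouping step in plain JavaScript. It is only a sketch of the rule that a $group stage outputs `_id` plus the fields given an accumulator; `groupById` and the sample data are hypothetical and not part of MongoDB.

```javascript
// Toy stand-in for a $group stage keyed on _id. It is NOT MongoDB code; it only
// mimics the rule that the output holds _id plus the accumulator fields.
function groupById(unwoundDocs, keepFirst) {
  const groups = new Map();
  for (const doc of unwoundDocs) {
    if (!groups.has(doc._id)) {
      const out = { _id: doc._id, dates: [] };
      // Mimic $first: copy the listed fields from the first document of the group.
      for (const field of keepFirst) out[field] = doc[field];
      groups.set(doc._id, out);
    }
    const group = groups.get(doc._id);
    // Mimic $addToSet: append the value only if it is not already present.
    if (!group.dates.includes(doc.dates)) group.dates.push(doc.dates);
  }
  return [...groups.values()];
}

// Roughly what $unwind on "dates" would produce for one made-up document.
const unwound = [
  { _id: 1, dates: "a", numbers: ["5982", "5983"] },
  { _id: 1, dates: "a", numbers: ["5982", "5983"] },
  { _id: 1, dates: "b", numbers: ["5982", "5983"] },
];

const withoutFirst = groupById(unwound, []);
const withFirst = groupById(unwound, ["numbers"]);
console.log(withoutFirst); // no "numbers" field, so a later $arrayElemAt on it gives null
console.log(withFirst);    // "numbers" survives the group
```

One difference worth keeping in mind: this sketch keeps dates in first-seen order, whereas MongoDB does not specify the order of the array that $addToSet returns.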

Or use $setUnion to eliminate the duplicates:

db.getCollection('yourCollection').aggregate([
    {
        "$project": {
            "_id": 0,
            "dates": { "$setUnion": ["$dates", "$dates"] },
            "numbers": 1,
            "goals": 1,
            "durations": 1
        }
    },
    { "$unwind": {
            "path": "$dates",
            "includeArrayIndex": "idx"
    } },
    {
        "$project": {
            "_id": 0,
            "dates": 1,
            "dateIndex": "$idx",
            "numbers": { "$arrayElemAt": ["$numbers", "$idx"] },
            "goals": { "$arrayElemAt": ["$goals", "$idx"] },
            "durations": { "$arrayElemAt": ["$durations", "$idx"] }
        }
    }
])

【Discussion】:

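  • Thanks, I tried both solutions, but I still get duplicates :/
  • Could you update your question with sample documents that produce the duplicates, and show the expected output for those documents?
  • Hasn't this been solved already? Could you update your question with sample documents that produce the duplicates and with your expected output, so that I can test and confirm?
  • Yes, but the result is similar. Also, I use $match before the $unwind operator. Could that be a problem? Look at the second sample document. I get { dates: '1399027545000', numbers: '0982', goals: '2', durations: 92 }, { dates: '1399101432000', numbers: '0982', goals: '2', durations: 92 }, { dates: '1399026850000', numbers: '0982', goals: '2', durations: 92 }, { dates: '1399027545000', numbers: '0982', goals: '2', durations: 92 }. As you can see, the last document is a duplicate of the first one.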