【Question Title】: MongoDB group average array
【Posted】: 2018-05-31 20:49:00
【Question】:

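I'm trying to run a PyMongo aggregation that uses $group to average arrays element by element, but I can't find any example that matches my problem.

Sample data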

{
    Subject: "Dave",
    Strength: [1,2,3,4]
},
{
    Subject: "Dave",
    Strength: [1,2,3,5]
},
{
    Subject: "Dave",
    Strength: [1,2,3,6]
},
{
    Subject: "Stuart",
    Strength: [4,5,6,7]
},
{
    Subject: "Stuart",
    Strength: [6,5,6,7]
},
{
    Subject: "Kevin",
    Strength: [1,2,3,4]
},
{
    Subject: "Kevin",
    Strength: [9,4,3,4]
}

Desired result

{
    Subject: "Dave",
    mean_strength: [1,2,3,5]
},
{
    Subject: "Stuart",
    mean_strength: [5,5,6,7]
},
{
    Subject: "Kevin",
    mean_strength: [5,3,3,4]
}

I've tried the following, but MongoDB seems to interpret the arrays as null?

pipe = [{'$group': {'_id': 'Subject', 'mean_strength': {'$avg': '$Strength'}}}]
results = db.Walk.aggregate(pipeline=pipe)

Out: [{'_id': 'SubjectID', 'total': None}]

I've looked through the MongoDB documentation, but I can't find, or can't work out, whether there is a way to do this.

【Comments】:

  • When grouping, always prefix the field name with $; here that means "_id": "$Subject".
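
To make the comment concrete, here is a rough pure-Python emulation (not MongoDB code) of why the attempt collapses into a single group with a null average. Without the `$`, `'Subject'` is a constant string rather than a field reference, and inside `$group` the `$avg` accumulator treats array operands as non-numeric, returning null when no numeric values remain. The helpers `resolve` and `avg_accumulator` are hypothetical names used only for this sketch.

```python
from numbers import Number

docs = [
    {"Subject": "Dave", "Strength": [1, 2, 3, 4]},
    {"Subject": "Stuart", "Strength": [4, 5, 6, 7]},
]


def resolve(expr, doc):
    """Hypothetical helper: '$name' reads a field, anything else is a literal."""
    if isinstance(expr, str) and expr.startswith("$"):
        return doc.get(expr[1:])
    return expr


def avg_accumulator(values):
    """Emulates $avg in $group: non-numeric operands (arrays included) are skipped."""
    nums = [v for v in values if isinstance(v, Number) and not isinstance(v, bool)]
    return sum(nums) / len(nums) if nums else None


# Without '$', every document gets the same literal key, so there is one group.
keys_literal = {resolve("Subject", d) for d in docs}

# With '$', the key is the field value, so there is one group per subject.
keys_field = {resolve("$Subject", d) for d in docs}

# Averaging whole arrays yields None, mirroring the null seen in the question.
array_avg = avg_accumulator([resolve("$Strength", d) for d in docs])
```

So fixing the `_id` alone gives one group per subject, but the arrays still have to be broken up per index before `$avg` can do anything useful with them.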

Tags: python arrays mongodb aggregation-framework pymongo


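【Solution 1】:

You can use $unwind together with includeArrayIndex. As the name suggests, includeArrayIndex adds the array index to the output. This makes it possible to group by Subject and by the element's position in Strength. After the averages are computed, the results need to be sorted so that the $push in the second $group adds them back in the correct order. Finally, a $project includes and renames the relevant fields.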

db.test.aggregate([{
        "$unwind": {
            "path": "$Strength",
            "includeArrayIndex": "rownum"
        }
    },
    {
        "$group": {
            "_id": {
                "Subject": "$Subject",
                "rownum": "$rownum"
            },
            "mean_strength": {
                "$avg": "$Strength"
            }
        }
    },
    {
        "$sort": {
            "_id.Subject": 1,
            "_id.rownum": 1
        }
    },
    {
        "$group": {
            "_id": "$_id.Subject",
            "mean_strength": {
                "$push": "$mean_strength"
            }
        }
    },
    {
        "$project": {
            "_id": 0,
            "Subject": "$_id",
            "mean_strength": 1
        }
    }
])

For your test input, this returns:

{ "mean_strength" : [ 5, 5, 6, 7 ], "Subject" : "Stuart" }
{ "mean_strength" : [ 5, 3, 3, 4 ], "Subject" : "Kevin" }
{ "mean_strength" : [ 1, 2, 3, 5 ], "Subject" : "Dave" }
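
As a sanity check on the expected numbers, the same steps can be sketched in plain Python without a database. This is only an illustration of the logic (unwind with index, group and average, sort, push), not what MongoDB executes internally; the function name `mean_strength_by_subject` is made up for the example.

```python
from collections import defaultdict

docs = [
    {"Subject": "Dave", "Strength": [1, 2, 3, 4]},
    {"Subject": "Dave", "Strength": [1, 2, 3, 5]},
    {"Subject": "Dave", "Strength": [1, 2, 3, 6]},
    {"Subject": "Stuart", "Strength": [4, 5, 6, 7]},
    {"Subject": "Stuart", "Strength": [6, 5, 6, 7]},
    {"Subject": "Kevin", "Strength": [1, 2, 3, 4]},
    {"Subject": "Kevin", "Strength": [9, 4, 3, 4]},
]


def mean_strength_by_subject(documents):
    # $unwind with includeArrayIndex: one (subject, index, value) row per element.
    rows = [
        (doc["Subject"], idx, value)
        for doc in documents
        for idx, value in enumerate(doc["Strength"])
    ]

    # First $group: collect the values per (Subject, rownum), then average them.
    buckets = defaultdict(list)
    for subject, idx, value in rows:
        buckets[(subject, idx)].append(value)
    averages = {key: sum(vals) / len(vals) for key, vals in buckets.items()}

    # $sort followed by the second $group with $push:
    # rebuild each subject's array in index order.
    result = defaultdict(list)
    for subject, idx in sorted(averages):
        result[subject].append(averages[(subject, idx)])
    return dict(result)


means = mean_strength_by_subject(docs)
```

The `sorted(...)` call plays the role of the `$sort` stage: without it, the averages could be appended in an arbitrary order.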

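【Comments】:

  • In my most recent edit I added a few quotation marks (") so that this code runs in PyMongo. The Mongo shell is a bit more forgiving about this.

【Solution 2】:

You can try the aggregation below.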

For example, after the $group stage Dave has [[1,2,3,4], [1,2,3,5], [1,2,3,6]].

Here is the matrix.

Reduce function

Pass     Current Value (c)  Accumulated Value (b)        Next Value
First:   [1,2,3,5]          [[1],[2],[3],[4]]            [[1,1],[2,2],[3,3],[5,4]]
Second:  [1,2,3,6]          [[1,1],[2,2],[3,3],[5,4]]    [[1,1,1],[2,2,2],[3,3,3],[6,5,4]]

Map function - computes the average of each per-index array produced by the reduce stage, giving the output [1,2,3,5]

[{"$group":{"_id":"$Subject","Strength":{"$push":"$Strength"}}}, //Push all arrays
 {"$project":{"mean_strength":{
   "$map":{//Calculate avg for each reduced indexed pairs.
     "input":{
       "$reduce":{
         "input":{"$slice":["$Strength",1,{"$subtract":[{"$size":"$Strength"},1]}]}, //Start from second array.
         "initialValue":{ //Initialize to the first array with all elements transformed to array of single values.
           "$map":{
             "input":{"$range":[0,{"$size":{"$arrayElemAt":["$Strength",0]}}]},
             "as":"a",
             "in":[{"$arrayElemAt":[{"$arrayElemAt":["$Strength",0]},"$$a"]}]
           }
         },
         "in":{
           "$let":{"vars":{"c":"$$this","b":"$$value"}, //Create variables for current and accumulated values
             "in":{"$map":{ //Creates map of same indexed values from each iteration 
                 "input":{"$range":[0,{"$size":"$$b"}]},
                 "as":"d",
                 "in":{
                   "$concatArrays":[ //Concat values at same index 
                     [{"$arrayElemAt":["$$c","$$d"]}], //current value, wrapped in an array because $concatArrays only accepts arrays
                     {"$arrayElemAt":["$$b","$$d"]} //accumulated array for this index
                  ]
                 }
               }
             }
           }
         }
       }
     },
    "as":"e",
    "in":{"$avg":"$$e"}
   }
 }}}
]
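
The reduce and map steps described above can be mirrored in plain Python to see what each one contributes. This is a sketch of the idea using `functools.reduce`, with the first array as the seed, not a translation of how the server evaluates `$reduce`; the names `step`, `columns` and `initial` are chosen for this example.

```python
from functools import reduce

# Dave's arrays after the $group stage has pushed them together.
arrays = [[1, 2, 3, 4], [1, 2, 3, 5], [1, 2, 3, 6]]

# initialValue: wrap each element of the first array in its own list.
initial = [[x] for x in arrays[0]]


def step(accumulated, current):
    # For each index, put the current value in front of the values
    # already collected at that index.
    return [[current[i]] + accumulated[i] for i in range(len(accumulated))]


# $reduce over the remaining arrays (the pipeline skips the first via $slice).
columns = reduce(step, arrays[1:], initial)

# $map with $avg: one average per index.
mean_strength = [sum(col) / len(col) for col in columns]
```

The order in which values are concatenated inside each per-index list does not affect the average, only the intermediate lists.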

【Comments】:

    【Solution 3】:

    As a solution to the problem described above, try running the following aggregation query:

    db.collection.aggregate(
    
      // Pipeline
      [
        // Stage 1
        {
          $unwind: { path: "$Strength", includeArrayIndex: "arrayIndex" }   
        },
    
        // Stage 2
        {
          $group: {
            _id:{Subject:'$Subject',arrayIndex:'$arrayIndex'},
            mean_strength:{$avg:'$Strength'}
          }
        },
    
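        // Stage 3: $group does not guarantee output order, so sort by
        // array position before pushing the averages back into an array
        {
          $sort: { '_id.Subject': 1, '_id.arrayIndex': 1 }
        },
    
        // Stage 4
        {
          $group: {
          _id:{'Subject':'$_id.Subject'},
          mean_strength:{$push:'$mean_strength'}
          }
        },
    
        // Stage 5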
        {
          $project: {
          Subject:'$_id.Subject',
          mean_strength:'$mean_strength',
          _id:0
          }
        }
    
      ]
    
    
    );
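
Since the question uses PyMongo, the shell query can be written as plain Python dicts with every key quoted. The sketch below is one possible version; it includes a `$sort` stage between the two groups because `$group` does not guarantee the order of its output. The function name `build_pipeline` is hypothetical, and the collection name `Walk` is taken from the question.

```python
def build_pipeline():
    """Return the aggregation stages as Python dicts (all keys quoted)."""
    return [
        # One document per array element, tagged with its position.
        {"$unwind": {"path": "$Strength", "includeArrayIndex": "arrayIndex"}},
        # Average the values found at each position, per subject.
        {"$group": {
            "_id": {"Subject": "$Subject", "arrayIndex": "$arrayIndex"},
            "mean_strength": {"$avg": "$Strength"},
        }},
        # Fix the order so that $push rebuilds the array position by position.
        {"$sort": {"_id.Subject": 1, "_id.arrayIndex": 1}},
        # Collect the per-position averages back into one array per subject.
        {"$group": {
            "_id": {"Subject": "$_id.Subject"},
            "mean_strength": {"$push": "$mean_strength"},
        }},
        # Rename the key and drop _id.
        {"$project": {"_id": 0, "Subject": "$_id.Subject", "mean_strength": 1}},
    ]


pipeline = build_pipeline()

# Assumed usage with an existing PyMongo database handle named `db`:
# results = list(db.Walk.aggregate(pipeline))
```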
    

    【Comments】:
