【Question Title】: Multiple MapReduce Functions or Aggregate Frameworks for unique value and count in Mongodb?
【Posted】: 2013-08-20 02:28:14
【Question】:

I am fairly new to mapReduce and aggregation in MongoDB.

Here is a sample of the data set:

{ "_id" : ObjectId("521002161e0787522098d110"), "userId" : 4545454, "pickId" : 1, "answerArray" : [  "yes" ], "city" : "New York", "state" : "New York" }
{ "_id" : ObjectId("521002481e0787522098d111"), "userId" : 64545454, "pickId" : 1, "answerArray" : [  "no" ], "city" : "New York", "state" : "New York" }
{ "_id" : ObjectId("521002871e0787522098d112"), "userId" : 78263636, "pickId" : 1, "answerArray" : [  "yes" ], "city" : "Albany", "state" : "New York" }
{ "_id" : ObjectId("5211507c1e0787522098d113"), "userId" : 78263636, "pickId" : 2, "answerArray" : [  "yes" ], "city" : "New York", "state" : "New York" }
{ "_id" : ObjectId("5211507c1e0787522098d113"), "userId" : 78263636, "pickId" : 1, "answerArray" : [  "yes" ], "city" : "Wichita", "state" : "Kansas" }

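I want to get the list of unique combinations of state, city, pickId, and answerArray, and then count how often each combination occurs. The result needs to look like this: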

{"pickId": 1, "city": "New York", "state": "New York", "answerArray": ["yes"], "count":2}
{"pickId": 1, "city": "Albany", "state": "New York", "answerArray": ["no"], "count":1}
{"pickId": 1, "city": "New York", "state": "New York", "answerArray": ["no"], "count":1}
{"pickId": 1, "city": "Wichita", "state": "Kansas", "answerArray": ["yes"], "count":1}

The problem I am running into is that emit in mapReduce only accepts two arguments:

Error: fast_emit takes 2 args near...

But I want to map several unique values to a single pickId.

Here is the mapReduce code I am currently working with:

var mapFunct = function() {
    if (this.answerArray == "yes") {
        emit(this.pickId, 1);
    } else {
        emit(this.pickId, 0);
    }
};

var mapReduce2 = function(keyPickId, answerVals) {
    return Array.sum(answerVals);
};

db.answers.mapReduce(mapFunct, mapReduce2, { out: "mapReduceAnswers" });

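Any help or further suggestions would be greatly appreciated. I have also looked at the aggregation framework, but it does not seem to give me the kind of output I need.

【Comments】:

    Tags: mongodb mapreduce mongodb-query aggregation-framework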


    【Solution 1】:

    I think you can get the format you want with aggregation, specifically the $group and $project operators. Take a look at this aggregate call:

    var agg_output = db.answers.aggregate([
      { $group: { _id: {
                    city: "$city",
                    state: "$state",
                    answerArray: "$answerArray",
                    pickId: "$pickId"
                }, count: { $sum: 1 }}
      },
      { $project: { city: "$_id.city", 
                    state: "$_id.state", 
                    answerArray: "$_id.answerArray", 
                    pickId: "$_id.pickId", 
                    count: "$count", 
                    _id: 0}
      }
    ]);
    
    db.answer_counts.insert(agg_output.result);
    

    The $group stage counts the occurrences of each unique city/state/answerArray/pickId combination, and the $project stage reshapes the data into the form you want.
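    To make the two stages concrete, here is a minimal in-memory sketch in plain JavaScript that mimics what $group on a compound _id followed by $project produces. It does not talk to MongoDB; the helper name countCombinations is hypothetical and exists only for illustration, and the documents are the five from the question with _id omitted.

```javascript
// Sample documents from the question (_id omitted, it plays no role in the grouping).
const answers = [
  { userId: 4545454,  pickId: 1, answerArray: ["yes"], city: "New York", state: "New York" },
  { userId: 64545454, pickId: 1, answerArray: ["no"],  city: "New York", state: "New York" },
  { userId: 78263636, pickId: 1, answerArray: ["yes"], city: "Albany",   state: "New York" },
  { userId: 78263636, pickId: 2, answerArray: ["yes"], city: "New York", state: "New York" },
  { userId: 78263636, pickId: 1, answerArray: ["yes"], city: "Wichita",  state: "Kansas" }
];

// Hypothetical helper (not a MongoDB API): emulates
//   $group   with a compound _id and count: { $sum: 1 }
//   $project that lifts the _id fields to the top level
function countCombinations(docs) {
  const groups = new Map();
  for (const doc of docs) {
    // The compound grouping key, same fields as the _id in the $group stage.
    const id = {
      city: doc.city,
      state: doc.state,
      answerArray: doc.answerArray,
      pickId: doc.pickId
    };
    // Serialize the key so equal combinations land in the same bucket.
    const key = JSON.stringify(id);
    const entry = groups.get(key) || { ...id, count: 0 };
    entry.count += 1; // equivalent of { $sum: 1 }
    groups.set(key, entry);
  }
  // Flattened records, as the $project stage would emit them.
  return Array.from(groups.values());
}

const answerCounts = countCombinations(answers);
console.log(answerCounts);
```

    Each of the five sample documents is a distinct combination, so every count comes out as 1 here; a second New York / pickId 1 / "yes" document would raise that group's count to 2.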

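    The insert call saves the resulting output to a new collection. Note that agg_output.result exists in shells where aggregate returns a single document (as in MongoDB 2.4); from 2.6 on, the shell helper returns a cursor instead. Does that make sense?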
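    If mapReduce is still preferred, the two-argument limit on emit mentioned in the question can probably be sidestepped by packing all grouping fields into a single key object. The sketch below shows that idea; mapFunct and reduceFunct are written the way the shell would accept them, while runMapReduce and sampleDocs are hypothetical stand-ins so the example can run outside MongoDB.

```javascript
// map: emit still receives exactly two arguments (key, value),
// but the key is an object that carries every grouping field.
function mapFunct() {
  emit(
    { pickId: this.pickId, city: this.city, state: this.state, answerArray: this.answerArray },
    1
  );
}

// reduce: sum the 1s emitted for one key.
// (The question uses the shell helper Array.sum; a plain reduce works anywhere.)
function reduceFunct(key, values) {
  return values.reduce((total, v) => total + v, 0);
}

// In the mongo shell this would be invoked as:
//   db.answers.mapReduce(mapFunct, reduceFunct, { out: "mapReduceAnswers" })

// Hypothetical in-memory stand-in for the server, for illustration only.
function runMapReduce(docs, map, reduce) {
  const buckets = new Map();
  // Provide the global emit() that the map function expects.
  globalThis.emit = (key, value) => {
    const k = JSON.stringify(key);
    if (!buckets.has(k)) buckets.set(k, { key, values: [] });
    buckets.get(k).values.push(value);
  };
  docs.forEach((doc) => map.call(doc)); // `this` is the current document
  delete globalThis.emit;
  // MongoDB skips reduce for keys with a single value, so do the same here.
  return Array.from(buckets.values(), ({ key, values }) => ({
    _id: key,
    value: values.length === 1 ? values[0] : reduce(key, values)
  }));
}

// Hypothetical sample data with one repeated combination.
const sampleDocs = [
  { pickId: 1, city: "New York", state: "New York", answerArray: ["yes"] },
  { pickId: 1, city: "New York", state: "New York", answerArray: ["yes"] },
  { pickId: 1, city: "Albany",   state: "New York", answerArray: ["no"] }
];

const mapReduceAnswers = runMapReduce(sampleDocs, mapFunct, reduceFunct);
console.log(JSON.stringify(mapReduceAnswers, null, 2));
```

    The output documents have the shape { _id: <key object>, value: <count> }, so the aggregation pipeline above remains the more direct way to get flat records with a count field.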

    【Discussion】:

    • Great, let me know how it goes. :)