【Question title】: MongoDB: print count of unique values from multiple fields
【Posted】: 2022-01-12 19:21:28
【Question description】:

I have a collection (let's call it myCollection) containing the following documents:

{
    "_id": {
        "$oid": "601a75a0c9a338f09f238816"
    },
    "Sample": "lie50",
    "Chromosome": "chr10",
    "Position": {
        "$numberLong": "47663"
    },
    "Reference": "C",
    "Mutation": "T",
    "Run": "Run_test",
    "SYMBOL": "TUBB8"
},
{
    "_id": {
        "$oid": "601a75a0c9a338f09f238817"
    },
    "Sample": "lie50",
    "Chromosome": "chr10",
    "Position": {
        "$numberLong": "47876"
    },
    "Reference": "T",
    "Mutation": "C",
    "Run": "Run_test",
    "SYMBOL": "TUBB8"
},
{
    "_id": {
        "$oid": "601a75a0c9a338f09f238818"
    },
    "Sample": "lie50",
    "Chromosome": "chr10",
    "Position": {
        "$numberLong": "48005"
    },
    "Reference": "G",
    "Mutation": "A",
    "Run": "Run_test",
    "SYMBOL": "TUBB8"
},
{
    "_id": {
        "$oid": "601a75a0c9a338f09f238819"
    },
    "Sample": "lie12",
    "Chromosome": "chr10",
    "Position": {
        "$numberLong": "48005"
    },
    "Reference": "G",
    "Mutation": "A",
    "Run": "Run_test",
    "SYMBOL": "TUBB8"
}

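I am interested in printing the distinct count of the value combinations across the Chromosome, Position, Reference, and Mutation fields. That means counting the unique combinations among the following entries: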

"Chromosome": "chr10", "Position": 47663, "Reference": "C", "Mutation": "T"
"Chromosome": "chr10", "Position": 47876, "Reference": "T", "Mutation": "C"
"Chromosome": "chr10", "Position": 48005, "Reference": "G", "Mutation": "A"
"Chromosome": "chr10", "Position": 48005, "Reference": "G", "Mutation": "A"

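This should give 3 distinct rows.

I have checked several similar questions on how to print the distinct values of a single field, or on using $unwind/$project.

For the latter, I thought: why not concatenate the 4 fields and then print the count using length together with $unwind/$project?

I managed to get this far: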

db.myCollection.aggregate(
[
  {
    $group:
    {
      _id: null,
      newfield: {
        $addToSet:
        {
          $concat:
          [
            "$Chromosome",
            "_",
            {"$toString":"$Position"},
            "_",
            "$Reference",
            "_",
            "$Mutation"
          ]
        }
      }
    }
  },
  {
    $unwind: "$newfield"
  },
  { 
    $project: { _id: 0 }
  }
]).length

However, adding .length to this query returns nothing, whereas without it the query returns:

{ "newfield" : "chr10_47663_C_T" }
{ "newfield" : "chr10_47876_T_C" }
{ "newfield" : "chr10_48005_G_A" }

For reference, my actual data contains 2 billion documents.

【Discussion】:

    Tags: mongodb count distinct


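    【Solution 1】:

    The fields should be passed as the _id of the $group stage, and a $count stage should be used to get the total number of groups instead of returning all the documents: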

    db.myCollection.aggregate([
      {
        $group: {
          _id: {
            Chromosome: "$Chromosome",
            Position: "$Position",
            Reference: "$Reference",
            Mutation: "$Mutation"
          }
        }
      },
      { $count: "count" }
    ])
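
To make it clearer why a compound _id yields the distinct combinations, here is a minimal plain-JavaScript sketch that mimics the two stages in memory on the four sample documents from the question. The helper name countDistinctCombinations is hypothetical and only for illustration; it is not a MongoDB API, and Position is written as a plain number rather than $numberLong.

```javascript
// The four sample documents from the question, reduced to the relevant fields.
const docs = [
  { Sample: "lie50", Chromosome: "chr10", Position: 47663, Reference: "C", Mutation: "T" },
  { Sample: "lie50", Chromosome: "chr10", Position: 47876, Reference: "T", Mutation: "C" },
  { Sample: "lie50", Chromosome: "chr10", Position: 48005, Reference: "G", Mutation: "A" },
  { Sample: "lie12", Chromosome: "chr10", Position: 48005, Reference: "G", Mutation: "A" },
];

// Hypothetical helper: roughly what { $group: { _id: {...} } } followed by
// { $count: "count" } computes, done in memory for illustration only.
function countDistinctCombinations(documents, fields) {
  const seen = new Set();
  for (const doc of documents) {
    // Serialize the tuple of field values so equal combinations share one key.
    seen.add(JSON.stringify(fields.map((field) => doc[field])));
  }
  return { count: seen.size };
}

const result = countDistinctCombinations(docs, [
  "Chromosome",
  "Position",
  "Reference",
  "Mutation",
]);
console.log(result); // { count: 3 }
```

The last two documents differ only in Sample, which is not part of the key, so they collapse into one group. On a collection as large as the one mentioned in the question, the real $group stage may need the allowDiskUse: true option of aggregate(), depending on the server version and its defaults.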
    

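As for why .length printed nothing in the question: a likely explanation is that aggregate() returns a cursor rather than an array, so reading .length from it gives undefined. The sketch below illustrates this with a hypothetical FakeCursor class standing in for the shell cursor; only the toArray() method is modelled, and the class itself is an assumption made so the example runs without a database.

```javascript
// Hypothetical stand-in for a shell cursor: it holds results but is not an array.
class FakeCursor {
  constructor(results) {
    this._results = results;
  }
  // Materialize every result into a real array, as a cursor's toArray() does.
  toArray() {
    return [...this._results];
  }
}

const cursor = new FakeCursor([
  { newfield: "chr10_47663_C_T" },
  { newfield: "chr10_47876_T_C" },
  { newfield: "chr10_48005_G_A" },
]);

const viaLength = cursor.length;            // no such property on the cursor object
const viaToArray = cursor.toArray().length; // length of the materialized array

console.log(viaLength, viaToArray); // undefined 3

// In the shell, the corresponding call would look like this (assuming a live `db`):
// db.myCollection.aggregate(pipeline).toArray().length
```

Calling toArray() pulls every result to the client, so for 2 billion documents the $count stage remains the better choice. Note also that collecting all values with $addToSet under _id: null builds a single output document, which may run into the 16 MB document size limit when there are many distinct combinations.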
