【Posted】: 2022-01-12 19:21:28
【Question】:
I have a collection (call it myCollection) containing the following documents:
{
  "_id": {
    "$oid": "601a75a0c9a338f09f238816"
  },
  "Sample": "lie50",
  "Chromosome": "chr10",
  "Position": {
    "$numberLong": "47663"
  },
  "Reference": "C",
  "Mutation": "T",
  "Run": "Run_test",
  "SYMBOL": "TUBB8"
},
{
  "_id": {
    "$oid": "601a75a0c9a338f09f238817"
  },
  "Sample": "lie50",
  "Chromosome": "chr10",
  "Position": {
    "$numberLong": "47876"
  },
  "Reference": "T",
  "Mutation": "C",
  "Run": "Run_test",
  "SYMBOL": "TUBB8"
},
{
  "_id": {
    "$oid": "601a75a0c9a338f09f238818"
  },
  "Sample": "lie50",
  "Chromosome": "chr10",
  "Position": {
    "$numberLong": "48005"
  },
  "Reference": "G",
  "Mutation": "A",
  "Run": "Run_test",
  "SYMBOL": "TUBB8"
},
{
  "_id": {
    "$oid": "601a75a0c9a338f09f238819"
  },
  "Sample": "lie12",
  "Chromosome": "chr10",
  "Position": {
    "$numberLong": "48005"
  },
  "Reference": "G",
  "Mutation": "A",
  "Run": "Run_test",
  "SYMBOL": "TUBB8"
}
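I would like to print the number of distinct combinations of values in the Chromosome, Position, Reference and Mutation fields. That means counting the unique combinations among the following entries:
"Chromosome": "chr10", "Position": 47663, "Reference": "C", "Mutation": "T"
"Chromosome": "chr10", "Position": 47876, "Reference": "T", "Mutation": "C"
"Chromosome": "chr10", "Position": 48005, "Reference": "G", "Mutation": "A"
"Chromosome": "chr10", "Position": 48005, "Reference": "G", "Mutation": "A"
The result should be 3 distinct rows, since the last two entries are identical.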
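To make the expected result concrete, here is a minimal sketch in plain JavaScript (not MongoDB code) of the count in question, run against the four sample documents. The `docs` array and the `countDistinctVariants` helper are illustrative names only, and Position is written as a plain number for brevity.

```javascript
// Illustrative only: the four sample documents, reduced to the relevant fields.
const docs = [
  { Sample: "lie50", Chromosome: "chr10", Position: 47663, Reference: "C", Mutation: "T" },
  { Sample: "lie50", Chromosome: "chr10", Position: 47876, Reference: "T", Mutation: "C" },
  { Sample: "lie50", Chromosome: "chr10", Position: 48005, Reference: "G", Mutation: "A" },
  { Sample: "lie12", Chromosome: "chr10", Position: 48005, Reference: "G", Mutation: "A" }
];

// Hypothetical helper: count unique (Chromosome, Position, Reference, Mutation) tuples.
function countDistinctVariants(documents) {
  const seen = new Set();
  for (const d of documents) {
    // Serialising the four values as a JSON array gives an unambiguous composite key.
    seen.add(JSON.stringify([d.Chromosome, d.Position, d.Reference, d.Mutation]));
  }
  return seen.size;
}

console.log(countDistinctVariants(docs)); // 3
```

The Sample field is deliberately left out of the key, which is why the lie50 and lie12 entries at position 48005 count once.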
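I have checked several similar questions, such as this one, on how to print the distinct values of a single field or how to use $unwind/$project.
For the latter, I thought: why not concatenate the 4 fields and then use length together with $unwind/$project to print the count?
I got as far as this: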
db.myCollection.aggregate(
  [
    {
      $group:
        {
          _id: null,
          newfield: {
            $addToSet:
              {
                $concat:
                  [
                    "$Chromosome",
                    "_",
                    {"$toString":"$Position"},
                    "_",
                    "$Reference",
                    "_",
                    "$Mutation"
                  ]
              }
          }
        }
    },
    {
      $unwind: "$newfield"
    },
    {
      $project: { _id: 0 }
    }
  ]).length
However, adding .length to this query returns nothing, whereas without it the query returns:
{ "newfield" : "chr10_47663_C_T" }
{ "newfield" : "chr10_47876_T_C" }
{ "newfield" : "chr10_48005_G_A" }
For reference, my actual data contains 2 billion documents.
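A possible explanation and workaround, offered as a sketch rather than a confirmed answer: `aggregate()` returns a cursor rather than an array, so `.length` has nothing to report. Counting inside the pipeline sidesteps that. Grouping on a compound `_id` also avoids collecting every value into a single `$addToSet` array, which at this scale would likely exceed the 16 MB document size limit. The `pipeline` variable and the `distinctVariants` field name are made up for illustration, and `allowDiskUse` is assumed to be necessary for a collection this large.

```javascript
// Sketch: count distinct (Chromosome, Position, Reference, Mutation) combinations.
const pipeline = [
  // One group per unique combination of the four fields.
  {
    $group: {
      _id: {
        Chromosome: "$Chromosome",
        Position: "$Position",
        Reference: "$Reference",
        Mutation: "$Mutation"
      }
    }
  },
  // Collapse the groups into a single document holding their number.
  { $count: "distinctVariants" }
];

// Only runs inside the mongo shell, where `db` is defined.
if (typeof db !== "undefined") {
  // allowDiskUse lets $group spill to disk instead of failing on its memory limit.
  const result = db.myCollection.aggregate(pipeline, { allowDiskUse: true }).toArray();
  printjson(result);
}
```

For the four sample documents this should yield a single document whose distinctVariants value is 3. On 2 billion documents the `$group` stage still has to scan the whole collection, so the run time may be considerable.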
【Comments】: