Using MongoDB aggregate, how to turn field values into an array literal

【Question title】: Using mongodb aggregate, how to turn field values into array literal
【Posted】: 2015-03-12 01:27:33
【Question description】:

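We are running a query whose results should be a list of suggested search terms.

We currently have a query that checks several fields for a regex match: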

$or:[ 
{'description.position':/s/i}, 
{'employer.name':/s/i}, 
{'hiringManager.profile.name':/s/i}
]

We want the returned result to be an array of unique matches (no duplicates).

The results currently returned look like this:

I20150311-18:17:14.151(-7)?   "fields": {
I20150311-18:17:14.154(-7)?     "hiringManager": {
I20150311-18:17:14.157(-7)?       "profile": {
I20150311-18:17:14.160(-7)?         "name": "Seth Sandler"
I20150311-18:17:14.163(-7)?       }
I20150311-18:17:14.167(-7)?     },
I20150311-18:17:14.173(-7)?     "description": {
I20150311-18:17:14.177(-7)?       "position": "Cook"
I20150311-18:17:14.181(-7)?     },
I20150311-18:17:14.187(-7)?     "employer": {
I20150311-18:17:14.191(-7)?       "name": "Employer"
I20150311-18:17:14.195(-7)?     },
I20150311-18:17:14.206(-7)?   }
I20150311-18:17:14.209(-7)? }
I20150311-18:17:14.212(-7)? {
I20150311-18:17:14.223(-7)?   "fields": {
I20150311-18:17:14.226(-7)?     "hiringManager": {
I20150311-18:17:14.229(-7)?       "profile": {
I20150311-18:17:14.232(-7)?         "name": "Seth Sandler"
I20150311-18:17:14.234(-7)?       }
I20150311-18:17:14.237(-7)?     },
I20150311-18:17:14.240(-7)?     "description": {
I20150311-18:17:14.243(-7)?       "position": "Cook"
I20150311-18:17:14.246(-7)?     },
I20150311-18:17:14.249(-7)?     "employer": {
I20150311-18:17:14.252(-7)?       "name": "Employer 4"
I20150311-18:17:14.254(-7)?     },
I20150311-18:17:14.264(-7)?   }
I20150311-18:17:14.267(-7)? }
I20150311-18:17:14.269(-7)? {
I20150311-18:17:14.281(-7)?   "fields": {
I20150311-18:17:14.284(-7)?     "hiringManager": {
I20150311-18:17:14.287(-7)?       "profile": {
I20150311-18:17:14.290(-7)?         "name": "Seth Sandler"
I20150311-18:17:14.293(-7)?       }
I20150311-18:17:14.295(-7)?     },
I20150311-18:17:14.298(-7)?     "description": {
I20150311-18:17:14.301(-7)?       "position": "Chef"
I20150311-18:17:14.304(-7)?     },
I20150311-18:17:14.307(-7)?     "employer": {
I20150311-18:17:14.310(-7)?       "name": "Emplopyer 3"
I20150311-18:17:14.313(-7)?     },
I20150311-18:17:14.321(-7)?   }
I20150311-18:17:14.323(-7)? }
I20150311-18:17:14.325(-7)? {
I20150311-18:17:14.334(-7)?   "fields": {
I20150311-18:17:14.336(-7)?     "hiringManager": {
I20150311-18:17:14.338(-7)?       "profile": {
I20150311-18:17:14.340(-7)?         "name": "Seth Sandler"
I20150311-18:17:14.342(-7)?       }
I20150311-18:17:14.344(-7)?     },
I20150311-18:17:14.346(-7)?     "description": {
I20150311-18:17:14.348(-7)?       "position": "Chef"
I20150311-18:17:14.350(-7)?     },
I20150311-18:17:14.353(-7)?     "employer": {
I20150311-18:17:14.356(-7)?       "name": "Employer"
I20150311-18:17:14.359(-7)?     },
  I20150311-18:17:14.366(-7)?   }
I20150311-18:17:14.369(-7)? }

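We want the result to be a single array of unique values taken from hiringManager.profile.name, employer.name and description.position.

Our current solution seems less than ideal (and possibly slow), and we would like to know whether a MongoDB aggregation can put the field values into an array.

Current solution (not ideal):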

aggregate([
  // Keep documents where at least one of the three fields matches.
  {$match: {$or: [ {'description.position': /s/i}, {'employer.name': /s/i}, {'hiringManager.profile.name': /s/i} ]}},
  // Collect every value of each field into one array per field.
  {$group: {_id: 1, positions: {$push: '$description.position'}, employerNames: {$push: '$employer.name'}, hiringManagerNames: {$push: '$hiringManager.profile.name'}}},
  // Merge the three arrays into one set of distinct values.
  {$project: {_id: 1, texts: {$setUnion: ['$positions', {$setUnion: ['$employerNames', '$hiringManagerNames']}]}}}
])

This output is correct, but we would like a better aggregation that also lets us limit the results.

I20150311-18:25:26.461(-7)?   "result": [
I20150311-18:25:26.465(-7)?     {
I20150311-18:25:26.468(-7)?       "_id": 1,
I20150311-18:25:26.472(-7)?       "texts": [
I20150311-18:25:26.478(-7)?         "Employer 5",
I20150311-18:25:26.481(-7)?         "Employer 4",
I20150311-18:25:26.485(-7)?         "Employer 1",
I20150311-18:25:26.488(-7)?         "Manager",
I20150311-18:25:26.504(-7)?         "Cook",
I20150311-18:25:26.507(-7)?         "Chef",
I20150311-18:25:26.530(-7)?       ]
I20150311-18:25:26.534(-7)?     }
I20150311-18:25:26.538(-7)?   ]

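【Question comments】:

So your problem is that the result is just one big document, and you only want the "distinct" "texts" values in the response. Is that right?
Exactly. The catch is that the distinct values come from 3 different fields (because we query 3 fields for a regex match).

Tags: regex mongodb mongodb-query aggregation-framework

【Solution 1】: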

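You are probably better off with a different technique that gets distinct results by making the "texts" the actual grouping key of a $group stage. In modern MongoDB versions, 2.6 or later, there is a trick that does this reasonably efficiently: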

db.collection.aggregate([
    { "$match": {
        "$or":[
            { "description.position":/s/i },
            { "employer.name":/s/i},
            { "hiringManager.profile.name":/s/i }
        ]
    }},
    { "$project": {
        "_id": { 
            "$setDifference": [
                { "$map": {
                    "input": { "$literal": ["A","B","C" ] },
                    "as": "type",
                    "in": { "$cond": [
                        { "$eq": [ "$$type", "A" ] },
                        "$description.position",
                        { "$cond": [
                            { "$eq": [ "$$type", "B" ] },
                            "$employer.name",
                            "$hiringManager.profile.name"
                        ]}
                    ]}
                }},
                [null] 
            ]
        }
    }},
    { "$unwind": "$_id" },
    { "$group": { "_id": "$_id" } }
])

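So $map is used as the basis for a "switch", driven by the $literal array ["A","B","C"] that is fed to it. For each of those elements, the corresponding field is chosen as the output value.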
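As a rough illustration of what that "switch" does for a single document, here is a plain JavaScript sketch. It is not MongoDB code: the function name `candidateTexts` and the sample document are made up for illustration, and missing fields are assumed to show up as `null` in the resulting array.

```javascript
// Plain JavaScript emulation of the $map / $cond "switch" (illustrative only).
function candidateTexts(doc) {
  return ['A', 'B', 'C'].map(function (type) {
    var value;
    if (type === 'A') {
      value = doc.description && doc.description.position;
    } else if (type === 'B') {
      value = doc.employer && doc.employer.name;
    } else {
      value = doc.hiringManager && doc.hiringManager.profile &&
              doc.hiringManager.profile.name;
    }
    // Assumption: a missing field ends up as null in the array.
    return value === undefined ? null : value;
  });
}

var sample = {
  description: { position: 'Chef' },
  employer: { name: 'Employer' },
  hiringManager: { profile: { name: 'Seth Sandler' } }
};
console.log(candidateTexts(sample)); // [ 'Chef', 'Employer', 'Seth Sandler' ]
```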

In case any of those values is null, or is even duplicated within the same document, the $setDifference operator takes care of that.
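The effect of taking the set difference against `[null]` can be sketched in plain JavaScript as follows. The helper `setDifference` is a hypothetical stand-in, not the server's implementation, and unlike MongoDB's set operators (which do not guarantee element order) it happens to keep first-seen order.

```javascript
// Illustrative emulation of { $setDifference: [ values, [null] ] }:
// keep each distinct value once and drop anything listed in `excluded`.
function setDifference(values, excluded) {
  var result = [];
  values.forEach(function (v) {
    if (excluded.indexOf(v) === -1 && result.indexOf(v) === -1) {
      result.push(v);
    }
  });
  return result;
}

console.log(setDifference(['Cook', null, 'Cook', 'Seth Sandler'], [null]));
// [ 'Cook', 'Seth Sandler' ]
```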

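The resulting array in each document is then processed with $unwind, so that its elements can be passed to $group as the grouping key, which produces one distinct document per "text" term.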
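A minimal plain JavaScript sketch of those last two stages, assuming the projected `_id` arrays contain only strings (the function name `unwindAndGroup` and the sample data are hypothetical):

```javascript
// Illustrative emulation of { $unwind: "$_id" } followed by
// { $group: { _id: "$_id" } }.
function unwindAndGroup(docs) {
  var distinct = {};
  docs.forEach(function (doc) {
    doc._id.forEach(function (text) { // $unwind: one pipeline document per element
      distinct[text] = true;          // $group: identical keys collapse into one
    });
  });
  return Object.keys(distinct).map(function (text) {
    return { _id: text };
  });
}

var projected = [
  { _id: ['Cook', 'Employer', 'Seth Sandler'] },
  { _id: ['Chef', 'Employer', 'Seth Sandler'] }
];
console.log(unwindAndGroup(projected)); // four documents, one per distinct text
```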

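The trade-off, of course, is that the number of documents in the pipeline becomes a multiple of the number of matched documents, up to three per document (one for each field), so until the distinct grouping happens there are more documents in the pipeline than the query matched. In other words, using $unwind has a cost.

The benefit is that the results are separate documents, so with cursor output the combined "texts" can exceed the 16MB limit that applies to a single document. Of course, that would be a lot of text to begin with.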

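Another note on your existing aggregation: since you already rely on $setUnion to combine the fields and get the distinct values, you could also "reduce" the input arrays by using $addToSet instead of $push. That avoids growing the arrays with duplicates that will eventually be removed anyway.

The same $setDifference operation should be considered there as well, because your $or condition does not guarantee that "all" fields contain a valid string, or even exist. If not all fields are valid, you will also receive null alongside the distinct values of the other text terms.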
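Applied to the pipeline from the question, those two changes might look roughly like the sketch below. The variable names `searchRegex` and `pipeline` and the collection name in the closing comment are placeholders, and the flat three-argument $setUnion is just a shorter way of writing the nested form used in the question.

```javascript
// Sketch of the question's pipeline with the two suggested changes.
var searchRegex = /s/i;

var pipeline = [
  { $match: { $or: [
    { 'description.position': searchRegex },
    { 'employer.name': searchRegex },
    { 'hiringManager.profile.name': searchRegex }
  ]}},
  // $addToSet keeps each value once, so the arrays do not grow with duplicates.
  { $group: {
    _id: 1,
    positions: { $addToSet: '$description.position' },
    employerNames: { $addToSet: '$employer.name' },
    hiringManagerNames: { $addToSet: '$hiringManager.profile.name' }
  }},
  // Combine the three sets, then remove null if it was collected.
  { $project: {
    _id: 1,
    texts: { $setDifference: [
      { $setUnion: ['$positions', '$employerNames', '$hiringManagerNames'] },
      [null]
    ]}
  }}
];

// In the mongo shell (collection name is a placeholder):
// db.collection.aggregate(pipeline)
```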

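So it comes down to weighing up what matters more to you. Your present operation is probably faster and less resource intensive (with the modifications mentioned), but the alternative caters for a larger and possibly more palatable response. It also lets you "limit" the results, and even do things like "count" the occurrences of those "text" values.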
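For example, counting and limiting could be done by swapping the final `{ "$group": { "_id": "$_id" } }` stage for something like the stages below. This is only a sketch: the variable name `countAndLimitStages` and the cap of 10 are arbitrary choices, not part of the answer.

```javascript
// Hypothetical final stages, used in place of the plain
// { "$group": { "_id": "$_id" } } stage.
var countAndLimitStages = [
  { $group: { _id: '$_id', count: { $sum: 1 } } }, // occurrences of each text
  { $sort: { count: -1 } },                        // most frequent first
  { $limit: 10 }                                   // arbitrary cap on suggestions
];
```

Because $setDifference already de-duplicates values within each document, `count` here is the number of matched documents that contain the text.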

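【Discussion】:

Thanks Neil. I hope to test this later. It says there is an extra bracket somewhere, which I will look into, but this looks like the solution we were after.
@user1218464 Possibly. I just typed it in here. I will check the syntax as well.
@user1218464 Ah. There was a missing comma after the $project stage.
After testing this some more, I am not sure the result is really what we want (though it is close). This gives us the unique $description.position whenever the search term matches any of the 3 fields. That is, if the search query matches the field $hiringManager.profile.name, the value returned is not the hiring manager (which is what we want) but the position value in every case. What we want is: if the search matches one of the 3 fields, return the value of the field that matched, and then group those values or make them unique so the same result does not appear more than once.
To rephrase: the goal is to suggest search terms. A suggested term should be a value of $description.position, $hiringManager.profile.name or $employer.name that matches the regex.

【Solution 2】: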

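@Neil's answer was close, but it seems another match is needed to make sure the results themselves match the original regex. I am not sure whether this is a good solution, but here is a new, working aggregation. It also seems to work without $setDifference, so I am not sure that is needed.

Basically, I run another $match on the unwound results to make sure they match the original regex.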
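To see why the extra $match matters, consider one document that was selected only because the hiring manager's name matches. After unwinding, its position and employer name are in the pipeline too, even though neither matches. A plain JavaScript sketch of the filter (the names `unwound` and `suggestions` and the sample values are made up for illustration):

```javascript
// Illustrative emulation of { $match: { _id: /s/i } } on unwound values.
var searchRegex = /s/i;

// One matched document, unwound into its three (possibly null) field values.
var unwound = [
  { _id: 'Cook' },         // no "s": only present because the document matched
  { _id: null },           // missing employer name
  { _id: 'Seth Sandler' }  // the value that actually matched
];

var suggestions = unwound.filter(function (doc) {
  // A regex condition only matches string values, so null is dropped too.
  return typeof doc._id === 'string' && searchRegex.test(doc._id);
});

console.log(suggestions); // [ { _id: 'Seth Sandler' } ]
```

This would also explain why the pipeline appears to work without $setDifference: null values cannot satisfy the regex condition, so the second match removes them anyway.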

aggregate([
    // Keep documents where at least one of the three fields matches.
    { '$match': {
        '$or': [
            { 'description.position': /s/i },
            { 'employer.name': /s/i },
            { 'hiringManager.profile.name': /s/i }
        ]
    }},
    // Build an array of the three field values for each document.
    { '$project': {
        '_id': { '$map': {
            'input': { '$literal': ['A', 'B', 'C'] },
            'as': 'type',
            'in': { '$cond': [
                { '$eq': [ '$$type', 'A' ] },
                '$description.position',
                { '$cond': [
                    { '$eq': [ '$$type', 'B' ] },
                    '$employer.name',
                    '$hiringManager.profile.name'
                ]}
            ]}
        }}
    }},
    { '$unwind': '$_id' },
    // Drop unwound values that do not themselves match the regex.
    { '$match': { '_id': /s/i } },
    { '$group': { '_id': '$_id' } }
]);

【Discussion】: