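【Posted】: 2014-12-23 03:14:50
【Question】:
I want to check whether a collection contains duplicate documents, so that I can remove or merge similar records.
Assume that no target value is given, only the target field(s); all I need to do is find every document that is similar to another document on those fields.
For example, my collection persons contains the following documents: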
{
  _id: 1,
  email: "foo@bar.com",
  name: "tom",
  phone: 320513218,
  company: {
    name: "Bar",
    department: "Marketing"
  }
},{
  _id: 2,
  email: "foo@bar.com",
  name: "alex c",
  phone: 7320320813,
  company: {
    name: "Bar",
    department: "Development"
  }
},{
  _id: 3,
  email: "not_foo@not_bar.com",
  name: "alex w",
  phone: 895120981,
  company: {
    name: "Not Bar",
    department: "Development"
  }
},{
  _id: 4,
  email: "not_foo@not_bar.com",
  name: "emily",
  phone: 895120981,
  company: {
    name: "Another Company",
    department: "Marketing"
  }
},{
  _id: 5,
  email: "foo@bar.com",
  name: "emily",
  phone: 7320320813,
  company: {
    name: "Another Company",
    department: "Marketing"
  }
},...
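First, I want to find duplicate documents based on email, and I should get [{_id: 1, count: 3}, {_id: 2, count: 3}, {_id: 5, count: 3}, {_id: 3, count: 2}, {_id: 4, count: 2}] as the result. (The order of the array does not matter.)
Then I want to find duplicate documents based on phone, and the result should be [{_id: 2, count: 2}, {_id: 5, count: 2}, {_id: 3, count: 2}, {_id: 4, count: 2}]. (The order of the array does not matter.)
Next, I want to find duplicate documents based on name, and the result should be [{_id: 2, count: 2}, {_id: 3, count: 2}, {_id: 4, count: 2}, {_id: 5, count: 2}].
Finally, I want to find duplicate documents based on both email and phone, and the result should be [{_id: 2, count: 2}, {_id: 5, count: 2}].
(count should be the number of duplicate records, including the document itself.)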
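To make the expected output concrete, here is a minimal in-memory sketch in plain JavaScript. It is only an illustration of the "count includes the document itself" rule, not a MongoDB query: findDuplicates and the persons array are hypothetical names, matching is by exact equality (no "similar" matching), and the company sub-document is left out of the sample for brevity.

```javascript
// Sample data copied from the question (company sub-document omitted).
const persons = [
  { _id: 1, email: "foo@bar.com", name: "tom", phone: 320513218 },
  { _id: 2, email: "foo@bar.com", name: "alex c", phone: 7320320813 },
  { _id: 3, email: "not_foo@not_bar.com", name: "alex w", phone: 895120981 },
  { _id: 4, email: "not_foo@not_bar.com", name: "emily", phone: 895120981 },
  { _id: 5, email: "foo@bar.com", name: "emily", phone: 7320320813 },
];

// Hypothetical helper (not part of mongo or mongoose).
// Groups documents by the exact values of `fields` and returns one
// { _id, count } entry for every document whose group has more than
// one member. `count` is the group size, so it includes the document itself.
function findDuplicates(docs, fields) {
  const groups = new Map();
  for (const doc of docs) {
    // Serialize the selected values so they can be used as a Map key.
    const key = JSON.stringify(fields.map((f) => doc[f]));
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(doc._id);
  }

  const result = [];
  for (const ids of groups.values()) {
    if (ids.length > 1) {
      for (const id of ids) result.push({ _id: id, count: ids.length });
    }
  }
  return result;
}

console.log(findDuplicates(persons, ["email"]));
console.log(findDuplicates(persons, ["phone"]));
console.log(findDuplicates(persons, ["email", "phone"]));
```

With exact matching, the email and phone cases reproduce the lists above. The name case reports only documents 4 and 5, because "alex c" and "alex w" are not equal, and the email-plus-phone case also reports documents 3 and 4, since they share both values in the sample data.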
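I have tried the mapReduce and aggregate methods provided by mongo/mongoose, but they did not give me the results I expected.
I want something like "group by multiple (similar) fields and count".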
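One possible shape for such a "group by several fields and count" query is an aggregation pipeline built from the standard $group, $match, $unwind and $project stages. The sketch below only builds the pipeline array and does not connect to a database; buildDuplicatePipeline is a hypothetical helper name, it assumes top-level field names without dots, and it groups on exact values only, so it does not cover "similar" values.

```javascript
// Hypothetical helper: returns an aggregation pipeline that lists every
// document sharing the same values for `fields` with at least one other
// document, together with the size of its group.
function buildDuplicatePipeline(fields) {
  // Build the compound group key, e.g. { email: "$email", phone: "$phone" }.
  const groupId = {};
  for (const f of fields) groupId[f] = "$" + f;

  return [
    // Collect the _ids of each group and count its members.
    { $group: { _id: groupId, ids: { $push: "$_id" }, count: { $sum: 1 } } },
    // Keep only groups with more than one member.
    { $match: { count: { $gt: 1 } } },
    // Emit one result per original document.
    { $unwind: "$ids" },
    // Reshape to { _id: <original _id>, count: <group size> }.
    { $project: { _id: "$ids", count: 1 } },
  ];
}

console.log(JSON.stringify(buildDuplicatePipeline(["email", "phone"]), null, 2));
```

In the mongo shell the result could be passed to db.persons.aggregate(...), and with mongoose to the aggregate method of a model for this collection (the model itself is not shown in the question, so its name would be an assumption).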
Please let me know if you need more information, such as my current sample code.
【Discussion】: