MongoDB: find the closest match (answers)

【Question title】: MongoDB find closest match
【Posted】: 2017-03-11 01:21:21
【Question description】:

I'd like to know whether it is possible to retrieve documents in MongoDB by closest match. For example, my search query always contains:
name
country
city

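The following rules are in place:
1. name always has to match
2. if country or city is present, country has the higher priority
3. if country or city does not match, the document should only be considered if that field has the default value (e.g. for strings: "")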

Example query:
name = "Test"
country = "USA"
city = "Seattle"

Documents:

db.stuff.insert([
{
    name:"Test",
    country:"",
    city:"Seattle"
},{
    name:"Test3",
    country:"USA",
    city:"Seattle"
},{
    name:"Test",
    country:"USA",
    city:""
},{
    name:"Test",
    country:"Germany",
    city:"Seattle"
},{
    name:"Test",
    country:"USA",
    city:"Washington"
}
])

It should return the third document.
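To make the three rules concrete, here is a minimal plain JavaScript sketch that applies them in memory (not in MongoDB) to the sample documents. The `_id` values 1 to 5 are assumed from insertion order, the helper names `isCandidate`, `score` and `rank` are hypothetical, and the country/city weights (2 and 1) are an assumption chosen only so that a country match outranks a city match, as rule #2 asks.

```javascript
// Sample documents from the question; _id values are assumed (insertion order).
const docs = [
  { _id: 1, name: "Test",  country: "",        city: "Seattle" },
  { _id: 2, name: "Test3", country: "USA",     city: "Seattle" },
  { _id: 3, name: "Test",  country: "USA",     city: "" },
  { _id: 4, name: "Test",  country: "Germany", city: "Seattle" },
  { _id: 5, name: "Test",  country: "USA",     city: "Washington" },
];

const DEFAULT = ""; // assumed default value for string fields (rule #3)

// Rule #1: name must match. Rule #3: country and city must each either
// match the query or hold the default value.
function isCandidate(doc, q) {
  return doc.name === q.name &&
    (doc.country === q.country || doc.country === DEFAULT) &&
    (doc.city === q.city || doc.city === DEFAULT);
}

// Rule #2 (assumed weights): a country match (2) outweighs a city match (1).
function score(doc, q) {
  return (doc.country === q.country ? 2 : 0) + (doc.city === q.city ? 1 : 0);
}

// Keep the candidates and order them by score, best first.
function rank(docs, q) {
  return docs
    .filter((d) => isCandidate(d, q))
    .sort((a, b) => score(b, q) - score(a, q));
}

const query = { name: "Test", country: "USA", city: "Seattle" };
console.log(rank(docs, query).map((d) => d._id)); // [ 3, 1 ]
```

The first element of the ranking is the third document, as expected; the rest of the list is the "almost matching" ordering discussed in the comments.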

Thanks!

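【Question discussion】:

Which of the two documents should it return: the one where the name matches, or the one where the city matches?
The closest one. If it finds two documents that match equally well, returning both is an option. If there is no exact match, it could return a list of documents ordered with the closest match first.
What about adding weights to your documents based on the presence or absence of these fields and using them to query your documents?
"Could"? It's hard to answer if you don't know what you need in the result.
Yes, @Styvane. That's exactly what I was thinking. The OP just needs to give some guidance on how to build the weights, but apparently the scope has changed to returning an ordered list of "almost matching" documents.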

Tags: mongodb

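【Solution 1】:

Given the unclear requirements and the conflicting updates, this answer is a guideline that addresses the "is it possible" part.

The example should be adjusted to meet your expectations.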

db.stuff.aggregate([
    {$match: {name: "Test"}}, // <== the fields that should always match
    {$facet: {
        matchedBoth: [
            {$match: {country: "USA", city: "Seattle"}},  // <== bull's-eye
            {$addFields: {weight: 10}}                    // <== 10 stones
        ],
        matchedCity: [
            {$match: {country: "", city: "Seattle"}},   // <== the $match may need to be improved, see below 
            {$addFields: {weight: 5}}            
        ],
        matchedCountry: [
            {$match: {country: "USA", city: ""}},
            {$addFields: {weight: 0}}                  // <== weightless, yet still a match
        ]
        // add more rules here, if needed
    }},
    // get them together. Should list all rules from above  
    {$project: {doc: {$concatArrays: ["$matchedBoth", "$matchedCity", "$matchedCountry"]}}},
    {$unwind: "$doc"},              // <== split them apart
    {$sort: {"doc.weight": -1}},    // <== and order by weight, desc
    // reshape to retrieve documents in its original format 
    {$project: {_id: "$doc._id", name: "$doc.name", country: "$doc.country", city: "$doc.city"}}
]);
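To see what this pipeline returns for the question's sample data, the following plain JavaScript sketch imitates its `$match`, `$facet`, `$concatArrays`/`$unwind` and `$sort` stages in memory. It is an illustration, not MongoDB code; the `_id` values and the names `facets` and `runPipeline` are assumptions. With the weights as written (city-only match 5, country-only match 0), the city-only document ranks above the country-only one.

```javascript
// Sample documents from the question; _id values are assumed (insertion order).
const docs = [
  { _id: 1, name: "Test",  country: "",        city: "Seattle" },
  { _id: 2, name: "Test3", country: "USA",     city: "Seattle" },
  { _id: 3, name: "Test",  country: "USA",     city: "" },
  { _id: 4, name: "Test",  country: "Germany", city: "Seattle" },
  { _id: 5, name: "Test",  country: "USA",     city: "Washington" },
];

// Each facet is a filter plus the weight that its $addFields stage attaches.
const facets = [
  { weight: 10, test: (d) => d.country === "USA" && d.city === "Seattle" }, // matchedBoth
  { weight: 5,  test: (d) => d.country === ""    && d.city === "Seattle" }, // matchedCity
  { weight: 0,  test: (d) => d.country === "USA" && d.city === "" },        // matchedCountry
];

function runPipeline(docs) {
  // $match: {name: "Test"}
  const named = docs.filter((d) => d.name === "Test");
  // $facet + $concatArrays + $unwind: one tagged copy per facet a document matches
  const tagged = facets.flatMap((f) =>
    named.filter(f.test).map((d) => ({ ...d, weight: f.weight })));
  // $sort: {"doc.weight": -1}
  return tagged.sort((a, b) => b.weight - a.weight);
}

console.log(runPipeline(docs).map((d) => [d._id, d.weight])); // [ [ 1, 5 ], [ 3, 0 ] ]
```

Swapping the weights of `matchedCity` and `matchedCountry` would put the third document first, which is what rule #2 in the question asks for.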

The least explained part of the question determines how we build the facets. For example,

{$match: {country: "", city: "Seattle"}}

matches only documents that explicitly have a country field set to an empty string.

You may well need

{$match: {country: {$ne: "USA"}, city: "Seattle"}}

to get all documents with a matching name and city and any other country, or even

{$match: {$and: [{$or: [{country: null}, {country: ""}]}, {city: "Seattle"}]}}

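and so on.

【Discussion】:

I think a $group needs to be added as the final stage to get distinct documents. @Alex
@wael32gh, yes, good point. If a document matches the conditions in several facets, you will probably want to pick the copy with the highest weight. Whether the facets are mutually exclusive really depends on the conditions. If the OP is only after the "closest match", $limit: 1 would be enough.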
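The three `$match` variants differ in how they treat a country that has some other value or is missing altogether. This plain JavaScript sketch expresses each one as a predicate and applies it to the name-matched sample documents plus one hypothetical document (`_id` 6) with no country field. It only approximates MongoDB's matching semantics for these simple cases: `{country: null}` matches both `null` and a missing field, and `$ne` also matches documents that lack the field.

```javascript
// Name-matched sample documents (assumed _id values) plus a hypothetical _id 6.
const docs = [
  { _id: 1, name: "Test", country: "",        city: "Seattle" },
  { _id: 3, name: "Test", country: "USA",     city: "" },
  { _id: 4, name: "Test", country: "Germany", city: "Seattle" },
  { _id: 5, name: "Test", country: "USA",     city: "Washington" },
  { _id: 6, name: "Test",                     city: "Seattle" }, // hypothetical: no country field
];

// In-memory approximations of the three filters.
const variants = {
  // {country: "", city: "Seattle"}: country must be exactly ""
  emptyString: (d) => d.country === "" && d.city === "Seattle",
  // {country: {$ne: "USA"}, city: "Seattle"}: any other country, or none at all
  notUsa: (d) => d.country !== "USA" && d.city === "Seattle",
  // {$or: [{country: null}, {country: ""}]} + city: null also matches a missing field
  nullOrEmpty: (d) => (d.country == null || d.country === "") && d.city === "Seattle",
};

for (const [name, pred] of Object.entries(variants)) {
  console.log(name, docs.filter(pred).map((d) => d._id));
}
// emptyString [ 1 ]
// notUsa [ 1, 4, 6 ]
// nullOrEmpty [ 1, 6 ]
```

Only the `$ne` variant lets the Germany document through, which matters if rule #3 is meant to exclude documents with a non-default, non-matching country.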
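One way to follow the $group suggestion is to keep only the highest-weighted copy of each document: in the pipeline above that would roughly be a `$sort` on `"doc.weight"` followed by `$group` on `"$doc._id"` with `$first`, then another `$sort`, since `$group` does not preserve order; `$limit: 1` then yields the single closest match. The sketch below shows the same logic in plain JavaScript on hypothetical facet output (document 1 matched two facets); the function names are illustrative only.

```javascript
// Hypothetical facet output: document 1 appears twice with different weights.
const tagged = [
  { _id: 1, weight: 5 },
  { _id: 3, weight: 0 },
  { _id: 1, weight: 10 },
];

// Keep the highest-weighted copy per _id. In the aggregation this corresponds,
// roughly, to {$sort: {"doc.weight": -1}} then
// {$group: {_id: "$doc._id", doc: {$first: "$doc"}}}.
function dedupeByMaxWeight(docs) {
  const best = new Map();
  for (const d of docs) {
    const prev = best.get(d._id);
    if (prev === undefined || d.weight > prev.weight) best.set(d._id, d);
  }
  // $group does not guarantee order, so sort again by weight, descending.
  return [...best.values()].sort((a, b) => b.weight - a.weight);
}

// Equivalent of appending {$limit: 1}: the closest match, or null if none.
function closestMatch(docs) {
  return dedupeByMaxWeight(docs)[0] ?? null;
}

console.log(dedupeByMaxWeight(tagged).map((d) => [d._id, d.weight])); // [ [ 1, 10 ], [ 3, 0 ] ]
console.log(closestMatch(tagged)); // { _id: 1, weight: 10 }
```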

【Solution 2】:

Here is a query:

db.collection.aggregate([
  {$match: {name:"Test"}},
  {$project: {
      name: "$name",
      country: "$country",
      city: "$city",
      countryMatch: {$eq: ["$country", "USA"]},
      cityMatch: {$eq: ["$city", "Seattle"]}
  }},
  {$match: {$and: [
      {$or:[{countryMatch:true},{country:""}]},
      {$or:[{cityMatch:true},{city:""}]}
      ]}},
  {$sort: {countryMatch:-1, cityMatch:-1}},
  {$project: {name:"$name", country:"$country", city:"$city"}}
])

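Explanation:

The first $match filters out documents whose name does not match (rule #1: name must always match).

The following $project selects the document fields plus flags telling whether the country and the city match. We need these flags for the subsequent filtering and sorting.

The second $match filters out documents whose country or city neither matches the query nor has the default value (rule #3).

Sorting moves country matches ahead of city matches, as rule #2 requires. Finally, the last $project selects the required fields.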
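The stages can be traced on the question's sample data with a plain JavaScript sketch that computes `countryMatch` and `cityMatch`, applies the rule #3 filter and sorts country before city. It runs in memory rather than in MongoDB; the `_id` values 1 to 5 are assumed from insertion order and `solution2` is a hypothetical name. It yields document 3 followed by document 1, matching the output listed in this answer.

```javascript
// Sample documents from the question; _id values are assumed (insertion order).
const docs = [
  { _id: 1, name: "Test",  country: "",        city: "Seattle" },
  { _id: 2, name: "Test3", country: "USA",     city: "Seattle" },
  { _id: 3, name: "Test",  country: "USA",     city: "" },
  { _id: 4, name: "Test",  country: "Germany", city: "Seattle" },
  { _id: 5, name: "Test",  country: "USA",     city: "Washington" },
];

function solution2(docs, q) {
  return docs
    .filter((d) => d.name === q.name)                      // first $match (rule #1)
    .map((d) => ({ ...d,                                   // $project with match flags
      countryMatch: d.country === q.country,
      cityMatch: d.city === q.city }))
    .filter((d) => (d.countryMatch || d.country === "") && // second $match (rule #3)
                   (d.cityMatch || d.city === ""))
    .sort((a, b) =>                                        // $sort {countryMatch: -1, cityMatch: -1}
      (b.countryMatch - a.countryMatch) || (b.cityMatch - a.cityMatch))
    .map(({ _id, name, country, city }) => ({ _id, name, country, city })); // final $project
}

const ranked = solution2(docs, { name: "Test", country: "USA", city: "Seattle" });
console.log(ranked.map((d) => d._id)); // [ 3, 1 ]
```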

Output:

{
    _id: 3,
    name : "Test",
    country : "USA",
    city : ""
},
{
    _id: 1,
    name : "Test",
    country : "",
    city : "Seattle"
}

You can limit the query result to get only the closest match.
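A sketch of that limit, which also parameterizes the hard-coded "USA" and "Seattle": the hypothetical helper `closestMatchPipeline` only builds the pipeline array, which you would then pass to `db.stuff.aggregate(...)`. It uses `$addFields` (available from MongoDB 3.4) instead of re-listing every field in `$project`, and it assumes "" as the default value.

```javascript
// Hypothetical helper: builds this answer's pipeline for arbitrary query values
// and appends {$limit: 1} so only the closest match is returned.
function closestMatchPipeline({ name, country, city }, defaultValue = "") {
  return [
    { $match: { name } },                                 // rule #1
    { $addFields: {                                       // match flags
        countryMatch: { $eq: ["$country", country] },
        cityMatch: { $eq: ["$city", city] },
    } },
    { $match: { $and: [                                   // rule #3
        { $or: [{ countryMatch: true }, { country: defaultValue }] },
        { $or: [{ cityMatch: true }, { city: defaultValue }] },
    ] } },
    { $sort: { countryMatch: -1, cityMatch: -1 } },       // rule #2
    { $limit: 1 },                                        // closest match only
    { $project: { name: 1, country: 1, city: 1 } },       // _id is kept by default
  ];
}

const pipeline = closestMatchPipeline({ name: "Test", country: "USA", city: "Seattle" });
console.log(JSON.stringify(pipeline[4])); // {"$limit":1}
// In the mongo shell: db.stuff.aggregate(pipeline)
```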

【Discussion】:

I don't see how this answers the OP's question.