MongoDB text search - Match exact tokens in a string

【Question title】: MongoDB text search - Match exact tokens in a string
【Posted】: 2017-10-15 13:46:19
【Question description】:

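I have a situation where I need to run a $text $search in MongoDB that matches exact tokens in a string. I thought I could solve it by creating a text index with no default language and wrapping each token in the query as \"token\", as described in the documentation. So I created my index like this: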

db.collection.createIndex({"denom": "text"}, {"default_language": "none"})

The query I have to run is:

db.collection.find( {"$text": {"$search": "\"consorzio\" \"la\""}}, {"denom": 1} )

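I expected the result to be every document that contains exactly the tokens "consorzio" and "la". Instead, the query also matches documents in which "la" or "consorzio" merely appears as a substring inside a longer token.

For example, the query above returns the following denom values, each marked with whether I expected it:

CONSORZIO LA* CASCINA OK
LA RADA CONSORZIO OK
GESCO CONSORZIO AGRICOLA WRONG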

Can anyone solve this? I hope the question is clear.

Thank you very much.

【Comments】:

Tags: mongodb full-text-search text-search

【Answer 1】:

This issue has been reported to MongoDB as a bug. Exact matching does not work.

You can look at the matching score:

db.docs.find({$text: {$search: "\"consorzio\" \"la\""}}, 
             {score: { $meta: "textScore" }, "_id": 0})

{ "t" : "CONSORZIO LA* CASCINA OK", "score" : 1.25 } 
{ "t" : "LA RADA CONSORZIO OK", "score" : 1.25 }
{ "t" : "GESCO CONSORZIO AGRICOLA WRONG", "score" : 0.625 }

The solution should be to take the highest score into account...
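As a rough illustration of that idea, the sketch below keeps only the rows that share the highest textScore. It runs on a plain array copied from the output above rather than on a live cursor, and `keepTopScoring` is a hypothetical helper name, not a MongoDB API.

```javascript
// Rows copied from the query output shown above.
const results = [
  { t: "CONSORZIO LA* CASCINA OK", score: 1.25 },
  { t: "LA RADA CONSORZIO OK", score: 1.25 },
  { t: "GESCO CONSORZIO AGRICOLA WRONG", score: 0.625 },
];

// Hypothetical helper: keep only the rows whose score equals the maximum.
function keepTopScoring(rows) {
  if (rows.length === 0) return [];
  const best = Math.max(...rows.map((r) => r.score));
  return rows.filter((r) => r.score === best);
}

const top = keepTopScoring(results);
console.log(top.map((r) => r.t));
```

Note that this only ranks the rows relative to each other. If no document contained every token, the best partial match would still be kept, so the highest score alone is not proof of an exact match.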

【Comments】:

【Answer 2】:

Fernando, you are wrong: it does match GESCO CONSORZIO AGRICOLA WRONG, but it matches only one of the words (tokens) you searched for, consorzio, and not la.
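To make the token-level view concrete, here is a minimal sketch that splits each denom value into tokens and checks which values contain every searched token exactly. The `tokenize` function is a simplified assumption (lowercase, split on anything that is not a letter or digit) and is not MongoDB's actual tokenizer; `containsAllTokens` is likewise a hypothetical helper.

```javascript
// Simplified stand-in for a tokenizer: lowercase, then split on any run of
// characters that are not letters or digits. This is an assumption for
// illustration, not how MongoDB tokenizes internally.
function tokenize(text) {
  return text.toLowerCase().split(/[^\p{L}\p{N}]+/u).filter(Boolean);
}

// Hypothetical helper: true when every wanted token appears as a whole token.
function containsAllTokens(text, wanted) {
  const tokens = new Set(tokenize(text));
  return wanted.every((w) => tokens.has(w.toLowerCase()));
}

const denoms = [
  "CONSORZIO LA* CASCINA",
  "LA RADA CONSORZIO",
  "GESCO CONSORZIO AGRICOLA",
];

const exact = denoms.filter((d) => containsAllTokens(d, ["consorzio", "la"]));
console.log(exact);
```

Under this tokenization, "AGRICOLA" is a single token, so the third value contains consorzio but not la. A check like this could also be applied client-side to the documents a $text query returns.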

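In a text search, textScore will be greater than 1 when a document matches all the tokens of the query.

For example, here is a stores collection: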

db.stores.insert(
   [
     { _id: 1, name: "Java Hut", description: "Coffee and cakes" },
     { _id: 2, name: "Burger Buns", description: "Gourmet hamburgers" },
     { _id: 3, name: "Java Coffee Shop", description: "Just coffee" },
     { _id: 4, name: "Clothes Clothes Clothes", description: "Discount clothing" },
     { _id: 5, name: "Java Shopping", description: "Indonesian goods" },
     { _id: 6, name: "Java Shopping", description: "Indonesian goods" }
   ]
)

Index:

db.stores.createIndex( { name: "text" } )

Now if I query:

db.stores.find({
    $text: {
        $search: "Java Shop"
    }
}, {
    score: {
        $meta: "textScore"
    }
}).sort({
    score: {
        $meta: "textScore"
    },
    _id: -1
})

it will match the tokens, and the result is:

/* 1 */
{
    "_id" : 6.0,
    "name" : "Java Shopping",
    "description" : "Indonesian goods",
    "score" : 1.5
}

/* 2 */
{
    "_id" : 5.0,
    "name" : "Java Shopping",
    "description" : "Indonesian goods",
    "score" : 1.5
}

/* 3 */
{
    "_id" : 3.0,
    "name" : "Java Coffee Shop",
    "description" : "Just coffee",
    "score" : 1.33333333333333
}

/* 4 */
{
    "_id" : 1.0,
    "name" : "Java Hut",
    "description" : "Coffee and cakes",
    "score" : 0.75
}

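Here you can see that the first three documents match all the tokens, which is why their score is greater than 1, while the last document has a score below 1 because it matches only one token.

Now, among the documents with a score greater than 1, you can also get the single best document that matches all the tokens. For that we need to use MongoDB Aggregation.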

db.stores.aggregate([
  { 
      "$match": { 
             "$text": { 
                   "$search": "Java Shop" 
              } 
       } 
  },
  { 
       "$addFields": { 
             "score": { 
                   "$meta": "textScore" 
              } 
        } 
   },
   { 
        "$match": { 
              "score": { "$gt": 1.0 } 
         } 
   },
   { 
        "$sort": { 
              "score": -1, _id: -1 
         }
   },
   { 
        "$limit": 1
   }
])

And here is the result:

/* 1 */
{
    "_id" : 6.0,
    "name" : "Java Shopping",
    "description" : "Indonesian goods",
    "score" : 1.5
}
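
If the same pipeline is needed for different search strings, it could be wrapped in a small builder. The sketch below only constructs the pipeline array, so it runs without a database; `buildBestMatchPipeline` and its `minScore` parameter are hypothetical names introduced here, and the default threshold of 1.0 is taken from the pipeline above.

```javascript
// Hypothetical helper: builds the same five stages as the pipeline above
// for an arbitrary search string and score threshold.
function buildBestMatchPipeline(search, minScore = 1.0) {
  return [
    { $match: { $text: { $search: search } } },        // text search first
    { $addFields: { score: { $meta: "textScore" } } }, // expose the score
    { $match: { score: { $gt: minScore } } },          // keep scores above the threshold
    { $sort: { score: -1, _id: -1 } },                 // best score first, then highest _id
    { $limit: 1 },                                     // single best document
  ];
}

const pipeline = buildBestMatchPipeline("Java Shop");
console.log(JSON.stringify(pipeline, null, 2));
// In the mongo shell the array would be passed as: db.stores.aggregate(pipeline)
```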

【Comments】: