【Question Title】: Solr suggester: distributed search (solrcloud) duplicate results
【Posted】: 2015-01-05 19:15:29
【Question】:

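I have two shards and am trying to implement a suggester using distributed search across the shards (using Solr 4.10.1). It appears that the suggester visits each shard and concatenates the result sets, leaving duplicates. In my solrconfig.xml file I have the following: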

<searchComponent name="suggest" class="solr.SuggestComponent">
    <lst name="suggester">
      <str name="name">titleSuggester</str>
      <str name="lookupimpl">AnalyzingLookupFactory</str>
      <str name="lookupimpl">FreeTextSuggesterFactory</str>
      <str name="dictionaryimpl">DocumentDictionaryFactory</str>
      <str name="field">title_sug</str>
      <str name="weightField">rank</str>
      <str name="suggestAnalyzerFieldType">shingleSuggest</str>
      <str name="buildOnCommit">true</str>
    </lst>
</searchComponent>


<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.count">10</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>

http://localhost:8983/solr/collection1/suggest?suggest.dictionary=titleSuggester&shards.qt=/suggest&shards=shard1,shard2&suggest.q=an&wt=json&indent=true Result:

{
  "responseHeader":{
    "status":0,
    "QTime":12},
  "suggest":{"titleSuggester":{
      "an":{
        "numFound":10,
        "suggestions":[{
            "term":"an",
            "weight":149,
            "payload":""},
          {
            "term":"an",
            "weight":142,
            "payload":""},
          {
            "term":"an american",
            "weight":6,
            "payload":""},
          {
            "term":"an affair",
            "weight":4,
            "payload":""},
          {
            "term":"an 18th century",
            "weight":2,
            "payload":""},
          {
            "term":"an 18th",
            "weight":2,
            "payload":""},
          {
            "term":"an american hymn",
            "weight":2,
            "payload":""},
          {
            "term":"an 18th century drawing room",
            "weight":2,
            "payload":""},
          {
            "term":"an 18th century drawing",
            "weight":2,
            "payload":""},
          {
            "term":"an american hymn (main",
            "weight":2,
            "payload":""}]}}}}

As you can see above, the suggestion "an" is returned twice, once per shard. If I query with distrib=false ( http://localhost:8983/solr/collection1/suggest?suggest.dictionary=titleSuggester&distrib=false&suggest.q=an&wt=json&indent=true ), I get no duplicates, as expected:

{ "responseHeader":{
    "status":0,
    "QTime":1},
  "suggest":{"titleSuggester":{
      "an":{
        "numFound":10,
        "suggestions":[{
            "term":"an",
            "weight":149,
            "payload":""},
          {
            "term":"an 18th",
            "weight":2,
            "payload":""},
          {
            "term":"an 18th century",
            "weight":2,
            "payload":""},
          {
            "term":"an 18th century drawing",
            "weight":2,
            "payload":""},
          {
            "term":"an 18th century drawing room",
            "weight":2,
            "payload":""},
          {
            "term":"an absolution take",
            "weight":1,
            "payload":""},
          {
            "term":"an absolution take her",
            "weight":1,
            "payload":""},
          {
            "term":"an absolution take her to",
            "weight":1,
            "payload":""},
          {
            "term":"an absolution take her to sea,",
            "weight":1,
            "payload":""},
          {
            "term":"an affair",
            "weight":4,
            "payload":""}]}}}}

Is there a way to remove the duplicate results?

【Comments】:

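  • I would prefer to do this in Solr, but if I don't get a solution we will do it on the client side.
  • Were you able to figure this out? In my case, without distrib=false I get a very high count, but with distrib=false I get the correct count.
  • No. At first we simply filtered out the duplicates on the client. Then it became irrelevant because (for reasons found here) I created a new core for the suggester, built with a single shard. distrib=false does not produce duplicates because it only fetches results from one of the cores.
  • In our case it seems to be because the Solr router strategy changed from composite to implicit.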
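
The client-side filtering mentioned in the comments above could look roughly like the following. This is only a minimal sketch, not something Solr provides: the function name `dedupe_suggestions` is hypothetical, it assumes the JSON response shape shown in the question, and keeping the highest weight for each term is an assumption (summing the per-shard weights would be another reasonable choice).

```python
def dedupe_suggestions(response, dictionary, query, count=10):
    """Collapse suggestions that share a term, keeping the highest weight.

    `response` is the parsed JSON body of a /suggest request, in the
    shape shown in the question (assumed, not verified against Solr).
    """
    entry = response["suggest"][dictionary][query]
    best = {}
    for suggestion in entry["suggestions"]:
        term = suggestion["term"]
        # Keep only the heaviest entry seen so far for this term.
        if term not in best or suggestion["weight"] > best[term]["weight"]:
            best[term] = suggestion
    # Highest weight first; ties keep their first-seen order (stable sort).
    merged = sorted(best.values(), key=lambda s: s["weight"], reverse=True)
    return merged[:count]


if __name__ == "__main__":
    # Abbreviated version of the distributed response from the question.
    sample = {
        "suggest": {"titleSuggester": {"an": {
            "numFound": 3,
            "suggestions": [
                {"term": "an", "weight": 149, "payload": ""},
                {"term": "an", "weight": 142, "payload": ""},
                {"term": "an american", "weight": 6, "payload": ""},
            ]}}}
    }
    for s in dedupe_suggestions(sample, "titleSuggester", "an"):
        print(s["term"], s["weight"])
```

Because the merged response has already been cut to suggest.count entries, removing duplicates can leave fewer than that many unique terms; requesting a larger suggest.count and trimming on the client is one way to compensate.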

Tags: solr


【Answer 1】:

You can use Solr's grouping feature; add the following to your query:

&group=true&group.field=term&group.main=true

This returns only one document for each identical term, and returns them in the same format as a regular query (group.main=true).

See http://wiki.apache.org/solr/FieldCollapsing for more information.
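
For illustration, the request from the question with these parameters appended could be assembled as below. This is a sketch only: the helper name `build_suggest_url` is hypothetical, the host and collection are taken from the question, and nothing here confirms that the grouping parameters affect the output of the suggest component.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8983/solr/collection1/suggest"


def build_suggest_url(prefix, grouped=True):
    """Build the distributed /suggest request used in the question."""
    params = {
        "suggest.dictionary": "titleSuggester",
        "shards.qt": "/suggest",
        "shards": "shard1,shard2",
        "suggest.q": prefix,
        "wt": "json",
        "indent": "true",
    }
    if grouped:
        # Grouping parameters proposed in this answer.
        params.update({
            "group": "true",
            "group.field": "term",
            "group.main": "true",
        })
    # Leave "/" and "," unescaped so the URL reads like the original one.
    return BASE_URL + "?" + urlencode(params, safe="/,")


if __name__ == "__main__":
    print(build_suggest_url("an"))
```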

【Comments】:

  • Unfortunately, adding the group parameters does not change my results.