【Question title】: Rails Searchkick has_many indexing and searching
【Posted】: 2017-09-13 19:43:41
【Question】:

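I can search by customer ID, name and last name, and by kid ID, name, last name and birthdate.

The ID search must be exact. Searching by name or last name tolerates misspellings with an edit distance of 2, and that works. But I also want searching by kid birthdate to be an exact match (misspelling distance 0).

So far, whenever I search by birthdate, the results come back as if the misspelling distance were 2. I don't know how to search for an exact date.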

Rails 5.1.0.rc1

elasticsearch-5.0.3

searchkick-2.2.0

class Customer < ActiveRecord::Base
  include Searchable

  def search_data
    attributes.merge avatar_url: avatar.url, kids: kids
  end

  has_many :kids
  ...
end

class Kid < ActiveRecord::Base
  belongs_to :customer

  def reindex_customer
    customer.reindex async: true
  end
  ...
end

module Searchable
  extend ActiveSupport::Concern

  included do
    SEARCH_RESULTS_PER_PAGE = 10

    def self.elastic_search(query, opts = { page: 1 })
      # This regex accepts strings that contain digits or dates
      regexp = /(\d+)|(^(0[1-9]|1\d|2\d|3[01])-(0[1-9]|1[0-2])-(19|20)\d{2}$)/
      distance = query.match?(regexp) ? 0 : 2 # Misspelling edit distance: 0 for digits and dates, 2 for plain strings
      options = { load: false,
                  match: :word_middle,
                  misspellings: { edit_distance: distance },
                  per_page: SEARCH_RESULTS_PER_PAGE,
                  page: opts[:page] }
      search query, options
    end
  end
end
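To check what the distance selection in `elastic_search` does with date input, here is a minimal standalone sketch that copies its regex into plain Ruby so it runs without Rails or Elasticsearch. `DIGITS_OR_DATE`, `DATE_ONLY` and `misspelling_distance` are hypothetical names introduced only for this illustration. Because of the `(\d+)` alternative, any input containing a digit, including an ISO date such as `2013-03-22` (the format stored in `kids.birthdate`), already gets distance 0, while the date alternative on its own only recognizes `DD-MM-YYYY`.

```ruby
# Copy of the regex from Searchable.elastic_search (hypothetical constant name).
DIGITS_OR_DATE = /(\d+)|(^(0[1-9]|1\d|2\d|3[01])-(0[1-9]|1[0-2])-(19|20)\d{2}$)/
# Only the date alternative of the same regex, i.e. DD-MM-YYYY.
DATE_ONLY = /^(0[1-9]|1\d|2\d|3[01])-(0[1-9]|1[0-2])-(19|20)\d{2}$/

# Same rule as in elastic_search: 0 when the regex matches, 2 otherwise.
def misspelling_distance(query)
  query.match?(DIGITS_OR_DATE) ? 0 : 2
end

%w[Schott 28388 2013-03-22 22-03-2013].each do |q|
  puts "#{q}: distance #{misspelling_distance(q)}, date branch: #{q.match?(DATE_ONLY)}"
end
# Schott: distance 2, date branch: false
# 28388: distance 0, date branch: false
# 2013-03-22: distance 0, date branch: false
# 22-03-2013: distance 0, date branch: true
```

In other words, date queries were already being sent with distance 0, which matches the observation in the comments that forcing the distance to 0 does not change the results.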

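My index contains the customer's data and his/her kids' data; the kids are nested under their parent customer. How can I force an exact match when searching for a date?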

For this query:

curl http://localhost:9200/customers_development/_search?pretty -d '{"query":{"dis_max":{"queries":[{"match":{"_all":{"query":"28388","boost":10,"operator":"and","analyzer":"searchkick_search"}}},{"match":{"_all":{"query":"28388","boost":10,"operator":"and","analyzer":"searchkick_search2"}}},{"match":{"_all":{"query":"28388","boost":1,"operator":"and","analyzer":"searchkick_search","fuzziness":0,"prefix_length":0,"max_expansions":3,"fuzzy_transpositions":true}}},{"match":{"_all":{"query":"28388","boost":1,"operator":"and","analyzer":"searchkick_search2","fuzziness":0,"prefix_length":0,"max_expansions":3,"fuzzy_transpositions":true}}}]}},"size":10,"from":0,"timeout":"11s"}'

This is what the index looks like (the response to the query above):

{
  "took": 11,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 97.29381,
    "hits": [
      {
        "_index": "customers_development_20170913145033808",
        "_type": "customer",
        "_id": "28388",
        "_score": 97.29381,
        "_source": {
          "id": 28388,
          "created_at": "2017-07-10T19:49:43.856Z",
          "updated_at": "2017-09-13T03:01:51.727Z",
          "name": "Linda",
          "lastname": "Schott",
          "email": "linda.schott@web.de",
          "avatar": null,
          "phone": null,
          "mobile": null,
          "erster_kontakt": null,
          "memo": null,
          "brief_title": null,
          "newsletter": null,
          "avatar_url": "/no_customer.png",
          "kids": [
            {
              "id": 34229,
              "name": "Jakob",
              "lastname": "Schott",
              "birthdate": "2013-03-22",
              "age": "4,5",
              "avatar": {
                "url": "/avatars/kid/34229/Jellyfish.png",
                "thumb": {
                  "url": "/avatars/kid/34229/thumb_Jellyfish.png"
                }
              },
              "created_at": "2017-07-10T19:50:16.058Z",
              "updated_at": "2017-09-13T03:02:52.962Z",
              "customer_id": 28388,
              "member": null,
              "year_certified": null,
              "zahlart": null,
              "tn_merge_markiert": null,
              "family": null,
              "medal": "black",
              "score": 30,
              "current_level": "swimmys"
            },
            {
              "id": 34228,
              "name": "Lilith",
              "lastname": "Schott",
              "birthdate": "2013-03-22",
              "age": "4,5",
              "avatar": {
                "url": "/avatars/kid/34228/Penguins.png",
                "thumb": {
                  "url": "/avatars/kid/34228/thumb_Penguins.png"
                }
              },
              "created_at": "2017-07-10T19:50:16.058Z",
              "updated_at": "2017-09-13T03:02:52.962Z",
              "customer_id": 28388,
              "member": null,
              "year_certified": null,
              "zahlart": null,
              "tn_merge_markiert": null,
              "family": null,
              "medal": "green",
              "score": 17,
              "current_level": "beginner"
            },
            {
              "id": 27718,
              "name": "Johanna",
              "lastname": "Plischke",
              "birthdate": "2010-12-29",
              "age": "6,8",
              "avatar": {
                "url": "/avatars/kid/27718/Koala.png",
                "thumb": {
                  "url": "/avatars/kid/27718/thumb_Koala.png"
                }
              },
              "created_at": "2017-07-10T19:50:16.034Z",
              "updated_at": "2017-09-13T04:01:15.261Z",
              "customer_id": 28388,
              "member": null,
              "year_certified": null,
              "zahlart": null,
              "tn_merge_markiert": null,
              "family": null,
              "medal": "red",
              "score": 27,
              "current_level": ""
            }
          ]
        }
      }
    ]
  }
}

【Comments】:

  • Isn't it because your regex also captures dates, so the edit distance gets set to 2?
  • No, even when I set the distance to 0, I get the same results.

Tags: elasticsearch ruby-on-rails-5 searchkick


【Answer 1】:

Let's analyze this part of the query:

"match":{
    "_all":{
        "query":"28388",
        "boost":1,
        "operator":"and",
        "analyzer":"searchkick_search",
        "fuzziness":0,
        "prefix_length":0,
        "max_expansions":3,
        "fuzzy_transpositions":true
    }
}

_all

You say that kids is a nested field, but you are only searching _all, so the first thing to establish is whether the kids fields are included in _all.

As the documentation says:

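"Sets the default include_in_all value for all the properties within the nested object. Nested documents do not have their own _all field. Instead, values are added to the _all field of the main "root" document."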

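So the first thing to check is whether the nested type in your index sets include_in_all to false, which would make the nested fields unsearchable through _all.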
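For comparison, here is a sketch of what an Elasticsearch 5.x mapping fragment with `include_in_all` enabled on the nested `kids` type could look like, written as a Ruby hash. This is an assumption for illustration, not the mapping Searchkick 2.2.0 generates; the `customer` type name is taken from the `_type` in the response above, and `customer_mapping` is a hypothetical helper. How to hand such a mapping to Searchkick is not shown here and should be checked against the Searchkick README.

```ruby
require "json"

# Hypothetical Elasticsearch 5.x mapping fragment for the `customer` type.
def customer_mapping
  {
    "customer" => {
      "properties" => {
        "kids" => {
          # Map kids as nested objects (needed for a nested query).
          "type" => "nested",
          # Add every kid field to the root document's _all field.
          "include_in_all" => true,
          "properties" => {
            # Index birthdate as a date rather than analyzed text.
            "birthdate" => { "type" => "date", "format" => "yyyy-MM-dd" }
          }
        }
      }
    }
  }
end

puts JSON.pretty_generate("mappings" => customer_mapping)
```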

Nested query

Alternatively, you can use a nested query to query the nested objects directly:

GET /_search
{
    "query": {
        "nested" : {
            "path" : "kids",
            "score_mode" : "avg",
            "query" : {
                "query_string": {
                  "fields": ["kids.birthdate"],
                  "query": "xxx"
                } 
            }
        }
    }
}
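The same nested query can be built from Ruby as a plain hash. This minimal sketch swaps the answer's `query_string` for a `term` clause, since a term query is not analyzed and therefore cannot match fuzzily or partially. It assumes `kids` is mapped as type `nested`; `nested_birthdate_query` is a hypothetical helper.

```ruby
# Hypothetical helper: the answer's nested query with `term` swapped in for
# `query_string`, so only the exact date matches.
# Assumes `kids` is mapped as type "nested" in the customers index.
def nested_birthdate_query(date)
  {
    query: {
      nested: {
        path: "kids",
        score_mode: "avg",
        query: {
          # term is not analyzed, so there is no fuzziness or partial matching
          term: { "kids.birthdate" => date }
        }
      }
    }
  }
end

p nested_birthdate_query("2013-03-22")
```

Searchkick's README describes a `body:` option for sending a raw query, e.g. `Customer.search(body: nested_birthdate_query("2013-03-22"))`; whether it behaves the same in searchkick 2.2.0 is worth verifying.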

Fuzzy

For misspellings, Elasticsearch suggests using a fuzzy query:

GET /_search
{
    "query": {
        "fuzzy" : {
            "name" : {
                "value" : "xxx",
                "boost" : 1.0,
                "fuzziness" : 2,
                "prefix_length" : 0,
                "max_expansions": 100
            }
        }
    }
}
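As a Ruby hash, the fuzzy query above might be built like this; `fuzzy_name_query` is a hypothetical helper, and `fuzziness` is the maximum edit distance (0 would require an exact term).

```ruby
# Hypothetical helper mirroring the answer's fuzzy query on `name`.
def fuzzy_name_query(name, fuzziness: 2)
  {
    query: {
      fuzzy: {
        name: {
          value: name,
          boost: 1.0,
          fuzziness: fuzziness, # maximum edit distance
          prefix_length: 0,     # leading characters that must match exactly
          max_expansions: 100   # cap on the number of terms the query expands to
        }
      }
    }
  }
end

p fuzzy_name_query("Shott")
```

Because fuzzy is a term-level query, its input is not analyzed; if `name` is indexed through a lowercasing analyzer, the input would need to be lowercased first (for example `fuzzy_name_query("Shott".downcase)`).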

Combined query

Finally, we can combine them with a bool query:

POST _search
{
  "query": {
    "bool": {
      "must": [
        {
          "nested": {
            "path": "kids",
            "query": {
              "query_string": {
                "fields": ["kids.birthdate"],
                "query": "xxx"
              }
            }
          }
        },
        {
          "fuzzy": {
            "name": {
              "value": "xxx",
              "boost": 1.0,
              "fuzziness": 2,
              "prefix_length": 0,
              "max_expansions": 100
            }
          }
        }
      ]
    }
  }
}
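The bool query above uses `must`, so both clauses have to match. For a single search box, one possible sketch (not the answer's own code) is to build only the clause that fits the input: an exact nested `term` on `kids.birthdate` when the input looks like an ISO date (the `YYYY-MM-DD` format stored in the index), and the fuzzy `name` clause otherwise. `ISO_DATE`, `customer_query` and `query_for` are hypothetical; `term` replaces the answer's `query_string`, and `kids` is assumed to be mapped as `nested`.

```ruby
# Input shaped like the stored birthdates, e.g. "2013-03-22".
ISO_DATE = /\A\d{4}-\d{2}-\d{2}\z/

# Hypothetical builder: one `must` clause per criterion that is present.
# With no criteria, `must` is empty and the bool query matches every document.
def customer_query(name: nil, birthdate: nil)
  must = []
  if birthdate
    # Exact match on a kid's birthdate (no fuzziness).
    must << {
      nested: {
        path: "kids",
        query: { term: { "kids.birthdate" => birthdate } }
      }
    }
  end
  if name
    # Misspelling-tolerant match on the customer's name (edit distance 2).
    must << {
      fuzzy: {
        name: { value: name, fuzziness: 2, prefix_length: 0, max_expansions: 100 }
      }
    }
  end
  { query: { bool: { must: must } } }
end

# Hypothetical dispatch for a single search box.
def query_for(input)
  text = input.strip
  if text.match?(ISO_DATE)
    customer_query(birthdate: text)
  else
    customer_query(name: text)
  end
end

p query_for("2013-03-22")
p query_for("Shott")
```

Splitting on the input shape keeps the distance decision out of Elasticsearch entirely: a date never reaches a fuzzy clause, so it cannot pick up distance-2 matches.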

I'm not familiar with Ruby, so this is as far as I can help. I hope it helps.
