【Question title】: Elasticsearch: full-text search and filtering by nested array of objects
【Posted】: 2020-04-12 13:23:46
【Question description】:

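The task is to build a GUI table from data that lives in N joined PostgreSQL tables. The GUI table needs sorting and filtering, including full-text search.

I would like to use Elasticsearch for this. I have prepared the following data structure for it: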

{
  did_user_read: true,
  view_info: {
      total: 1,
      users: [
          { name: 'John Smith', read_at: '2020-02-04 11:00:01', is_current_user: false },
          { name: 'Samuel Jackson', read_at: '2020-02-04 11:00:01', is_current_user: true },
      ],
  },
  is_favorite: true,
  has_attachments: true,
  from: { 
      short_name: 'You',  
      full_name: 'Chuck Norris',
      email: 'ch.norris@example.com', 
      is_current_user: true 
  },
  subject: 'The secret of the appearance of navel lints',
  received_at: '2020-02-04 11:00:01'
}

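Please advise how to index this structure correctly so that it can be filtered and searched by nested objects and by nested arrays of objects.

For example, I want to get all records that match these conditions: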

is_favorite IS false

AND

FULL_TEXT_SEARCH("sam jackson") 
   BY FIELDS 
    users.name,        -- inside of array(!) 
    from.full_name,
    from.short_name

AND

users.is_current_user IS NOT false

AND

ORDER BY received_at DESC

【Question discussion】:

    Tags: elasticsearch elasticsearch-dsl elasticsearch-query


    【Solution 1】:

    The Elasticsearch index mapping for the data structure above should be:

    Mapping

    {
        "mappings": {
            "properties": {
                "did_user_read": {
                    "type": "boolean"
                },
                "view_info": {
                    "properties": {
                        "total": {
                            "type": "integer"
                        },
                        "users": {
                            "properties": {
                                "name": {
                                    "type": "text"
                                },
                                "read_at": {
                                    "type": "date",
                                    "format": "date_hour_minute_second"
                                },
                                "is_current_user": {
                                    "type": "boolean"
                                }
                            }
                        }
                    }
                },
                "is_favorite": {
                    "type": "boolean"
                },
                "has_attachments": {
                    "type": "boolean"
                },
                "from": {
                    "properties": {
                        "short_name": {
                            "type": "text"
                        },
                        "full_name": {
                            "type": "text"
                        },
                        "email": {
                            "type": "keyword"
                        },
                        "is_current_user": {
                            "type": "boolean"
                        }
                    }
                },
                "subject": {
                    "type": "text"
                },
                "received_at": {
                    "type": "date",
                    "format": "date_hour_minute_second"
                }
            }
        }
    }
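
    One caveat, offered as an assumption about what the question needs rather than as part of the original answer: the mapping above declares view_info.users as a plain object field. Elasticsearch indexes such an array as one multi-valued list per leaf field, so it no longer knows which name belongs to which is_current_user flag. If the name match and the flag must hold for the same user, the field would need "type": "nested" and the query a nested clause. The Python sketch below only simulates that flattening; flatten_object_array and the sample users are hypothetical, and substring matching stands in for real text analysis.

```python
def flatten_object_array(prefix, items):
    """Simulate how a plain (non-nested) object array is indexed:
    each leaf field becomes one multi-valued list, losing the pairing."""
    flat = {}
    for item in items:
        for key, value in item.items():
            flat.setdefault(f"{prefix}.{key}", []).append(value)
    return flat


# Hypothetical document: the current user is John, not Samuel.
users = [
    {"name": "John Smith", "is_current_user": True},
    {"name": "Samuel Jackson", "is_current_user": False},
]
flat = flatten_object_array("view_info.users", users)

# Document-level checks, as a plain object mapping would evaluate them.
# Substring matching is a simplification of real text analysis.
name_matches = any("jackson" in n.lower() for n in flat["view_info.users.name"])
flag_matches = True in flat["view_info.users.is_current_user"]

# Per-object check, which is what a nested mapping and nested query preserve.
same_user_matches = any(
    "jackson" in u["name"].lower() and u["is_current_user"] for u in users
)

print(name_matches and flag_matches)  # the document matches as a whole
print(same_user_matches)              # whether one single user satisfies both
```

    If cross-object matches like this are acceptable for the GUI table, the plain object mapping is simpler and cheaper to query.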
    

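    I indexed a few documents in the same format as the example in the question.

    Based on the conditions in the question, the search query should be:

    Search query: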

    {
        "query": {
            "bool": {
                "filter": [
                    {
                        "term": {
                            "is_favorite": false
                        }
                    },
                    {
                        "term": {
                            "view_info.users.is_current_user": true  
                        }
                    }
                ],
                "must": {
                    "multi_match": {
                        "query": "sam jackson",
                        "fields": [
                            "view_info.users.name",
                            "from.full_name",
                            "from.short_name"
                        ]
                    }
                }
            }
        },
        "sort": [
            {
                "received_at": {
                    "order": "desc"
                }
            }
        ]
    }
    

    Output

    "hits": [
          {
            "_index": "topics",
            "_type": "_doc",
            "_id": "3",
            "_score": null,
            "_source": {
              "did_user_read": true,
              "view_info": {
                "total": 1,
                "users": [
                  {
                    "name": "John Smith",
                    "read_at": "2020-02-04T11:00:01",
                    "is_current_user": false
                  },
                  {
                    "name": "Samuel Jackson",
                    "read_at": "2020-02-04T11:00:01",
                    "is_current_user": true
                  }
                ]
              },
              "is_favorite": false,
              "has_attachments": true,
              "from": {
                "short_name": "You",
                "full_name": "Chuck Norris",
                "email": "ch.norris@example.com",
                "is_current_user": true
              },
              "subject": "The secret of the appearance of navel lints",
              "received_at": "2020-02-04T11:00:03"
            },
            "sort": [
              1580814003000
            ]
          },
          {
            "_index": "topics",
            "_type": "_doc",
            "_id": "2",
            "_score": null,
            "_source": {
              "did_user_read": true,
              "view_info": {
                "total": 1,
                "users": [
                  {
                    "name": "John Smith",
                    "read_at": "2020-02-04T11:00:01",
                    "is_current_user": false
                  },
                  {
                    "name": "Samuel Jackson",
                    "read_at": "2020-02-04T11:00:01",
                    "is_current_user": true
                  }
                ]
              },
              "is_favorite": false,
              "has_attachments": true,
              "from": {
                "short_name": "You",
                "full_name": "Chuck Norris",
                "email": "ch.norris@example.com",
                "is_current_user": true
              },
              "subject": "The secret of the appearance of navel lints",
              "received_at": "2020-02-04T11:00:01"
            },
            "sort": [
              1580814001000
            ]
          }
        ]
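
    The sort values in the output are the received_at dates expressed as milliseconds since the Unix epoch. As a quick sanity check, this small Python sketch converts them back; sort_value_to_iso is a hypothetical helper, and it assumes the dates were indexed without a time zone and are therefore interpreted as UTC.

```python
from datetime import datetime, timezone


def sort_value_to_iso(millis):
    """Convert an epoch-milliseconds sort value to a UTC timestamp string."""
    moment = datetime.fromtimestamp(millis / 1000, tz=timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%S")


# The two sort values returned in the output above.
for value in (1580814003000, 1580814001000):
    print(value, "->", sort_value_to_iso(value))
```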
    

    Explanation:

    This is how the search query is built from the conditions in your question:

    • is_favorite IS false and users.is_current_user IS NOT false

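      This is done with the filter clause. Filters are used when documents must satisfy certain conditions that should not contribute to the relevance score. Both fields are boolean, so the answer is simply yes or no and there is nothing to score. Note that the query tests view_info.users.is_current_user for true, which is slightly stricter than IS NOT false: documents where the field is missing are excluded.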

    • FULL_TEXT_SEARCH("sam jackson") BY FIELDS users.name, -- inside of array(!) from.full_name, from.short_name

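      Here we search for sam jackson across the three fields, so multi_match is used. With its default settings a document matches when at least one of the terms is found in at least one of the fields. In the output above the hit comes from the term jackson; under the default standard analyzer sam does not match Samuel.

    All three conditions sit inside a single bool query because they are joined by AND: the two boolean checks go in filter and the full-text search goes in must.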

    • ORDER BY received_at DESC

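      Use sort for this. Because the results are sorted by a field rather than by relevance, _score is null in the output.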
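
    The pieces described above can be assembled programmatically. The following Python sketch builds the same request body as a plain dict; build_search_body and its parameters are hypothetical names chosen for illustration, and the dict would still have to be sent to Elasticsearch with whatever client you use.

```python
import json


def build_search_body(text, is_favorite=False, current_user=True):
    """Build the bool query: two non-scoring filters, one full-text
    clause in must, and a descending sort on received_at."""
    return {
        "query": {
            "bool": {
                # Filters restrict the result set without affecting the score.
                "filter": [
                    {"term": {"is_favorite": is_favorite}},
                    {"term": {"view_info.users.is_current_user": current_user}},
                ],
                # Full-text search across the three name fields.
                "must": {
                    "multi_match": {
                        "query": text,
                        "fields": [
                            "view_info.users.name",
                            "from.full_name",
                            "from.short_name",
                        ],
                    }
                },
            }
        },
        "sort": [{"received_at": {"order": "desc"}}],
    }


body = build_search_body("sam jackson")
print(json.dumps(body, indent=2))
```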

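    Note: the date-time values in your data, such as read_at and received_at, need to change. They are currently written as 2020-02-04 11:00:01. Use 2020-02-04T11:00:01 instead (a T in place of the space) when indexing, because the date_hour_minute_second format declared in the mapping expects that form and Elasticsearch rejects values that do not match the mapped format. Alternatively, a custom pattern such as yyyy-MM-dd HH:mm:ss can be declared in the mapping. The accepted formats are listed at https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-date-format.html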
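
    If the values come out of PostgreSQL with a space separator, they can be rewritten before indexing. A minimal Python sketch, assuming every input has exactly the form shown in the question; to_es_datetime is a hypothetical helper name.

```python
from datetime import datetime


def to_es_datetime(value):
    """Convert 'YYYY-MM-DD HH:MM:SS' to 'YYYY-MM-DDTHH:MM:SS'.

    Parsing first (rather than replacing the space) makes malformed
    input raise ValueError instead of being indexed silently.
    """
    parsed = datetime.strptime(value, "%Y-%m-%d %H:%M:%S")
    return parsed.strftime("%Y-%m-%dT%H:%M:%S")


print(to_es_datetime("2020-02-04 11:00:01"))
```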

    【Discussion】:

    • Glad to know it helped. Could you please upvote the answer as well?