【Question title】: Elastic(search): Get docs with max and min timestamp values
【Posted】: 2015-05-24 01:03:49
【Question】:

I'm stuck on a search and don't know how to approach it. My documents have the following form:

{
"timestamp":"2015-03-17T15:05:04.563Z",
"session_id":"1",
"user_id":"jan"
}

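Assume that the first timestamp of a session id is the "login" and the last timestamp is the "logout". I want the "login" and "logout" documents for all sessions (sorted by user_id if possible). I managed to get the correct timestamps with aggregations: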

{
"aggs" : {
    "group_by_uid" : {
        "terms" : { 
            "field" : "user_id"
        },
        "aggs" : {
            "group_by_sid" : {
                "terms" : {
                    "field" : "session_id"
                },
                "aggs" : {
                    "max_date" : {
                        "max": { "field" : "timestamp" }
                    },
                    "min_date" : {
                        "min": { "field" : "timestamp" }
                    }
                }
            }
        }
    }
}
}

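But how do I get the corresponding documents? I don't mind having to run two searches (one for the logins and one for the logouts). I tried the top_hits aggregation and sorting, but I always get parse errors :/

I hope someone can give me a hint :)

Best regards, Jan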

【Comments】:

    Tags: search elasticsearch


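    【Solution 1】:

    You were close. How about this: run two searches, each aggregating the way you already do, but additionally take the first top_hits result sorted on "timestamp".

    I set up a basic index and added some data that looks like what you posted: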

    PUT /test_index
    {
        "settings": {
            "number_of_shards": 1
        }
    }
    
    POST /test_index/_bulk
    {"index":{"_index":"test_index","_type":"doc","_id":1}}
    {"timestamp":"2015-03-17T15:05:04.563Z","session_id":"1","user_id":"jan"}
    {"index":{"_index":"test_index","_type":"doc","_id":2}}
    {"timestamp":"2015-03-17T15:10:04.563Z","session_id":"1","user_id":"jan"}
    {"index":{"_index":"test_index","_type":"doc","_id":3}}
    {"timestamp":"2015-03-17T15:15:04.563Z","session_id":"1","user_id":"jan"}
    {"index":{"_index":"test_index","_type":"doc","_id":4}}
    {"timestamp":"2015-03-17T18:05:04.563Z","session_id":"1","user_id":"bob"}
    {"index":{"_index":"test_index","_type":"doc","_id":5}}
    {"timestamp":"2015-03-17T18:10:04.563Z","session_id":"1","user_id":"bob"}
    {"index":{"_index":"test_index","_type":"doc","_id":6}}
    {"timestamp":"2015-03-17T18:15:04.563Z","session_id":"1","user_id":"bob"}
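
For readers who script this setup instead of pasting it into a console, the bulk body above could be generated along these lines. This is only a sketch: `build_bulk_body` is a hypothetical helper, not part of any client library, and the `test_index` and `doc` names simply mirror the example above. It builds the request body and sends nothing.

```python
import json


def build_bulk_body(docs, index="test_index", doc_type="doc"):
    """Return a bulk body: one action line plus one source line per document.

    Hypothetical helper for illustration; ids are assigned 1, 2, 3, ...
    """
    lines = []
    for doc_id, doc in enumerate(docs, start=1):
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    # The bulk endpoint expects newline-delimited JSON ending with a newline.
    return "\n".join(lines) + "\n"


docs = [
    {"timestamp": "2015-03-17T15:05:04.563Z", "session_id": "1", "user_id": "jan"},
    {"timestamp": "2015-03-17T18:05:04.563Z", "session_id": "1", "user_id": "bob"},
]
body = build_bulk_body(docs)
print(body)
```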
    

    Then I can get the start time of each session:

    POST /test_index/_search?search_type=count
    {
       "aggs": {
          "group_by_uid": {
             "terms": {
                "field": "user_id"
             },
             "aggs": {
                "group_by_sid": {
                   "terms": {
                      "field": "session_id"
                   },
                   "aggs": {
                      "session_start": {
                         "top_hits": {
                            "size": 1,
                            "sort": [ { "timestamp": { "order": "asc" } } ]
                         }
                      }
                   }
                }
             }
          }
       }
    }
    ...
    {
       "took": 5,
       "timed_out": false,
       "_shards": {
          "total": 1,
          "successful": 1,
          "failed": 0
       },
       "hits": {
          "total": 6,
          "max_score": 0,
          "hits": []
       },
       "aggregations": {
          "group_by_uid": {
             "buckets": [
                {
                   "key": "bob",
                   "doc_count": 3,
                   "group_by_sid": {
                      "buckets": [
                         {
                            "key": "1",
                            "doc_count": 3,
                            "session_start": {
                               "hits": {
                                  "total": 3,
                                  "max_score": null,
                                  "hits": [
                                     {
                                        "_index": "test_index",
                                        "_type": "doc",
                                        "_id": "4",
                                        "_score": null,
                                        "_source": {
                                           "timestamp": "2015-03-17T18:05:04.563Z",
                                           "session_id": "1",
                                           "user_id": "bob"
                                        },
                                        "sort": [
                                           1426615504563
                                        ]
                                     }
                                  ]
                               }
                            }
                         }
                      ]
                   }
                },
                {
                   "key": "jan",
                   "doc_count": 3,
                   "group_by_sid": {
                      "buckets": [
                         {
                            "key": "1",
                            "doc_count": 3,
                            "session_start": {
                               "hits": {
                                  "total": 3,
                                  "max_score": null,
                                  "hits": [
                                     {
                                        "_index": "test_index",
                                        "_type": "doc",
                                        "_id": "1",
                                        "_score": null,
                                        "_source": {
                                           "timestamp": "2015-03-17T15:05:04.563Z",
                                           "session_id": "1",
                                           "user_id": "jan"
                                        },
                                        "sort": [
                                           1426604704563
                                        ]
                                     }
                                  ]
                               }
                            }
                         }
                      ]
                   }
                }
             ]
          }
       }
    }
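
The documents the question asks for sit a few levels deep in that response, under each session bucket's `session_start.hits.hits`. A minimal sketch of pulling them out on the client side, assuming the response has already been parsed into a Python dict; `extract_top_hits` is a hypothetical helper and `sample` is a trimmed copy of the response above:

```python
def extract_top_hits(response, agg_name):
    """Map (user_id, session_id) -> _source of the single top hit in agg_name."""
    result = {}
    for user_bucket in response["aggregations"]["group_by_uid"]["buckets"]:
        for session_bucket in user_bucket["group_by_sid"]["buckets"]:
            hits = session_bucket[agg_name]["hits"]["hits"]
            if hits:
                key = (user_bucket["key"], session_bucket["key"])
                result[key] = hits[0]["_source"]
    return result


def bucket(user, sid, ts):
    """Build one trimmed user bucket shaped like the response above."""
    source = {"timestamp": ts, "session_id": sid, "user_id": user}
    hits = {"hits": {"hits": [{"_source": source}]}}
    return {"key": user, "group_by_sid": {"buckets": [
        {"key": sid, "session_start": hits}]}}


sample = {"aggregations": {"group_by_uid": {"buckets": [
    bucket("bob", "1", "2015-03-17T18:05:04.563Z"),
    bucket("jan", "1", "2015-03-17T15:05:04.563Z"),
]}}}

starts = extract_top_hits(sample, "session_start")
for (user_id, session_id), doc in sorted(starts.items()):
    print(user_id, session_id, doc["timestamp"])
```

Passing `"session_end"` as `agg_name` would handle the second search in the same way.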
    

    And the end times:

    POST /test_index/_search?search_type=count
    {
       "aggs": {
          "group_by_uid": {
             "terms": {
                "field": "user_id"
             },
             "aggs": {
                "group_by_sid": {
                   "terms": {
                      "field": "session_id"
                   },
                   "aggs": {
                      "session_end": {
                         "top_hits": {
                            "size": 1,
                            "sort": [ { "timestamp": { "order": "desc" } } ]
                         }
                      }
                   }
                }
             }
          }
       }
    }
    ...
    {
       "took": 2,
       "timed_out": false,
       "_shards": {
          "total": 1,
          "successful": 1,
          "failed": 0
       },
       "hits": {
          "total": 6,
          "max_score": 0,
          "hits": []
       },
       "aggregations": {
          "group_by_uid": {
             "buckets": [
                {
                   "key": "bob",
                   "doc_count": 3,
                   "group_by_sid": {
                      "buckets": [
                         {
                            "key": "1",
                            "doc_count": 3,
                            "session_end": {
                               "hits": {
                                  "total": 3,
                                  "max_score": null,
                                  "hits": [
                                     {
                                        "_index": "test_index",
                                        "_type": "doc",
                                        "_id": "6",
                                        "_score": null,
                                        "_source": {
                                           "timestamp": "2015-03-17T18:15:04.563Z",
                                           "session_id": "1",
                                           "user_id": "bob"
                                        },
                                        "sort": [
                                           1426616104563
                                        ]
                                     }
                                  ]
                               }
                            }
                         }
                      ]
                   }
                },
                {
                   "key": "jan",
                   "doc_count": 3,
                   "group_by_sid": {
                      "buckets": [
                         {
                            "key": "1",
                            "doc_count": 3,
                            "session_end": {
                               "hits": {
                                  "total": 3,
                                  "max_score": null,
                                  "hits": [
                                     {
                                        "_index": "test_index",
                                        "_type": "doc",
                                        "_id": "3",
                                        "_score": null,
                                        "_source": {
                                           "timestamp": "2015-03-17T15:15:04.563Z",
                                           "session_id": "1",
                                           "user_id": "jan"
                                        },
                                        "sort": [
                                           1426605304563
                                        ]
                                     }
                                  ]
                               }
                            }
                         }
                      ]
                   }
                }
             ]
          }
       }
    }
    

    Here is the code I used:

    http://sense.qbox.io/gist/05edb48b840e6a992646643913db8ef0a3ccccb3

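    【Comments】:

    • Thank you very much. I could swear I tried that too, but apparently I got something wrong. Thanks also for the Qbox.io link. It looks useful for testing, since I'm using the Java API :) I can't upvote your answer yet...
    • One more question just came up: are the session start and session end results always in the same order? That is, if I want to merge the search hits, does the first session-start hit always correspond to the first session-end hit?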
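
One way to sidestep the ordering question in the last comment is to pair the hits by their bucket keys instead of by position. The sketch below assumes both responses are already parsed into dicts. `top_hit_sources`, `merge_sessions` and `fake_response` are hypothetical helpers, and the demo data deliberately lists the users in a different order in each response:

```python
def top_hit_sources(response, agg_name):
    """Map (user_id, session_id) -> _source of the top hit under agg_name."""
    found = {}
    for user in response["aggregations"]["group_by_uid"]["buckets"]:
        for session in user["group_by_sid"]["buckets"]:
            hits = session[agg_name]["hits"]["hits"]
            if hits:
                found[(user["key"], session["key"])] = hits[0]["_source"]
    return found


def merge_sessions(start_response, end_response):
    """Pair login and logout documents by key, not by bucket position."""
    starts = top_hit_sources(start_response, "session_start")
    ends = top_hit_sources(end_response, "session_end")
    return {
        key: {"login": starts[key], "logout": ends.get(key)}
        for key in starts
    }


def fake_response(agg_name, rows):
    """Build a minimal response-shaped dict for the demo (not real output)."""
    buckets = [
        {"key": user, "group_by_sid": {"buckets": [
            {"key": sid, agg_name: {"hits": {"hits": [{"_source": {
                "timestamp": ts, "session_id": sid, "user_id": user}}]}}}
        ]}}
        for user, sid, ts in rows
    ]
    return {"aggregations": {"group_by_uid": {"buckets": buckets}}}


start_resp = fake_response("session_start", [
    ("bob", "1", "2015-03-17T18:05:04.563Z"),
    ("jan", "1", "2015-03-17T15:05:04.563Z"),
])
end_resp = fake_response("session_end", [
    ("jan", "1", "2015-03-17T15:15:04.563Z"),
    ("bob", "1", "2015-03-17T18:15:04.563Z"),
])

merged = merge_sessions(start_resp, end_resp)
for key in sorted(merged):
    pair = merged[key]
    print(key, pair["login"]["timestamp"], pair["logout"]["timestamp"])
```

Keying on `(user_id, session_id)` keeps the pairing correct even if the two searches return their buckets in different orders.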
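    【Solution 2】:

    This is a single-search solution based on the approach Sloan Ahrens proposed. The advantage is that the start and end entries of a session end up in the same bucket.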

    {
    "aggs": {
      "group_by_uid": {
         "terms": {
            "field": "user_id"
         },
         "aggs": {
            "group_by_sid": {
               "terms": {
                  "field": "session_id"
               },
               "aggs": {
                  "session_start": {
                     "top_hits": {
                        "size": 1,
                        "sort": [ { "timestamp": { "order": "asc" } } ]
                     }
                  },
                  "session_end": {
                     "top_hits": {
                        "size": 1,
                        "sort": [ { "timestamp": { "order": "desc" } } ]
                     }
                  }
               }
            }
         }
      }
    }
    }
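
Because both top hits land in the same bucket, each session can be processed in one pass. The sketch below builds the same query as a Python dict and computes a session length from a response-shaped dict. Several things here are assumptions rather than part of the answer: the `top_hit` and `session_durations` helpers are hypothetical, `"size": 0` is added to suppress the regular hits (the first answer used `search_type=count` for this, which may not be available in every Elasticsearch version), and the timestamp format string matches only the sample data.

```python
from datetime import datetime


def top_hit(order):
    """Build a top_hits aggregation returning one doc sorted by timestamp."""
    return {"top_hits": {"size": 1, "sort": [{"timestamp": {"order": order}}]}}


query = {
    "size": 0,  # assumption: skip regular hits, only aggregations are needed
    "aggs": {
        "group_by_uid": {
            "terms": {"field": "user_id"},
            "aggs": {
                "group_by_sid": {
                    "terms": {"field": "session_id"},
                    "aggs": {
                        "session_start": top_hit("asc"),
                        "session_end": top_hit("desc"),
                    },
                }
            },
        }
    },
}


def session_durations(response):
    """Yield (user_id, session_id, seconds) for every session bucket."""
    fmt = "%Y-%m-%dT%H:%M:%S.%fZ"  # matches the sample timestamps only
    for user in response["aggregations"]["group_by_uid"]["buckets"]:
        for session in user["group_by_sid"]["buckets"]:
            start = session["session_start"]["hits"]["hits"][0]["_source"]
            end = session["session_end"]["hits"]["hits"][0]["_source"]
            delta = (datetime.strptime(end["timestamp"], fmt)
                     - datetime.strptime(start["timestamp"], fmt))
            yield user["key"], session["key"], delta.total_seconds()


# Hand-written dict in the shape the query would return (not real output).
sample = {"aggregations": {"group_by_uid": {"buckets": [
    {"key": "jan", "group_by_sid": {"buckets": [{
        "key": "1",
        "session_start": {"hits": {"hits": [
            {"_source": {"timestamp": "2015-03-17T15:05:04.563Z"}}]}},
        "session_end": {"hits": {"hits": [
            {"_source": {"timestamp": "2015-03-17T15:15:04.563Z"}}]}},
    }]}},
]}}}

durations = list(session_durations(sample))
print(durations)
```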
    

    Cheers, Jan
