[Question title]: Query the latest document of each type on Elasticsearch
[Posted]: 2015-07-28 10:51:25
[Question]:

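I'm trying to run a seemingly simple query on Elasticsearch, but I can't seem to get the result I'm looking for.

Here is a short example of what I'm trying to do:

I have a database of news. Each news item has a source, a headline, a timestamp, and a user.

For a given user, I want to get the last headline (based on the timestamp) from each available source.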

#!/bin/bash

export ELASTICSEARCH_ENDPOINT="http://localhost:9200"

# Create indexes

curl -XPUT "$ELASTICSEARCH_ENDPOINT/news" -d '{
    "mappings": {
        "news": {
            "properties": {
                "source": { "type": "string", "index": "not_analyzed" },
                "headline": { "type": "object" },
                "timestamp": { "type": "date", "format": "date_hour_minute_second_millis" },
                "user": { "type": "string", "index": "not_analyzed" }
            }
        }
    }
}'

# Index documents
curl -XPOST "$ELASTICSEARCH_ENDPOINT/_bulk?refresh=true" -d '
{"index":{"_index":"news","_type":"news"}}
{"user": "John", "source": "CNN", "headline": "Great news", "timestamp": "2015-07-28T00:07:29.000"}
{"index":{"_index":"news","_type":"news"}}
{"user": "John", "source": "CNN", "headline": "More great news", "timestamp": "2015-07-28T00:08:23.000"}
{"index":{"_index":"news","_type":"news"}}
{"user": "John", "source": "ESPN", "headline": "Sports news", "timestamp": "2015-07-28T00:09:32.000"}
{"index":{"_index":"news","_type":"news"}}
{"user": "John", "source": "ESPN", "headline": "More sports news", "timestamp": "2015-07-28T00:10:35.000"}
{"index":{"_index":"news","_type":"news"}}
{"user": "Mary", "source": "Yahoo", "headline": "More news", "timestamp": "2015-07-28T00:11:54.000"}
{"index":{"_index":"news","_type":"news"}}
{"user": "Mary", "source": "Yahoo", "headline": "Crazy news", "timestamp": "2015-07-28T00:12:31.000"}
'

So, for example, how can I get the last CNN headline and the last ESPN headline for John?
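
To make the expected result concrete, here is a minimal sketch in plain Python that computes it over the six sample documents, with no Elasticsearch involved. The names `DOCS` and `latest_headline_per_source` are made up for this illustration; the sketch only pins down what "last headline per source for a user" means, it is not a way to query the index.

```python
from datetime import datetime

# Sample documents copied from the bulk request above.
DOCS = [
    {"user": "John", "source": "CNN", "headline": "Great news", "timestamp": "2015-07-28T00:07:29.000"},
    {"user": "John", "source": "CNN", "headline": "More great news", "timestamp": "2015-07-28T00:08:23.000"},
    {"user": "John", "source": "ESPN", "headline": "Sports news", "timestamp": "2015-07-28T00:09:32.000"},
    {"user": "John", "source": "ESPN", "headline": "More sports news", "timestamp": "2015-07-28T00:10:35.000"},
    {"user": "Mary", "source": "Yahoo", "headline": "More news", "timestamp": "2015-07-28T00:11:54.000"},
    {"user": "Mary", "source": "Yahoo", "headline": "Crazy news", "timestamp": "2015-07-28T00:12:31.000"},
]


def latest_headline_per_source(docs, user):
    """Return {source: headline}, keeping the newest headline per source for one user."""
    latest = {}  # source -> (timestamp, headline)
    for doc in docs:
        if doc["user"] != user:
            continue
        ts = datetime.strptime(doc["timestamp"], "%Y-%m-%dT%H:%M:%S.%f")
        best = latest.get(doc["source"])
        if best is None or ts > best[0]:
            latest[doc["source"]] = (ts, doc["headline"])
    return {source: headline for source, (_, headline) in latest.items()}


print(latest_headline_per_source(DOCS, "John"))
# {'CNN': 'More great news', 'ESPN': 'More sports news'}
```

Note that the sources are discovered from the data, which is exactly what I would like Elasticsearch to do for me.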

I've been looking at the multi search API, but that would mean I need to know all the sources (CNN and ESPN in this case) in advance.

[Discussion]:

    Tags: api elasticsearch timestamp


    [Answer 1]:

    First, note that I had to change the mapping of the headline field to string, because in your sample documents the headlines are strings, not objects.
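
The changed mapping itself is not shown here, so the following is only a sketch of what it might look like, written as a Python dict that serializes to the JSON body of the index-creation request. It assumes that nothing but the `headline` type changes and that a plain (analyzed) `string` is acceptable; the name `NEWS_MAPPING` is hypothetical.

```python
import json

# Assumed corrected mapping: identical to the one in the question, except that
# "headline" is declared as a string instead of an object.
NEWS_MAPPING = {
    "mappings": {
        "news": {
            "properties": {
                "source": {"type": "string", "index": "not_analyzed"},
                "headline": {"type": "string"},
                "timestamp": {"type": "date", "format": "date_hour_minute_second_millis"},
                "user": {"type": "string", "index": "not_analyzed"},
            }
        }
    }
}

# JSON body that could be sent with the PUT request that creates the index.
print(json.dumps(NEWS_MAPPING, indent=2))
```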

    A query like the following then retrieves what you expect:

    # What the query does:
    #   - "query": filter for user=John
    #   - "terms" aggregation: bucket the matching documents by source
    #   - "top_hits" with "size": 1 and a descending sort on timestamp:
    #     keep only the latest hit of each bucket
    #   - "_source": return only the headline and the timestamp of that hit
    curl -XPOST "$ELASTICSEARCH_ENDPOINT/news/_search" -d '{
      "size": 0,
      "query": {
        "filtered": {
          "filter": {
            "term": {
              "user": "John"
            }
          }
        }
      },
      "aggs": {
        "sources": {
          "terms": {
            "field": "source"
          },
          "aggs": {
            "latest": {
              "top_hits": {
                "size": 1,
                "_source": [
                   "headline",
                   "timestamp"
                ],
                "sort": {
                  "timestamp": "desc"
                }
              }
            }
          }
        }
      }
    }'
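
The `filtered` query used above was deprecated in Elasticsearch 2.0 and removed in 5.0. On newer versions, a `bool` query with a `filter` clause is the usual replacement. Below is a minimal sketch of that variant, built as a Python dict so the user can be passed in as a parameter. The function name `build_latest_per_source_query` is made up for this example, and the body has not been run against any particular cluster version.

```python
import json


def build_latest_per_source_query(user):
    """Build a search body that returns the latest headline per source for one user."""
    return {
        "size": 0,  # no top-level hits, only the aggregation result
        "query": {
            # bool/filter stands in for the older "filtered" query
            "bool": {"filter": [{"term": {"user": user}}]}
        },
        "aggs": {
            "sources": {
                "terms": {"field": "source"},  # one bucket per source
                "aggs": {
                    "latest": {
                        "top_hits": {
                            "size": 1,  # keep a single hit per bucket...
                            "_source": ["headline", "timestamp"],
                            # ...and make it the most recent one
                            "sort": [{"timestamp": {"order": "desc"}}],
                        }
                    }
                },
            }
        },
    }


print(json.dumps(build_latest_per_source_query("John"), indent=2))
```

The printed JSON can be used as the `-d` payload of the `_search` request. Keep in mind that a `terms` aggregation returns only the top 10 buckets by default, so if a user can have more than 10 sources, a `"size"` would have to be set next to `"field"`.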
    

    This will yield something like this:

    {
      ...
      "aggregations" : {
        "sources" : {
          "doc_count_error_upper_bound" : 0,
          "sum_other_doc_count" : 0,
          "buckets" : [ {
            "key" : "CNN",
            "doc_count" : 2,
            "latest" : {
              "hits" : {
                "total" : 2,
                "max_score" : null,
                "hits" : [ {
                  "_index" : "news",
                  "_type" : "news",
                  "_id" : "AU7Sh3VDGDddn2ZNuDVl",
                  "_score" : null,
                  "_source":{
                      "headline": "More great news", 
                      "timestamp": "2015-07-28T00:08:23.000"
                  },
                  "sort" : [ 1438042103000 ]
                } ]
              }
            }
          }, {
            "key" : "ESPN",
            "doc_count" : 2,
            "latest" : {
              "hits" : {
                "total" : 2,
                "max_score" : null,
                "hits" : [ {
                  "_index" : "news",
                  "_type" : "news",
                  "_id" : "AU7Sh3VDGDddn2ZNuDVn",
                  "_score" : null,
                  "_source":{
                       "headline": "More sports news", 
                       "timestamp": "2015-07-28T00:10:35.000"
                  },
                  "sort" : [ 1438042235000 ]
                } ]
              }
            }
          } ]
        }
      }
    }
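
On the client side, the interesting part of that response is one `_source` per bucket. A short sketch of how it could be unpacked once the JSON has been parsed is shown below. The helper name `latest_headlines` is hypothetical, and `SAMPLE_RESPONSE` is a trimmed copy of the output above, not a live result.

```python
def latest_headlines(response):
    """Map each source (bucket key) to the _source of its latest hit."""
    result = {}
    for bucket in response["aggregations"]["sources"]["buckets"]:
        hits = bucket["latest"]["hits"]["hits"]
        if hits:  # top_hits was asked for one hit, so take the first
            result[bucket["key"]] = hits[0]["_source"]
    return result


# Trimmed copy of the response shown above.
SAMPLE_RESPONSE = {
    "aggregations": {
        "sources": {
            "buckets": [
                {
                    "key": "CNN",
                    "doc_count": 2,
                    "latest": {"hits": {"total": 2, "hits": [
                        {"_source": {"headline": "More great news",
                                     "timestamp": "2015-07-28T00:08:23.000"}}
                    ]}},
                },
                {
                    "key": "ESPN",
                    "doc_count": 2,
                    "latest": {"hits": {"total": 2, "hits": [
                        {"_source": {"headline": "More sports news",
                                     "timestamp": "2015-07-28T00:10:35.000"}}
                    ]}},
                },
            ]
        }
    }
}

for source, doc in latest_headlines(SAMPLE_RESPONSE).items():
    print(source, doc["timestamp"], doc["headline"])
```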
    

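    [Comments]:

    • Thank you very much! That's exactly what I wanted! I didn't know you could use aggregations to extract the actual _source; I thought they were only for statistics.
    • @Val I have a question similar to the one above, but I'd like to add one more level of aggregation on top of what is achieved here. Looking forward to your reply. stackoverflow.com/questions/38195420/…
    • Is it possible to aggregate by multiple fields in the same way?
    • @StoyanDekov "Aggregating by multiple fields" can mean different things, so I suggest you create a new question that references this one and clearly explains your need, which will increase your chances of getting an answer.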