【Question title】: elastic search - using a nested filtered array as bucket
【Posted】: 2015-09-09 15:49:05
【Question】:

I'm a bit lost...

Consider this simple indexed document:

{
"url" : "http://...?mypage",
"pages": [
 {
  "elapsed": 1190,
  "type": "LOADPAGE"
 },
 {
  "elapsed": 115400,
  "type": "ONPAGE"
 },
 {
  "elapsed": 1100,
  "type": "LOADPAGE"
 },
 {
  "elapsed": 1340,
  "type": "ONPAGE"
 }
]    
}

I'm trying to compute the average LOADPAGE elapsed time, so I know I need an "avg" or "stats" aggregation.

"aggs": {
    "compute_loadpage": {
        "filter": { "term": { "pages.type": "loadpage" } },
        "aggs": {
            "loadpage_all": {
                "stats": {
                    "field": "pages.elapsed"
                }
            }
        }
    }
}

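I understand that the "filter" agg creates a bucket containing every document that matches my filter, so it makes sense that my agg is then computed over the full "pages" array of those documents.

How can I create a bucket that holds only the LOADPAGE values, so that I can aggregate over it? Or do I have to use a scripted aggregation?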

【Question comments】:

    Tags: arrays elasticsearch filtering aggregation bucket


    【Solution 1】:

    As long as your document mapping uses the nested type, you can use a nested aggregation.
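To see why the nested type matters here, the following is a rough plain-Python sketch of the two behaviours. It does not talk to Elasticsearch; the helper names `flatten`, `filtered_avg_flat` and `filtered_avg_nested` are made up for illustration, and the flattening only approximates how a non-nested array of objects is indexed.

```python
# Plain-Python illustration only; this does not talk to Elasticsearch.
doc = {
    "url": "http://...?mypage",
    "pages": [
        {"elapsed": 1190, "type": "LOADPAGE"},
        {"elapsed": 115400, "type": "ONPAGE"},
        {"elapsed": 1100, "type": "LOADPAGE"},
        {"elapsed": 1340, "type": "ONPAGE"},
    ],
}


def flatten(document):
    """Mimic a non-nested array of objects: one multi-valued field per key,
    with the pairing between "type" and "elapsed" lost."""
    flat = {}
    for page in document["pages"]:
        for key, value in page.items():
            flat.setdefault("pages." + key, []).append(value)
    return flat


def filtered_avg_flat(document, wanted_type):
    """Filter at document level, then average every pages.elapsed value."""
    flat = flatten(document)
    if wanted_type not in flat["pages.type"]:
        return None  # the whole document is filtered out
    values = flat["pages.elapsed"]
    return sum(values) / len(values)


def filtered_avg_nested(document, wanted_type):
    """Treat each page as its own hidden document, filter those, then average."""
    values = [p["elapsed"] for p in document["pages"] if p["type"] == wanted_type]
    return sum(values) / len(values) if values else None


flat_avg = filtered_avg_flat(doc, "LOADPAGE")
nested_avg = filtered_avg_nested(doc, "LOADPAGE")
print(flat_avg)    # averages all four elapsed values
print(nested_avg)  # averages only the two LOADPAGE values
```

In the flattened case the filter can only keep or drop the whole document, which is the behaviour described in the question; in the nested case each entry of "pages" can be filtered on its own.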

    To test, I set up a simple index like this (note the nested type, and the "index": "not_analyzed" on "pages.type"):

    PUT /test_index
    {
       "settings": {
          "number_of_shards": 1
       },
       "mappings": {
          "doc": {
             "properties": {
                "pages": {
                   "type": "nested",
                   "properties": {
                      "elapsed": {
                         "type": "long"
                      },
                      "type": {
                         "type": "string",
                         "index": "not_analyzed"
                      }
                   }
                },
                "url": {
                   "type": "string"
                }
             }
          }
       }
    }
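The mapping above uses the syntax of the Elasticsearch version current at the time. As a hedged sketch for newer releases: from 5.0 the `string` type was split into `text` and `keyword` (a `not_analyzed` string roughly corresponds to `keyword`), and from 7.0 the mapping type name (`doc` here) is no longer part of the body. The Python below only builds and prints an assumed equivalent request body; the variable name `mapping_body` is illustrative and the body has not been run against a live cluster.

```python
import json

# Assumed equivalent of the mapping above for Elasticsearch 7 or later.
mapping_body = {
    "settings": {"number_of_shards": 1},
    "mappings": {
        "properties": {
            "pages": {
                "type": "nested",  # each page becomes its own hidden document
                "properties": {
                    "elapsed": {"type": "long"},
                    # keyword keeps "LOADPAGE" as a single exact term
                    "type": {"type": "keyword"},
                },
            },
            "url": {"type": "text"},
        }
    },
}

# Body that would be sent with PUT /test_index
print(json.dumps(mapping_body, indent=2))
```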
    

    Then I indexed your document:

    PUT /test_index/doc/1
    {
       "url": "http://...?mypage",
       "pages": [
          {
             "elapsed": 1190,
             "type": "LOADPAGE"
          },
          {
             "elapsed": 115400,
             "type": "ONPAGE"
          },
          {
             "elapsed": 1100,
             "type": "LOADPAGE"
          },
          {
             "elapsed": 1340,
             "type": "ONPAGE"
          }
       ]
    }
    

    This aggregation then seems to give what you want:

    POST /test_index/_search?search_type=count
    {
       "aggs": {
          "pages_nested": {
             "nested": {
                "path": "pages"
             },
             "aggs": {
                "loadpage_filtered": {
                   "filter": {
                      "term": {
                         "pages.type": "LOADPAGE"
                      }
                   },
                   "aggs": {
                      "loadpage_avg_elap": {
                         "avg": {
                            "field": "pages.elapsed"
                         }
                      }
                   }
                }
             }
          }
       }
    }
    ...
    {
       "took": 3,
       "timed_out": false,
       "_shards": {
          "total": 1,
          "successful": 1,
          "failed": 0
       },
       "hits": {
          "total": 1,
          "max_score": 0,
          "hits": []
       },
       "aggregations": {
          "pages_nested": {
             "doc_count": 4,
             "loadpage_filtered": {
                "doc_count": 2,
                "loadpage_avg_elap": {
                   "value": 1145,
                   "value_as_string": "1145.0"
                }
             }
          }
       }
    }
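The response nests one level per aggregation, in the same order as the request: `pages_nested`, then `loadpage_filtered`, then `loadpage_avg_elap`. As a small sketch of reading it from client code, the Python below parses the aggregation part of the response shown above from a string (no Elasticsearch client is assumed) and cross-checks the average against the two LOADPAGE values of the sample document.

```python
import json

# Aggregation part of the response above, pasted as a string for illustration.
response_text = """
{
   "aggregations": {
      "pages_nested": {
         "doc_count": 4,
         "loadpage_filtered": {
            "doc_count": 2,
            "loadpage_avg_elap": {
               "value": 1145,
               "value_as_string": "1145.0"
            }
         }
      }
   }
}
"""

response = json.loads(response_text)

# Walk down the same path the aggregations were nested in.
nested = response["aggregations"]["pages_nested"]
filtered = nested["loadpage_filtered"]
avg_elapsed = filtered["loadpage_avg_elap"]["value"]

# Sanity check against the two LOADPAGE entries of the sample document.
expected = (1190 + 1100) / 2

print(nested["doc_count"], filtered["doc_count"], avg_elapsed, expected)
```

The outer `doc_count` of 4 counts all nested page objects, and the inner `doc_count` of 2 counts only those whose type is LOADPAGE.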
    

    Here is the code I used to test:

    http://sense.qbox.io/gist/b526427f14225b02e7268ed15d8c6dde4793fc8d
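One note for newer Elasticsearch versions: `search_type=count` was deprecated in 2.0 and removed in 5.0, and the usual replacement is `"size": 0` in the request body. The sketch below builds such a body in Python and prints it; it assumes the same index and field names as above, and the variable name `search_body` is illustrative. It has not been run against a live cluster.

```python
import json

# Same nested -> filter -> avg aggregation, with "size": 0 instead of
# the search_type=count URL parameter (assumption: Elasticsearch 5 or later).
search_body = {
    "size": 0,  # return aggregations only, no hits
    "aggs": {
        "pages_nested": {
            "nested": {"path": "pages"},
            "aggs": {
                "loadpage_filtered": {
                    "filter": {"term": {"pages.type": "LOADPAGE"}},
                    "aggs": {
                        "loadpage_avg_elap": {
                            "avg": {"field": "pages.elapsed"}
                        }
                    },
                }
            },
        }
    },
}

# Body that would be sent with POST /test_index/_search
print(json.dumps(search_body, indent=2))
```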

    【Comments】:
