Elasticsearch cross-index query with aggregations: answers

【Question title】: Elasticsearch cross-index query with aggregations
【Posted】: 2020-09-19 18:56:02
【Question】:

I am using Elasticsearch 7.7 and Kibana 7.7.

For example, take two indices.

A user index with a simple mapping:

PUT /user_index
{
  "mappings": {
    "properties": {
      "user_id":    { "type": "text" },
      "user_phone":    { "type": "text" },
      "name":   { "type": "text"  }     
    }
  }
}

A check index with a simple mapping:

PUT /check_index
{
  "mappings": {
    "properties": {
      "user_id":    { "type": "text" },  
      "price":   { "type": "integer"  },
      "goods_count":  {"type": "integer"}
    }
  }
}

I want to build a table visualization like this:

________________________________________________________________________
  user_id  |   user_phone  | average_price       |    sum_goods_count  |
___________|_______________|_____________________|______________________
     1     |       123     |       512           |         64          |
___________|_______________|_____________________|______________________
     2     |       456     |       256           |         16          | 
___________|_______________|_____________________|______________________

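So my questions are:

Is this possible?
Do I understand correctly that I need to query both indices, get the list of users, and then loop over it to build the chart from the checks?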

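【Comments】:

How is user_index useful in getting the expected output? The fields in your expected output are also present in check_index.
@Bhavya Gupta, this is just an example. The real task is much more complex, and I need to build a table that contains fields from both indices. I hope someone can give me an idea of how to implement it or how to approach the problem.

Tags: elasticsearch elasticsearch-7 kibana-7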

【Solution 1】:

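First, you should de-normalize your data in ES as far as possible to get the best performance and capabilities it offers. Looking at the example you gave in the question and in the comments, this seems easy to achieve in your use case by combining the user and check indices into a single index, as in the example below.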
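To make "combining the two indices" concrete, here is a minimal Python sketch of an application-side join done before indexing. It is only an illustration under assumptions: the `denormalize` helper and the `name` values are hypothetical, the `user_phone` values are taken from the table in the question, and copying `user_phone` onto each check document is one possible design, not something the example mapping in this answer does.

```python
# Hypothetical user records; only user_id and user_phone come from the question.
users = [
    {"user_id": "1", "user_phone": "123", "name": "Alice"},
    {"user_id": "2", "user_phone": "456", "name": "Bob"},
]

# Check records with the same values as the documents indexed in this answer.
checks = [
    {"user_id": "1", "price": 500, "goods_count": 100},
    {"user_id": "2", "price": 500, "goods_count": 100},
    {"user_id": "2", "price": 100, "goods_count": 200},
]


def denormalize(users, checks):
    """Copy the user's phone onto every check document (join done in the application)."""
    users_by_id = {user["user_id"]: user for user in users}
    combined = []
    for check in checks:
        user = users_by_id.get(check["user_id"], {})
        doc = dict(check)  # do not mutate the input record
        doc["user_phone"] = user.get("user_phone")  # None if the user is unknown
        combined.append(doc)
    return combined


combined_docs = denormalize(users, checks)
for doc in combined_docs:
    print(doc)
```

With documents shaped like this, a single index holds every field the table needs, so one aggregation request can produce all of its columns.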

Index mapping:

{
    "mappings": {
        "properties": {
            "user_id": {
                "type": "text",
                "fielddata": true
            },
            "price": {
                "type": "integer"
            },
            "goods_count": {
                "type": "integer"
            }
        }
    }
}

Index data:

Using the index mapping defined above, index these three documents: one with "user_id":"1" and two with "user_id":"2".

{
    "user_id":"1",
    "price":500,
    "goods_count":100
}
{
    "user_id":"2",
    "price":500,
    "goods_count":100
}
{
    "user_id":"2",
    "price":100,
    "goods_count":200
}
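One way to load these three documents in a single request is the `_bulk` API, which expects newline-delimited JSON: an action line followed by the document source, with a trailing newline at the end. The Python sketch below only builds that request body. The `build_bulk_body` helper is hypothetical, the index name is the one that appears in the search result in this answer, and the ids 1 to 3 are an assumption based on that same result.

```python
import json

INDEX_NAME = "stof_63925596"  # name seen in the search result; any index name works

docs = [
    {"user_id": "1", "price": 500, "goods_count": 100},
    {"user_id": "2", "price": 500, "goods_count": 100},
    {"user_id": "2", "price": 100, "goods_count": 200},
]


def build_bulk_body(index_name, documents):
    """Build an NDJSON body for POST /_bulk: one action line plus one source line per doc."""
    lines = []
    for doc_id, doc in enumerate(documents, start=1):
        action = {"index": {"_index": index_name, "_id": str(doc_id)}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


bulk_body = build_bulk_body(INDEX_NAME, docs)
print(bulk_body)
```

The resulting string can be sent to `POST /_bulk` with the `Content-Type: application/x-ndjson` header, for example from the Kibana console or with any HTTP client.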

Search query:

For a detailed explanation, see the official ES documentation on the Terms aggregation, Top Hits aggregation, Sum aggregation, and Avg aggregation.

{
  "size": 0,
  "aggs": {
    "user": {
      "terms": {
        "field": "user_id"
      },
      "aggs": {
        "top_user_hits": {
          "top_hits": {
            "_source": {
              "includes": [
                "user_id"
              ]
            }
          }
        },
        "avg_price": {
          "avg": {
            "field": "price"
          }
        },
        "goods_count": {
          "sum": {
            "field": "goods_count"
          }
        }
      }
    }
  }
}

Search result:

{
  "took": 10,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 3,
      "relation": "eq"
    },
    "max_score": null,
    "hits": [
      
    ]
  },
  "aggregations": {
    "user": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "2",
          "doc_count": 2,
          "top_user_hits": {
            "hits": {
              "total": {
                "value": 2,
                "relation": "eq"
              },
              "max_score": 1.0,
              "hits": [
                {
                  "_index": "stof_63925596",
                  "_type": "_doc",
                  "_id": "2",
                  "_score": 1.0,
                  "_source": {
                    "user_id": "2"
                  }
                },
                {
                  "_index": "stof_63925596",
                  "_type": "_doc",
                  "_id": "3",
                  "_score": 1.0,
                  "_source": {
                    "user_id": "2"
                  }
                }
              ]
            }
          },
          "avg_price": {
            "value": 300.0
          },
          "goods_count": {
            "value": 300.0
          }
        },
        {
          "key": "1",
          "doc_count": 1,
          "top_user_hits": {
            "hits": {
              "total": {
                "value": 1,
                "relation": "eq"
              },
              "max_score": 1.0,
              "hits": [
                {
                  "_index": "stof_63925596",
                  "_type": "_doc",
                  "_id": "1",
                  "_score": 1.0,
                  "_source": {
                    "user_id": "1"
                  }
                }
              ]
            }
          },
          "avg_price": {
            "value": 500.0
          },
          "goods_count": {
            "value": 100.0
          }
        }
      ]
    }
  }
}
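If the table is assembled in application code rather than in Kibana, the buckets in this response map directly onto table rows. The following Python sketch shows one way to do that. The `buckets_to_rows` helper and the `response` variable are hypothetical, and `response` is a trimmed copy of the result above with the `top_user_hits` part left out for brevity.

```python
def buckets_to_rows(response):
    """Turn each bucket of the "user" terms aggregation into one table row."""
    rows = []
    for bucket in response["aggregations"]["user"]["buckets"]:
        rows.append({
            "user_id": bucket["key"],
            "average_price": bucket["avg_price"]["value"],
            "sum_goods_count": bucket["goods_count"]["value"],
        })
    return rows


# Trimmed copy of the search result shown above.
response = {
    "aggregations": {
        "user": {
            "buckets": [
                {"key": "2", "doc_count": 2,
                 "avg_price": {"value": 300.0}, "goods_count": {"value": 300.0}},
                {"key": "1", "doc_count": 1,
                 "avg_price": {"value": 500.0}, "goods_count": {"value": 100.0}},
            ]
        }
    }
}

rows = buckets_to_rows(response)
for row in rows:
    print(row)
```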

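As you can see in the search result above, the average price for "user_id":"2" is (500+100)/2 = 300, and the sum of goods_count is 100+200 = 300.

Similarly, for "user_id":"1" the average price is 500/1 = 500 and the sum of goods_count is 100.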
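The same arithmetic can be checked locally. This Python sketch groups the three documents by `user_id` and computes the count, the average price, and the goods_count sum for each group. It only mirrors the results of the terms, avg, and sum aggregations on this small data set; it says nothing about how Elasticsearch computes them internally, and the `aggregate_by_user` helper is hypothetical.

```python
from collections import defaultdict

docs = [
    {"user_id": "1", "price": 500, "goods_count": 100},
    {"user_id": "2", "price": 500, "goods_count": 100},
    {"user_id": "2", "price": 100, "goods_count": 200},
]


def aggregate_by_user(documents):
    """Group documents by user_id, then compute count, average price, and goods_count sum."""
    groups = defaultdict(list)
    for doc in documents:
        groups[doc["user_id"]].append(doc)

    summary = {}
    for user_id, group in groups.items():
        summary[user_id] = {
            "doc_count": len(group),
            "avg_price": sum(d["price"] for d in group) / len(group),
            "goods_count": sum(d["goods_count"] for d in group),
        }
    return summary


summary = aggregate_by_user(docs)
for user_id, values in summary.items():
    print(user_id, values)
```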

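【Discussion】:

Elasticsearch, thank you for your answer. I understand the concept, but in the real problem I will have to combine several large indices, use nested objects that can number in the thousands, and build tables with 10-15 columns in Kibana. Can you tell me whether Elasticsearch + Kibana is suitable for this purpose, or should I consider another solution?
@АртемСавельев, thanks for explaining your use case in more detail. I still don't know your application's complete use case, but from the limited view and information I have, it looks like the wrong approach at first glance: aggregations and nested documents are both very expensive in ES, and there are several articles about this. On aggregations, see my latest answer on SO at stackoverflow.com/a/63965634/4039431; on nested documents, see Gojek's Medium blog at blog.gojekengineering.com/… to learn about the performance problems they ran into.
@АртемСавельев, I suggest you ask a new question about the design that covers your functional and non-functional requirements, so the community can help you better. If this helped, please don't forget to upvote and accept my answer, as it motivates me to help others :)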