Nested object aggregation term with mixed nested/non-nested filter (answers)

【Question title】: Nested object aggregation term with mixed nested/non-nested filter
【Posted】: 2019-08-30 19:57:37
【Question description】:

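We have facets that show how many results will be displayed when a filter is clicked (and when filters are combined).

Before we introduced nested objects, the following did the job: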

GET /x_v1/_search/
{
  "size": 0,
  "aggs": {
    "FilteredDescriptiveFeatures": {
      "filter": {
        "bool": {
          "must": [
            {
              "terms": {
                "breadcrumbs.categoryIds": [
                  "category"
                ]
              }
            },
            {
              "terms": {
                "products.sterile": [
                  "0"
                ]
              }
            }
          ]
        }
      },
      "aggs": {
        "DescriptiveFeatures": {
          "terms": {
            "field": "products.descriptiveFeatures",
            "size": 1000
          }
        }
      }
    }
  }
}

This gives the result:

  "aggregations": {
    "FilteredDescriptiveFeatures": {
      "doc_count": 280,
      "DescriptiveFeatures": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 0,
        "buckets": [
          {
            "key": "somekey",
            "doc_count": 42
          },

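Since we now need products to be a nested object, I'm trying to rewrite the above to handle this change. My attempt is shown below, but it doesn't give the correct results and doesn't seem to be properly tied to the filter.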

GET /x_v2/_search/
{
  "size": 0,
  "aggs": {
    "FilteredDescriptiveFeatures": {
      "filter": {
        "bool": {
          "must": [
            {
              "terms": {
                "breadcrumbs.categoryIds": [
                  "category"
                ]
              }
            },
            {
              "nested": {
                "path": "products",
                "query": {
                  "terms": {
                    "products.sterile": [
                      "0"
                    ]
                  }
                }
              }
            }
          ]
        }
      },
      "aggs": {
        "nested": {
          "nested": {
            "path": "products"
          },
          "aggregations": {
            "DescriptiveFeatures": {
              "terms": {
                "field": "products.descriptiveFeatures",
                "size": 1000
              }
            }
          }
        }
      }
    }
  }
}

This gives the result:

  "aggregations": {
    "FilteredDescriptiveFeatures": {
      "doc_count": 280,
      "nested": {
        "doc_count": 1437,
        "DescriptiveFeatures": {
          "doc_count_error_upper_bound": 0,
          "sum_other_doc_count": 0,
          "buckets": [
            {
              "key": "somekey",
              "doc_count": 164
            },

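I also tried moving the nested definition higher up so that it wraps both the filter and the aggs, but then the filter term on breadcrumbs.categoryIds, which is not part of the nested object, doesn't work.

Is what I'm trying to do possible? And if so, how?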

【Question discussion】:

Tags: elasticsearch elasticsearch-aggregation

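【Solution 1】:

In your FilteredDescriptiveFeatures step, you return every document that has at least one product with sterile = 0.

But after the nested step you no longer apply this filter, so every nested product of those documents is returned at this step. Your terms aggregation therefore runs over all products, not only the ones with sterile = 0.

You should move the sterile filter inside the nested step. And, as Richa pointed out, you need a reverse_nested aggregation in the last step so that you count Elasticsearch documents rather than nested product sub-documents.

Could you try this query?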
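To make this concrete, here is a small Python model (not Elasticsearch itself) of the two aggregation shapes. The documents are hypothetical and only reuse the question's field names; the model just mimics the counting rule that a nested aggregation visits every sub-document of each matching root document unless a filter aggregation sits inside it.

```python
from collections import Counter

# Hypothetical data using the question's field names: both root documents are
# in the requested category; each holds nested "products" sub-documents.
docs = [
    {"id": 1, "categoryIds": ["category"], "products": [
        {"sterile": "0", "descriptiveFeatures": "somekey"},
        {"sterile": "1", "descriptiveFeatures": "otherkey"},
    ]},
    {"id": 2, "categoryIds": ["category"], "products": [
        {"sterile": "1", "descriptiveFeatures": "somekey"},
    ]},
]


def root_matches(doc):
    """Top-level filter of the question: category term AND nested sterile query."""
    return "category" in doc["categoryIds"] and any(
        p["sterile"] == "0" for p in doc["products"]
    )


def feature_counts(docs, filter_inside_nested):
    """Model of nested agg -> (optional filter agg) -> terms agg on descriptiveFeatures."""
    counts = Counter()
    for doc in docs:
        if not root_matches(doc):
            continue
        # The nested aggregation visits every product of a matching root document.
        for product in doc["products"]:
            if filter_inside_nested and product["sterile"] != "0":
                continue
            counts[product["descriptiveFeatures"]] += 1
    return dict(counts)


print(feature_counts(docs, filter_inside_nested=False))  # {'somekey': 1, 'otherkey': 1}
print(feature_counts(docs, filter_inside_nested=True))   # {'somekey': 1}
```

Without the inner filter, otherkey gets a bucket even though its only product has sterile = 1; it leaks in because document 1 matched the root-level filter through a different product.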
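The difference between a bucket's own doc_count (nested sub-documents) and its reverse_nested doc_count (distinct root documents) can be sketched in Python as follows. The data is hypothetical, and the code is only a model of how the two counts relate, not an Elasticsearch client.

```python
from collections import defaultdict

# Hypothetical data: document 1 has two sterile products sharing a feature,
# document 2 has one.
docs = [
    {"id": 1, "products": [
        {"sterile": "0", "descriptiveFeatures": "somekey"},
        {"sterile": "0", "descriptiveFeatures": "somekey"},
    ]},
    {"id": 2, "products": [
        {"sterile": "0", "descriptiveFeatures": "somekey"},
    ]},
]


def bucket_counts(docs):
    """Return {feature: (nested doc_count, reverse_nested doc_count)}."""
    nested_count = defaultdict(int)  # one hit per matching product sub-document
    parents = defaultdict(set)       # distinct root documents behind each bucket
    for doc in docs:
        for product in doc["products"]:
            if product["sterile"] != "0":
                continue
            key = product["descriptiveFeatures"]
            nested_count[key] += 1
            parents[key].add(doc["id"])
    return {key: (nested_count[key], len(parents[key])) for key in nested_count}


print(bucket_counts(docs))  # {'somekey': (3, 2)}
```

For a facet that says "N results", the second number (the reverse_nested count) is the one to display.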

{
    "size": 0,
    "aggs": {
        "filteredCategory": {
            "filter": {
                "terms": {
                    "breadcrumbs.categoryIds": [
                        "category"
                    ]
                }
            },
            "aggs": {
                "nestedProducts": {
                    "nested": {
                        "path": "products"
                    },
                    "aggs": {
                        "filteredByProductsAttributes": {
                            "filter": {
                                "terms": {
                                    "products.sterile": [
                                        "0"
                                    ]
                                }
                            },
                            "aggs": {
                                "DescriptiveFeatures": {
                                    "terms": {
                                        "field": "products.descriptiveFeatures",
                                        "size": 1000
                                    },
                                    "aggs": {
                                        "productCount": {
                                            "reverse_nested": {}
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
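When several product filters are combined in the facets, the same request body can be generated instead of written by hand. This is a minimal Python sketch: `build_facet_body` is a hypothetical helper, and wrapping the product filters in `bool`/`must` is an assumption on my part (with a single filter it behaves like the plain `terms` filter in the query above). All product filters go into the one filter aggregation inside the nested context, so they must all match the same product.

```python
import json


def build_facet_body(category_ids, product_filters, facet_field, size=1000):
    """Hypothetical helper that builds the request body of the query above.

    product_filters maps nested field names (e.g. "products.sterile") to lists
    of accepted values. They are combined with bool/must inside the nested
    aggregation, so every condition applies to the same product sub-document.
    """
    product_clauses = [{"terms": {field: values}} for field, values in product_filters.items()]
    return {
        "size": 0,
        "aggs": {
            "filteredCategory": {
                # Root-level field: filtered before entering the nested context.
                "filter": {"terms": {"breadcrumbs.categoryIds": category_ids}},
                "aggs": {
                    "nestedProducts": {
                        "nested": {"path": "products"},
                        "aggs": {
                            "filteredByProductsAttributes": {
                                # Nested fields: filtered per product sub-document.
                                "filter": {"bool": {"must": product_clauses}},
                                "aggs": {
                                    "DescriptiveFeatures": {
                                        "terms": {"field": facet_field, "size": size},
                                        # Count root documents, not products.
                                        "aggs": {"productCount": {"reverse_nested": {}}},
                                    }
                                },
                            }
                        },
                    }
                },
            }
        },
    }


body = build_facet_body(["category"], {"products.sterile": ["0"]}, "products.descriptiveFeatures")
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of `GET /x_v2/_search` with whatever client you already use.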

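【Discussion】:

【Solution 2】:

What I understand from the description is that you want to filter the results on some nested and some non-nested fields, and then apply an aggregation on a nested field. I created a sample index and data with some nested and non-nested fields, and wrote a query:

Mapping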

    PUT stack-557722203
    {
      "mappings": {
        "_doc": {
          "properties": {
            "category": {
              "type": "text",
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            },
            "user": {
              "type": "nested",       // NESTED FIELD
              "properties": {
                "fName": {
                  "type": "text",
                  "fields": {
                    "keyword": {
                      "type": "keyword",
                      "ignore_above": 256
                    }
                  }
                },
                "lName": {
                  "type": "text",
                  "fields": {
                    "keyword": {
                      "type": "keyword",
                      "ignore_above": 256
                    }
                  }
                },
                "type": {
                  "type": "text",
                  "fields": {
                    "keyword": {
                      "type": "keyword",
                      "ignore_above": 256
                    }
                  }
                }
              }
            }
          }
        }
      }
    }

Sample data

    POST _bulk
    {"index":{"_index":"stack-557722203","_id":"1","_type":"_doc"}}
    {"category":"X","user":[{"fName":"A","lName":"B","type":"X"},{"fName":"A","lName":"C","type":"X"},{"fName":"P","lName":"B","type":"Y"}]}
    {"index":{"_index":"stack-557722203","_id":"2","_type":"_doc"}}
    {"category":"X","user":[{"fName":"P","lName":"C","type":"Z"}]}
    {"index":{"_index":"stack-557722203","_id":"3","_type":"_doc"}}
    {"category":"X","user":[{"fName":"A","lName":"C","type":"Y"}]}
    {"index":{"_index":"stack-557722203","_id":"4","_type":"_doc"}}
    {"category":"Y","user":[{"fName":"A","lName":"C","type":"Y"}]}

Query

GET stack-557722203/_search
{
   "size": 0, 
   "query": {
    "bool": {
      "must": [
        {
          "nested": {
            "path": "user",
            "query": {
              "term": {
                "user.fName.keyword": {
                  "value": "A"
                }
              }
            }
          }
        },
        {
          "term": {
            "category.keyword": {
              "value": "X"
            }
          }
        }
      ]
    }
  },

  "aggs": {
    "group BylName": {
      "nested": {
        "path": "user"
      },
      "aggs": {
        "group By lName": {
         "terms": {
           "field": "user.lName.keyword",
           "size": 10
         },
         "aggs": {
           "reverse Nested": {
             "reverse_nested": {}    // NOTE THIS
           }
         }
        }
      }
    }
  }
}

Output

{
  "took": 18,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "group BylName": {
      "doc_count": 4,
      "group By lName": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 0,
        "buckets": [
          {
            "key": "B",
            "doc_count": 2,
            "reverse Nested": {
              "doc_count": 1
            }
          },
          {
            "key": "C",
            "doc_count": 2,
            "reverse Nested": {
              "doc_count": 2
            }
          }
        ]
      }
    }
  }
}
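The numbers in this output can be reproduced with a small Python model of the query and the aggregation over the four sample documents. It is only a model of the counting semantics (query selects root documents; the nested aggregation then sees all of their user sub-documents), not a call to Elasticsearch.

```python
from collections import defaultdict

# The four sample documents from the _bulk request above.
docs = [
    {"id": "1", "category": "X", "user": [
        {"fName": "A", "lName": "B", "type": "X"},
        {"fName": "A", "lName": "C", "type": "X"},
        {"fName": "P", "lName": "B", "type": "Y"},
    ]},
    {"id": "2", "category": "X", "user": [{"fName": "P", "lName": "C", "type": "Z"}]},
    {"id": "3", "category": "X", "user": [{"fName": "A", "lName": "C", "type": "Y"}]},
    {"id": "4", "category": "Y", "user": [{"fName": "A", "lName": "C", "type": "Y"}]},
]


def simulate(docs):
    # "query": category == X AND at least one nested user with fName == A.
    hits = [d for d in docs
            if d["category"] == "X" and any(u["fName"] == "A" for u in d["user"])]
    # "nested" agg: every user sub-document of every hit; the query does not filter them.
    users = [(d["id"], u) for d in hits for u in d["user"]]
    nested_count = defaultdict(int)
    roots = defaultdict(set)
    for root_id, u in users:
        nested_count[u["lName"]] += 1   # terms bucket doc_count
        roots[u["lName"]].add(root_id)  # reverse_nested doc_count
    return {
        "hits.total": len(hits),
        "nested.doc_count": len(users),
        "buckets": {k: (nested_count[k], len(roots[k])) for k in sorted(nested_count)},
    }


print(simulate(docs))
# {'hits.total': 2, 'nested.doc_count': 4, 'buckets': {'B': (2, 1), 'C': (2, 2)}}
```

Note that bucket B's doc_count of 2 includes the P/B user of document 1, whose fName is not A.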

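As for the difference in the numbers you get: once you change the mapping to nested, doc_count includes more documents because of the way nested and object (non-nested) documents are stored. See here for how they are stored internally. To join them back to the root document, you can use the reverse_nested aggregation, and you will then get the same counts as before.

Hope this helps!

【Discussion】: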
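A rough Python model of that storage difference, using the two documents matched by the query above (ids 1 and 3). It assumes the standard behaviour that an array of plain objects is flattened into multi-valued fields on the root document, and that a terms aggregation counts each document once per distinct value. It models the counts only; it is not Elasticsearch.

```python
from collections import Counter

# The two root documents matched by the query above (ids 1 and 3).
hits = [
    {"id": "1", "user": [
        {"fName": "A", "lName": "B"},
        {"fName": "A", "lName": "C"},
        {"fName": "P", "lName": "B"},
    ]},
    {"id": "3", "user": [{"fName": "A", "lName": "C"}]},
]


def flatten(doc):
    """Object mapping: the array of users becomes multi-valued fields on the root document."""
    return {
        "user.fName": [u["fName"] for u in doc["user"]],
        "user.lName": [u["lName"] for u in doc["user"]],
    }


def object_lname_counts(hits):
    # terms agg on a root-level multi-valued field: one count per root document per distinct value
    counts = Counter()
    for doc in hits:
        counts.update(set(flatten(doc)["user.lName"]))
    return dict(sorted(counts.items()))


def nested_lname_counts(hits):
    # terms agg under a nested agg: one count per user sub-document
    counts = Counter(u["lName"] for doc in hits for u in doc["user"])
    return dict(sorted(counts.items()))


print(object_lname_counts(hits))  # {'B': 1, 'C': 2}
print(nested_lname_counts(hits))  # {'B': 2, 'C': 2}
```

The object-mapping counts match the reverse_nested counts in the output above, while the nested counts match the buckets' own doc_count. Flattening also loses which fName belongs to which lName, which is the usual reason for switching to nested in the first place.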

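I think the problem is not only that it doesn't use reverse_nested to get accurate document counts. The sterile = 0 filter is also in the wrong place: it should be used in a filter aggregation under the nested aggregation.
@PierreMallet I don't think that's possible, because if the nested aggregation is at the top, you can't use non-nested fields inside it. That's why I'd put a nested query above the aggregation: you also get the benefit that only the filtered documents are returned, and the aggregation then applies only to them. Please correct me if I'm wrong.
If you don't add the filter inside the nested aggregation, the aggregation is applied to all nested documents of the main documents. @Richa could you check my answer (Solution 1)? I haven't had time to test it, but it should work. You can have several filter steps, one per nesting level, and it ends with a reverse_nested for the document count you highlighted.
Unfortunately this doesn't give the correct results (Pierre Mallet's answer does), but thanks for the pointers, especially for introducing me to the reverse_nested aggregation.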
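The effect of adding a filter step inside the nested level can be checked on Richa's four sample documents with a small Python model. The `filter_inside_nested` switch stands for a hypothetical filter aggregation on user.fName = A placed between the nested and terms aggregations; the code only models the counts and does not talk to Elasticsearch.

```python
from collections import defaultdict

# The four sample documents from Solution 2's _bulk request.
docs = [
    {"id": "1", "category": "X", "user": [
        {"fName": "A", "lName": "B"},
        {"fName": "A", "lName": "C"},
        {"fName": "P", "lName": "B"},
    ]},
    {"id": "2", "category": "X", "user": [{"fName": "P", "lName": "C"}]},
    {"id": "3", "category": "X", "user": [{"fName": "A", "lName": "C"}]},
    {"id": "4", "category": "Y", "user": [{"fName": "A", "lName": "C"}]},
]


def lname_buckets(docs, filter_inside_nested):
    """Return {lName: (nested doc_count, reverse_nested doc_count)}."""
    # Root level: category == X AND some user with fName == A.
    hits = [d for d in docs
            if d["category"] == "X" and any(u["fName"] == "A" for u in d["user"])]
    nested_count = defaultdict(int)
    roots = defaultdict(set)
    for d in hits:
        for u in d["user"]:
            # Filter step inside the nested level: only fName == A users reach the terms agg.
            if filter_inside_nested and u["fName"] != "A":
                continue
            nested_count[u["lName"]] += 1
            roots[u["lName"]].add(d["id"])
    return {k: (nested_count[k], len(roots[k])) for k in sorted(nested_count)}


print(lname_buckets(docs, filter_inside_nested=False))  # {'B': (2, 1), 'C': (2, 2)}
print(lname_buckets(docs, filter_inside_nested=True))   # {'B': (1, 1), 'C': (2, 2)}
```

On this data only B's nested count changes, because document 1 also has an A/B user. A bucket fed only by non-matching sub-documents would drop out entirely once the inner filter is in place.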