How to get unique documents in Elasticsearch based on a field, and 'group by' the result based on other fields

[Question title]: How to get unique documents in elastic search based on a field, and 'group by' the result based on other fields
[Posted]: 2020-04-20 00:38:28
[Question]:

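I have just started using Elasticsearch and need to solve a problem that is too complex for me. I have thousands of documents in an index, from which I have to query a predefined number of documents (which can also be a few thousand). From those, I have to keep the documents that are unique with respect to one field (there can be up to a few hundred unique documents) and then find groups of those unique documents based on some other fields.

The documents in my index look like this: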

{
  "complexProperty1": {
    "A": "example",
    "B": "1",
    "D": true,
    "E": "case",
    "F": ["guide1", "guide2"]
  },
  "complexProperty2": {
    "X": "10",
    "Y": ["specimen1", "specimen2"],
    "Z": "blueprint"
  }
}

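Many documents will have complexProperty1.A set to "example". I want to include them only once, and the resulting documents need to be grouped by complexProperty1.D and complexProperty1.E, i.e. for each pair of complexProperty1.D and complexProperty1.E I want a list of documents (these documents are the only thing I need in my result). I am using NEST to achieve this.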

[Comments]:

Tags: elasticsearch nest

[Solution 1]:

You could start off with a bunch of raw terms aggregations and then work your way back to the NEST DSL:

POST complexities/_doc
{
  "complexProperty1": {
    "A": "example",
    "B": "1",
    "D": true,
    "E": "case",
    "F": [
      "guide1",
      "guide2"
    ]
  },
  "complexProperty2": {
    "X": "10",
    "Y": [
      "specimen1",
      "specimen2"
    ],
    "Z": "blueprint"
  }
}

GET complexities/_search
{
  "size": 0,
  "aggs": {
    "by_A": {
      "terms": {
        "field": "complexProperty1.A.keyword"
      },
      "aggs": {
        "by_D": {
          "terms": {
            "field": "complexProperty1.D"
          }
        },
        "by_E": {
          "terms": {
            "field": "complexProperty1.E.keyword"
          }
        }
      }
    }
  }
}
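
Note that by_D and by_E above are siblings, so they yield two independent sets of buckets rather than one bucket per (D, E) pair. If pairs are what you are after, one option is to nest by_E inside by_D. Below is a minimal sketch that assembles such a request body as a plain Python dict; the helper names terms_agg and build_query are made up for illustration, and the explicit size of 10 simply spells out the terms aggregation default.

```python
import json


def terms_agg(field, sub_aggs=None, size=10):
    """Return a terms aggregation body, optionally with nested sub-aggregations."""
    agg = {"terms": {"field": field, "size": size}}
    if sub_aggs:
        agg["aggs"] = sub_aggs
    return agg


def build_query():
    """Build a by_A > by_D > by_E request body (hypothetical helper)."""
    # by_E sits inside by_D, so each innermost bucket is one (D, E) pair
    by_e = terms_agg("complexProperty1.E.keyword")
    by_d = terms_agg("complexProperty1.D", {"by_E": by_e})
    by_a = terms_agg("complexProperty1.A.keyword", {"by_D": by_d})
    # size 0 suppresses the regular hits; only the aggregations are returned
    return {"size": 0, "aggs": {"by_A": by_a}}


if __name__ == "__main__":
    print(json.dumps(build_query(), indent=2))
```

The printed JSON can be pasted after GET complexities/_search in the same way as the request above.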

In order to get the underlying docs, you can append a top_hits agg to each of the sub-aggs, including the top-level one:

{
  "size": 0,
  "aggs": {
    "by_A": {
      "terms": {
        "field": "complexProperty1.A.keyword"
      },
      "aggs": {
        "top_hits_only": {
          "top_hits": {
            "_source": "*"
          }
        },
        "by_D": {
          "terms": {
            "field": "complexProperty1.D"
          },
          "aggs": {
            "top_hits_only": {
              "top_hits": {
                "_source": "*"
              }
            }
          }
        },
        "by_E": {
          "terms": {
            "field": "complexProperty1.E.keyword"
          },
          "aggs": {
            "top_hits_only": {
              "top_hits": {
                "_source": "*"
              }
            }
          }
        }
      }
    }
  }
}
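
If you only need one document per distinct complexProperty1.A value and then want those documents grouped by (D, E), one possible approach is to do the grouping on the client from the by_A buckets. The following is a rough sketch, not a definitive implementation: group_unique_docs is a hypothetical helper, and sample_response is hand-written data in the shape of a search response, not output from a real cluster. Keep in mind that top_hits returns 3 hits per bucket by default, so setting "size": 1 on it would be enough for this use.

```python
from collections import defaultdict


def group_unique_docs(response):
    """Take one document per by_A bucket, then group those documents by (D, E)."""
    groups = defaultdict(list)
    for bucket in response["aggregations"]["by_A"]["buckets"]:
        hits = bucket["top_hits_only"]["hits"]["hits"]
        if not hits:
            continue
        # The first hit stands in for every document sharing this A value
        source = hits[0]["_source"]
        prop = source["complexProperty1"]
        groups[(prop["D"], prop["E"])].append(source)
    return dict(groups)


def _bucket(a, d, e, doc_count):
    """Build one made-up by_A bucket holding a single top hit."""
    source = {"complexProperty1": {"A": a, "D": d, "E": e}}
    return {
        "key": a,
        "doc_count": doc_count,
        "top_hits_only": {"hits": {"hits": [{"_source": source}]}},
    }


# Hand-written sample; the keys mirror the aggregation names used above
sample_response = {
    "aggregations": {
        "by_A": {
            "buckets": [
                _bucket("example", True, "case", 5),
                _bucket("other", True, "case", 2),
                _bucket("third", False, "case", 1),
            ]
        }
    }
}

if __name__ == "__main__":
    for pair, docs in group_unique_docs(sample_response).items():
        print(pair, [doc["complexProperty1"]["A"] for doc in docs])
```

Which document represents each A value depends on the sort order of the top_hits agg, so add an explicit sort there if that choice matters.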

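[Discussion]:

Thanks! By the way, I need the by_D and by_E grouping applied to all of the unique documents I get through by_A. I don't need the documents in every aggregation. Also, I have a predefined number of documents that I need to apply this to; can I set that size instead of 0?
Can you update your question with the answer you're looking for?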