Elasticsearch distinct filter values

【Question title】: Elasticsearch distinct filter values
【Posted】: 2017-09-20 18:53:58
【Question description】:

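I have a large document store in Elasticsearch and would like to retrieve the distinct filter values to display in an HTML drop-down list.

An example document set looks like this: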

[
  {
    "name": "John Doe",
    "departments": [ { "name": "Accounts" }, { "name": "Management" } ]
  },
  {
    "name": "Jane Smith",
    "departments": [ { "name": "IT" }, { "name": "Management" } ]
  }
]

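The drop-down should contain the list of departments, i.e. IT, Accounts and Management.

Could some kind person point me in the right direction for retrieving a distinct list of departments from Elasticsearch?

Thanks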

【Question discussion】:

Tags: elasticsearch filter distinct nosql

【Solution 1】:

This is a job for a terms aggregation (see the Elasticsearch documentation).

You can get the distinct departments values like this:

POST company/employee/_search
{
  "size":0,
  "aggs": {
    "by_departments": {
      "terms": {
        "field": "departments.name",
        "size": 0 //see note 1
      }
    }
  }
}

With your example documents, this outputs:

{
   ...
   "aggregations": {
      "by_departments": {
         "buckets": [
            {
               "key": "management", //see note 2
               "doc_count": 2
            },
            {
               "key": "accounts",
               "doc_count": 1
            },
            {
               "key": "it",
               "doc_count": 1
            }
         ]
      }
   }
}
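
To feed a drop-down from a response shaped like the one above, the bucket keys can be pulled out on the client side. The following is a minimal Python sketch, not part of the original answer: the helper names `build_terms_request` and `distinct_values` are hypothetical, no Elasticsearch client is involved, and the sample response is hard-coded from the output shown above. It also passes an explicit bucket limit instead of `0`, which is an assumption about how you would want to cap the result.

```python
from typing import Any


def build_terms_request(field: str, max_buckets: int) -> dict[str, Any]:
    """Build a search body that returns only a terms aggregation on `field`."""
    return {
        "size": 0,  # return no search hits, only aggregation results
        "aggs": {
            "by_departments": {
                "terms": {"field": field, "size": max_buckets}
            }
        },
    }


def distinct_values(response: dict[str, Any]) -> list[str]:
    """Return the bucket keys of the `by_departments` aggregation, sorted."""
    buckets = response["aggregations"]["by_departments"]["buckets"]
    return sorted(bucket["key"] for bucket in buckets)


# Hard-coded response shaped like the output shown above (no network call).
sample_response = {
    "aggregations": {
        "by_departments": {
            "buckets": [
                {"key": "management", "doc_count": 2},
                {"key": "accounts", "doc_count": 1},
                {"key": "it", "doc_count": 1},
            ]
        }
    }
}

print(build_terms_request("departments.name", 100))
print(distinct_values(sample_response))
```

The body returned by `build_terms_request` is what you would send to the `_search` endpoint with whichever HTTP or Elasticsearch client you already use.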

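Two additional notes:

1. Setting size to 0 sets the maximum number of buckets to Integer.MAX_VALUE. Don't use it if departments has too many distinct values. Note that size: 0 is only accepted by Elasticsearch versions before 5.0; later versions require an explicit positive size.
2. As you can see, the keys are the terms that result from analyzing the departments values. Make sure to run your terms aggregation on a field mapped as not_analyzed.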

For example, with the default mapping (departments.name is an analyzed string), adding this employee:

{
  "name": "Bill Gates",
  "departments": [
    {
      "name": "IT"
    },
    {
      "name": "Human Resource"
    }
  ]
}

leads to this result:

{
   ...
   "aggregations": {
      "by_departments": {
         "buckets": [
            {
               "key": "it",
               "doc_count": 2
            },
            {
               "key": "management",
               "doc_count": 2
            },
            {
               "key": "accounts",
               "doc_count": 1
            },
            {
               "key": "human",
               "doc_count": 1
            },
            {
               "key": "resource",
               "doc_count": 1
            }
         ]
      }
   }
}
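
To see why "Human Resource" ends up as two buckets, it can help to mimic the indexing step outside Elasticsearch. The sketch below is only an approximation under stated assumptions: `index_terms` stands in for the default analyzer by splitting on word characters and lowercasing, which is not the real analyzer implementation, and `terms_doc_counts` is a hypothetical helper that counts documents per term the way `doc_count` does. The three employees are the ones from this thread.

```python
import re
from collections import Counter

employees = [
    {"name": "John Doe", "departments": [{"name": "Accounts"}, {"name": "Management"}]},
    {"name": "Jane Smith", "departments": [{"name": "IT"}, {"name": "Management"}]},
    {"name": "Bill Gates", "departments": [{"name": "IT"}, {"name": "Human Resource"}]},
]


def index_terms(value: str, analyzed: bool) -> list[str]:
    """Return the terms that would be indexed for one field value.

    analyzed=True: rough stand-in for the default analyzer (split into
    words, lowercase each word).
    analyzed=False: the whole value is kept as one term (not_analyzed).
    """
    if analyzed:
        return [token.lower() for token in re.findall(r"\w+", value)]
    return [value]


def terms_doc_counts(docs: list[dict], analyzed: bool) -> Counter:
    """Count, per term, how many documents contain it (like doc_count)."""
    counts: Counter = Counter()
    for doc in docs:
        terms_in_doc = set()  # a term counts once per document
        for dept in doc["departments"]:
            terms_in_doc.update(index_terms(dept["name"], analyzed))
        counts.update(terms_in_doc)
    return counts


print(sorted(terms_doc_counts(employees, analyzed=True).items()))
print(sorted(terms_doc_counts(employees, analyzed=False).items()))
```

With `analyzed=True` the counts contain separate `human` and `resource` entries, matching the buckets above; with `analyzed=False` each department name stays intact.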

With the proper mapping:

POST company
{
  "mappings": {
    "employee": {
      "properties": {
        "name": {
          "type": "string"
        },
        "departments": {
          "type": "object",
          "properties": {
            "name": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        }
      }
    }
  }
}

the same request finally outputs:

{
   ...
   "aggregations": {
      "by_departments": {
         "buckets": [
            {
               "key": "IT",
               "doc_count": 2
            },
            {
               "key": "Management",
               "doc_count": 2
            },
            {
               "key": "Accounts",
               "doc_count": 1
            },
            {
               "key": "Human Resource",
               "doc_count": 1
            }
         ]
      }
   }
}
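
Since the goal in the question is an HTML drop-down, here is one possible way to turn the buckets above into markup. This is a sketch under assumptions rather than part of the original answer: `render_department_select` and the `department` element name are made up for illustration, the buckets are hard-coded from the output above, and options are sorted alphabetically rather than by `doc_count`.

```python
from html import escape


def render_department_select(buckets: list[dict], element_name: str = "department") -> str:
    """Render a <select> element with one <option> per aggregation bucket."""
    # Sort case-insensitively so the drop-down reads alphabetically.
    ordered = sorted(buckets, key=lambda b: str(b["key"]).lower())
    options = [
        f'  <option value="{escape(str(b["key"]), quote=True)}">{escape(str(b["key"]))}</option>'
        for b in ordered
    ]
    opening = f'<select name="{escape(element_name, quote=True)}">'
    return "\n".join([opening, *options, "</select>"])


# Buckets copied from the aggregation output shown above.
buckets = [
    {"key": "IT", "doc_count": 2},
    {"key": "Management", "doc_count": 2},
    {"key": "Accounts", "doc_count": 1},
    {"key": "Human Resource", "doc_count": 1},
]

print(render_department_select(buckets))
```

Escaping the keys with `html.escape` matters because department names come from indexed data and may contain characters such as `&` or `<`.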

Hope this helps!

【Discussion】:

Perfect answer! Thanks, Tom.