Terms bucket aggregation in Elasticsearch: answers

【Question title】: Bucket terms aggregation in Elasticsearch
【Posted】: 2021-06-06 12:20:14
【Question description】:

Elasticsearch version

{
  "name" : "abc-Inspiron-5521",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "2vLvphpURJOtfAZSGDDX5w",
  "version" : {
    "number" : "7.10.2",
    "build_flavor" : "default",
    "build_type" : "deb",
    "build_hash" : "747e1cc71def077253878a59143c1f785afa92b9",
    "build_date" : "2021-01-13T00:42:12.435326Z",
    "build_snapshot" : false,
    "lucene_version" : "8.7.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}

Index mapping

"user_data" : {
    "aliases" : { },
    "mappings" : {
      "properties" : {
        "experience" : {
          "properties" : {
            "brand" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "brand_segment" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "company" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "duration" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "property_type" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "real_estate_type" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            }
          }
        }
      }
    }
}

The document structure is correct; if any brackets are mismatched, please adjust them accordingly.

Sample document

{
        "_index" : "user_data",
        "_type" : "_doc",
        "_id" : "dONuEXgBU9vYaZRqY8Jo",
        "_score" : 1.0,
        "_source" : {
          "experience" : [
            {
              "brand" : "Hilton",
              "company" : "Hilton LLC",
              "brand_segment" : "Luxury",
              "property_type" : "All-Inclusive",
              "duration" : "2 years",
              "real_estate_type" : "Institutional"
            },
            {
              "brand" : "Mantis",
              "company" : "Accor LLC",
              "brand_segment" : "Upper-Upscale",
              "property_type" : "Condo",
              "duration" : "2 years",
              "real_estate_type" : "Family Office"
            },
            {
              "brand" : "Marriott",
              "company" : "Marriott LLC",
              "brand_segment" : "Independent",
              "property_type" : "Convention",
              "duration" : "2 years",
              "real_estate_type" : "Family Office"
            }
          ]
        }
}

My terms aggregation query on brand_segment

GET user_data/_search
{
  "aggs": {
    "experience": {
      "terms": { "field": "experience.brand_segment" }
    }
  }
}

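I am running into two problems with this terms aggregation.

First problem: when aggregating on "brand_segment", a value such as "Upper-Upscale" should be treated as a single unit and counted as such, but currently it is split into separate terms (see the incorrect result below).
Second problem: I want to count how many times brand_segment takes the value "Luxury" (or any other value). The query above gives me the number of documents in which Luxury appears, not the total number of times it occurs across all documents. (Right now, multiple occurrences within one document are counted as one.)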

Incorrect result

"aggregations" : {
    "experience" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "independent",
          "doc_count" : 15
        },
        {
          "key" : "luxury",
          "doc_count" : 15
        },
        {
          "key" : "upper",
          "doc_count" : 14
        },
        {
          "key" : "upscale",
          "doc_count" : 14
        }
      ]
    }
  }

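The desired output should treat Upper-Upscale as a single value. I ran the query over multiple sample documents, which is why the counts look like this.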

Feel free to use this as a sample document when creating the index

{
  "id": 1,
  "name": "abcs",
  "source": "csv_status",
  "profile_complition": "70%",
  "creation_date": "2020-04-02",
  "current_position": [
    {
      "position": "Financial Reporting",
      "position_category": "Finance",
      "position_level": 2
    }
  ],
  "seeking_position": [
    {
      "position": "Financial Planning and Analysis",
      "position_category": "Finance",
      "position_level": 3
    }
  ],
  "last_updation_date": "2021-02-02",
  "experience": [
    {
      "brand": "Hilton",
      "company": "Hilton LLC",
      "brand_segment": "Luxury",
      "property_type": "All-Inclusive",
      "duration": "2 years",
      "real_estate_type": "Institutional"
    },
    {
      "brand": "Accor",
      "company": "Accor LLC",
      "brand_segment": "Luxury",
      "property_type": "Condo",
      "duration": "2 years",
      "real_estate_type": "Family Office"
    },
    {
      "brand": "Marriott",
      "company": "Marriott LLC",
      "brand_segment": "Independent",
      "property_type": "Convention",
      "duration": "2 years",
      "real_estate_type": "Family Office"
    }
  ]
}

Other values that occur in brand_segment = ['Economy', 'Upscale', 'Midscale', 'Upper-Upscale', 'Luxury', 'Independent', 'Extended Stay']

PS: Every brand segment needs to be treated as a single entity ('Upper-Upscale' should not be split into 'Upper' and 'Upscale'; the same goes for 'Extended Stay').

Let me know if further clarification is needed.

【Question discussion】:

Tags: elasticsearch kibana elasticsearch-aggregation

【Solution 1】:

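For the first problem, you need to aggregate on the keyword sub-field. The text field is analyzed (tokenized and lowercased), which is why "Upper-Upscale" shows up as "upper" and "upscale"; the keyword sub-field keeps the exact value: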

GET user_data/_search
{
  "aggs": {
    "experience": {
      "terms": { "field": "experience.brand_segment.keyword" }
    }
  }
}
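
To check how the text field tokenizes a value, the _analyze API can be pointed at the field. This is only a sketch against the user_data index from the question, not part of the original answer:

```
GET user_data/_analyze
{
  "field": "experience.brand_segment",
  "text": "Upper-Upscale"
}
```

With the default standard analyzer this should return the two tokens upper and upscale, which matches the bucket keys in the incorrect result above.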

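To solve the second problem, you need to make the experience field nested. With a plain object array, the inner objects are flattened into the parent document, so each value is counted at most once per document. Your mapping needs to look like this: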

"user_data" : {
    "aliases" : { },
    "mappings" : {
      "properties" : {
         "experience" : {
          "type": "nested",                 <--- add this
          "properties" : {
            "brand" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
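
The answer stops at the mapping. Once experience is nested, a top-level terms aggregation would normally no longer see the inner values, so the terms aggregation has to be wrapped in a nested aggregation. A minimal sketch, assuming the index has been recreated with the nested mapping above (the aggregation name brand_segments is illustrative, not from the original thread):

```
# Assumes "experience" is mapped as nested; "brand_segments" is an illustrative name
GET user_data/_search
{
  "size": 0,
  "aggs": {
    "experience": {
      "nested": { "path": "experience" },
      "aggs": {
        "brand_segments": {
          "terms": { "field": "experience.brand_segment.keyword" }
        }
      }
    }
  }
}
```

Here each bucket's doc_count counts nested experience entries rather than top-level documents, so the second sample document in the question, which lists Luxury twice, would add 2 to the Luxury bucket.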

【Comments】:

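Hi, thanks, your suggestion worked for me. Could you help me with how to make already-indexed documents nested, or how to index the document above as nested directly from Kibana? Currently I index data with "POST /user_data1/_doc/" followed by the sample document I posted above. As you can tell, I am new to ES.
You cannot change documents that are already indexed, but you can reindex them with the proper mapping.
Could you explain that using the example above?
I understand that the nested type has to be added; my question is how to add it. I tried modifying the mapping with the following request: PUT user_data/_mapping/ { "properties": { "experience": { "type": "nested" } } } but it returns a bad request, error code 400. Could you tell me how to make the field nested both before and after indexing documents?
As I said, once the index has been created you cannot change the mapping to nested. You need to create a new index and reindex your data into it with the new mapping.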
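
The reindexing step described in the last comment could look roughly like the following. This is a sketch only: user_data_nested is a hypothetical index name, and only the experience field is declared explicitly, on the assumption that the remaining fields can be left to dynamic mapping as in the original index.

```
# user_data_nested is a hypothetical name for the new index
PUT user_data_nested
{
  "mappings": {
    "properties": {
      "experience": { "type": "nested" }
    }
  }
}

# Copy the existing documents into the new index
POST _reindex
{
  "source": { "index": "user_data" },
  "dest": { "index": "user_data_nested" }
}
```

Because the mapping is set before any document is indexed, the sub-fields of experience are created inside the nested type. Documents posted to the new index afterwards (for example with POST /user_data_nested/_doc/) are stored as nested without any further change.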