【Question Title】: Elasticsearch unique documents after querying with match-phrase
【Posted】: 2021-11-04 15:46:10
【Question】:

Hey Stack Overflow, I have an Elasticsearch document like the one below. I'm only interested in the "tags" key.

{
    "_index": "graph_20211025t0909",
    "_type": "_doc",
    "_id": "E12201A5-CC50-40AF-97AE-C54A2CA303F7",
    "_score": null,
    "_source": {
        "entity_id": "E12201A5-CC50-40AF-97AE-C54A2CA303F7",
        "properties": {
            "external": {
                "facebook": {
                    "id": "muji.jp"
                },
                "instagram": {
                    "id": "muji_global"
                },
                "twitter": {
                    "id": "muji_net"
                },
                "wikidata": {
                    "id": "Q708789"
                }
            },
            "akas": [
                {
                    "value": "Muji",
                    "language": "zh"
                },
                {
                    "value": "multinacional japonesa",
                    "language": "es"
                }
            ]
        },
        "data_source": {
            "data_pull_date": "202109",
            "source_id": "muji_global",
            "dataset": "brand"
        },
        "scoring_entity_data_size": 5306,
        "population_percentile": 0.9855572298745676,
        "type_synonyms": [],
        "@version": "1",
        "@timestamp": "2021-10-25T16:28:24.892Z",
        "name": "Muji",
        "types": [
            "urn:entity:brand"
        ],
        "tags": [
            {
                "tag_id": "D24DE9CF-C778-4468-8433-5A0E8AA2BA9D",
                "name": "Wikipedia articles with GND identifiers",
                "type": "urn:tag:wikipedia_category"
            },
            {
                "tag_id": "67A608CC-2DA3-4C78-B7F6-6DD419744FFC",
                "name": "Clothing brands of Japan",
                "type": "urn:tag:wikipedia_category"
            }
        ]
    }
}

My Elasticsearch query is:

{
    "size": 20,
    "_source": ["tags"],
    "sort": [
        { "@timestamp": { "order": "desc" } }
    ],
    "query": {
        "nested": {
            "path": "tags",
            "query": {
                "bool": {
                    "must": [
                        { "match_phrase": { "tags.name": "thriller" } }
                    ]
                }
            }
        }
    }
}

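My question is: how can my Elasticsearch query return unique documents? I'm searching on "tags.name" inside the "tags" field, and I want the "tags" field to return a set of unique items. For example, I'm currently getting back: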

"tags": [
    {
        "name": "Male actors",
        "tag_id": "A2A18D57-24B5-4578-B0D3-2A9190EEAD7C",
        "type": "urn:tag:wikipedia_category"
    },
    {
        "name": "some tag name",
        "tag_id": "0CB4BE42-026F-4B14-A59A-C5A331E8A56F",
        "type": "urn:tag:wikipedia_category"
    },
    {
        "name": "Male actors",
        "tag_id": "A2A18D57-24B5-4578-B0D3-2A9190EEAD7C",
        "type": "urn:tag:wikipedia_category"
    },
    {
        "name": "another tag name",
        "tag_id": "0CB4BE42-026F-4B14-A59A-C5A331E8A56F",
        "type": "urn:tag:wikipedia_category"
    }
]

I want my results not to repeat "name": "Male actors".
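One client-side workaround is to merge the tags of the returned hits and drop repeats yourself. The sketch below is a minimal illustration, not part of the original thread: it assumes the search response has already been parsed into a Python dict with the usual hits.hits[]._source.tags shape, and the helper name unique_tags and the two-hit split of the four tags above are hypothetical.

```python
from typing import Any


def unique_tags(response: dict[str, Any], key: str = "name") -> list[dict[str, Any]]:
    """Merge the tags of every hit, keeping the first tag seen for each value of `key`."""
    seen: set[Any] = set()
    result: list[dict[str, Any]] = []
    for hit in response.get("hits", {}).get("hits", []):
        for tag in hit.get("_source", {}).get("tags", []):
            value = tag.get(key)
            if value not in seen:  # skip tags whose key was already collected
                seen.add(value)
                result.append(tag)
    return result


# Hypothetical response: the four tags shown above, split across two hits.
CATEGORY = "urn:tag:wikipedia_category"
response = {
    "hits": {
        "hits": [
            {"_source": {"tags": [
                {"name": "Male actors", "tag_id": "A2A18D57-24B5-4578-B0D3-2A9190EEAD7C", "type": CATEGORY},
                {"name": "some tag name", "tag_id": "0CB4BE42-026F-4B14-A59A-C5A331E8A56F", "type": CATEGORY},
            ]}},
            {"_source": {"tags": [
                {"name": "Male actors", "tag_id": "A2A18D57-24B5-4578-B0D3-2A9190EEAD7C", "type": CATEGORY},
                {"name": "another tag name", "tag_id": "0CB4BE42-026F-4B14-A59A-C5A331E8A56F", "type": CATEGORY},
            ]}},
        ]
    }
}

print([tag["name"] for tag in unique_tags(response)])
# → ['Male actors', 'some tag name', 'another tag name']
```

This only deduplicates the page of hits actually returned (size 20), not every matching document. The choice of key also matters: in the sample above "some tag name" and "another tag name" share the same tag_id, so deduplicating on tag_id would keep only two tags.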

【Question Discussion】:

    Tags: elasticsearch


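    【Solution 1】:

    The tags your query returns come from different documents, so you can't assume they are unique. My suggestion is to use an aggregation to get the unique tags.name values: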

    {
        "size": 20,
        "_source": ["tags"],
        "sort": [
            { "@timestamp": { "order": "desc" } }
        ],
        "query": {
            "nested": {
                "path": "tags",
                "query": {
                    "bool": {
                        "must": [
                            { "match_phrase": { "tags.name": "thriller" } }
                        ]
                    }
                }
            }
        },
        "aggs": {
            "unique_tags": {
                "nested": {
                    "path": "tags"
                },
                "aggs": {
                    "tag_name": {
                        "terms": {
                            "field": "tags.name"
                        }
                    }
                }
            }
        }
    }
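The unique names then come back as buckets under aggregations.unique_tags.tag_name.buckets rather than inside the hits. A minimal sketch for reading them out, assuming the response is parsed into a Python dict; the helper name unique_tag_names is hypothetical and the counts in the sample response are made up for illustration.

```python
from typing import Any


def unique_tag_names(response: dict[str, Any]) -> list[str]:
    """Return the bucket keys of the nested terms aggregation, in bucket order."""
    buckets = (
        response.get("aggregations", {})
        .get("unique_tags", {})
        .get("tag_name", {})
        .get("buckets", [])
    )
    return [bucket["key"] for bucket in buckets]


# Hypothetical aggregation fragment; the doc_count values are illustrative only.
response = {
    "aggregations": {
        "unique_tags": {
            "doc_count": 4,
            "tag_name": {
                "doc_count_error_upper_bound": 0,
                "sum_other_doc_count": 0,
                "buckets": [
                    {"key": "Male actors", "doc_count": 2},
                    {"key": "another tag name", "doc_count": 1},
                    {"key": "some tag name", "doc_count": 1},
                ],
            },
        }
    }
}

print(unique_tag_names(response))
# → ['Male actors', 'another tag name', 'some tag name']
```

Two caveats: a terms aggregation returns 10 buckets unless you set its "size", and the nested aggregation buckets every tag of each matching document, not only the tags whose name matched "thriller". Wrapping the terms aggregation in a filter sub-aggregation with the same match_phrase would restrict it to matching tags.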
    

    【Discussion】:

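    • Given my query above, I'm trying to make the returned documents unique. How would I do this with the current query listed above?
    • You just need to add the aggs section to your query. I'll edit my answer.
    • Thanks for doing that, but when I ran the query I got an error: Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [tags.name] in order to load field data by uninverting the inverted index. Note that this can use significant memory.
    • I think this is because the field tags.name is not of type keyword. If you update your mapping to add a keyword sub-field to tags.name, i.e. tags.name.raw, and aggregate on that instead, it will work.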
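The fix from the last comment could look roughly like the sketch below, written as Python dicts so it can be sent with whichever client you use (or as the JSON body of PUT /graph_20211025t0909/_mapping and the search request). It assumes "tags" is already mapped as nested and "name" as text; "raw" is the sub-field name from the comment, and build_unique_tags_query and its agg_size default of 50 are hypothetical choices for illustration.

```python
import json

# Body for PUT /graph_20211025t0909/_mapping (index name from the question).
# Assumes tags is already "nested" and tags.name is "text"; both types are
# repeated so the update merges with the existing mapping.
MAPPING_UPDATE = {
    "properties": {
        "tags": {
            "type": "nested",
            "properties": {
                "name": {
                    "type": "text",
                    "fields": {"raw": {"type": "keyword"}},  # new keyword sub-field
                }
            },
        }
    }
}


def build_unique_tags_query(phrase: str, agg_size: int = 50) -> dict:
    """Search body: nested match_phrase on tags.name, terms agg on tags.name.raw."""
    return {
        "size": 20,
        "_source": ["tags"],
        "sort": [{"@timestamp": {"order": "desc"}}],
        "query": {
            "nested": {
                "path": "tags",
                "query": {"bool": {"must": [{"match_phrase": {"tags.name": phrase}}]}},
            }
        },
        "aggs": {
            "unique_tags": {
                "nested": {"path": "tags"},
                "aggs": {
                    # Aggregate on the keyword sub-field, not the analyzed text field.
                    "tag_name": {"terms": {"field": "tags.name.raw", "size": agg_size}}
                },
            }
        },
    }


print(json.dumps(build_unique_tags_query("thriller"), indent=2))
```

Adding the sub-field only affects documents indexed afterwards; existing documents need to be reindexed (or run through _update_by_query) before tags.name.raw has values. If name was mapped dynamically, a tags.name.keyword sub-field may already exist, so it is worth checking GET /graph_20211025t0909/_mapping first.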