【Question title】:Aggregate to get totals of key-value pairs in an array, grouping on the field values for similar names
【Posted】:2015-09-15 02:55:17
【Question】:

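I have documents with the structure below, where each array element holds "n" and "v" as the key and value for a different kind of data. I need to group on the value stored under the "ipaddress" key and count the distinct combinations in the collection. The catch is that the key names are similar but not identical (e.g. ip, ip_addr and ipaddr).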

> db.final.find().pretty()
{
        "_id" : 2,
        "props" : [
                {
                        "n" : "loc",
                        "v" : "NW"
                },
                {
                        "n" : "logtype",
                        "v" : "1"
                },
                {
                        "n" : "ip",
                        "v" : "1.1.1.1"
                },
                {
                        "n" : "pro",
                        "v" : "tcp"
                },
                {
                        "n" : "port",
                        "v" : "13438"
                }
        ]
}
{
        "_id" : 5,
        "props" : [
                {
                        "n" : "loc",
                        "v" : "NW"
                },
                {
                        "n" : "logtype",
                        "v" : "1"
                },
                {
                        "n" : "ip",
                        "v" : "1.1.1.1"
                },
                {
                        "n" : "pro",
                        "v" : "tcp"
                },
                {
                        "n" : "port",
                        "v" : "53"
                }
        ]
}
{
        "_id" : 1,
        "props" : [
                {
                        "n" : "loc",
                        "v" : "NW"
                },
                {
                        "n" : "logtype",
                        "v" : "2"
                },
                {
                        "n" : "ip_addr",
                        "v" : "2.2.2.2"
                },
                {
                        "n" : "pro",
                        "v" : "udp"
                },
                {
                        "n" : "port",
                        "v" : "53"
                }
        ]
}
{
        "_id" : 3,
        "props" : [
                {
                        "n" : "loc",
                        "v" : "NW"
                },
                {
                        "n" : "logtype",
                        "v" : "3"
                },
                {
                        "n" : "ipaddr",
                        "v" : "1.1.1.1"
                },
                {
                        "n" : "pro",
                        "v" : "tcp"
                },
                {
                        "n" : "port",
                        "v" : "53"
                }
        ]
}
{
        "_id" : 4,
        "props" : [
                {
                        "n" : "loc",
                        "v" : "LA"
                },
                {
                        "n" : "logtype",
                        "v" : "3"
                },
                {
                        "n" : "ipaddr",
                        "v" : "1.1.1.1"
                },
                {
                        "n" : "pro",
                        "v" : "udp"
                },
                {
                        "n" : "port",
                        "v" : "53"
                }
        ]
}
{
        "_id" : 6,
        "props" : [
                {
                        "n" : "loc",
                        "v" : "LA"
                },
                {
                        "n" : "logtype",
                        "v" : "1"
                },
                {
                        "n" : "ip",
                        "v" : "1.1.1.1"
                },
                {
                        "n" : "pro",
                        "v" : "udp"
                },
                {
                        "n" : "port",
                        "v" : "53"
                }
        ]
}

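The selection criteria for the query are:

  1. If "loc" is "NW" and "logtype" is "1", then "ipaddress" = "ip"
  2. If "loc" is "NW" and "logtype" is "2", then "ipaddress" = "ip_addr"
  3. If "loc" is "NW" and "logtype" is "3", then "ipaddress" = "ipaddr"
  4. The port is "53"
  5. pro is "udp" or "tcp"
  6. Group by "ipaddress"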

I want a result like this:

{"ipaddress" : "2.2.2.2" , count : 1}
{"ipaddress" : "1.1.1.1" , count : 2}
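
As a cross-check of these counts, the six criteria can be written out as a plain JavaScript function over in-memory documents (no MongoDB involved). This is only a sketch: the names `IP_FIELD_BY_LOGTYPE`, `propsToObject` and `countByIpAddress` are hypothetical, and it assumes each "n" appears at most once per document. On the six sample documents it yields 1.1.1.1 with count 2 and 2.2.2.2 with count 1.

```javascript
// Hypothetical lookup: which "n" name holds the IP for each logtype when loc is "NW".
const IP_FIELD_BY_LOGTYPE = { "1": "ip", "2": "ip_addr", "3": "ipaddr" };

// Turn [{n, v}, ...] into a plain object {n: v, ...}.
function propsToObject(props) {
  const out = {};
  for (const { n, v } of props) out[n] = v;
  return out;
}

// Apply criteria 1-6 and return [{ipaddress, count}, ...].
function countByIpAddress(docs) {
  const counts = new Map();
  for (const doc of docs) {
    const p = propsToObject(doc.props);
    if (p.loc !== "NW") continue;                      // criteria 1-3 all require loc "NW"
    if (p.port !== "53") continue;                     // criterion 4
    if (p.pro !== "udp" && p.pro !== "tcp") continue;  // criterion 5
    const field = IP_FIELD_BY_LOGTYPE[p.logtype];      // criteria 1-3: pick the key name
    const ip = field === undefined ? undefined : p[field];
    if (ip === undefined) continue;                    // unknown logtype or key missing
    counts.set(ip, (counts.get(ip) || 0) + 1);         // criterion 6: group and count
  }
  return [...counts].map(([ipaddress, count]) => ({ ipaddress, count }));
}
```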

This is what I have so far:

db.final.aggregate([
    { "$match": {
        "$and": [
            {"props" : {"$elemMatch": { "n": "port", "v": "53" }}},
            {"props" : {"$elemMatch": { "n": "pro", "v": {"$in" : [/udp/, /tcp/]} }}}
        ]
    }},
    { "$unwind": "$props" },
        {
        "$project": {
            "_ipaddress": {
                "$cond": {
                    "if": { "$eq": [ "$props.n", "ip" ] },
                    "then": "$props.v",
                    "else": {
                        "$cond": {
                            "if": { "$eq": [ "$props.n", "ip_addr" ] },
                            "then": "$props.v",
                            "else": {
                                "$cond" : {
                                    "if": { "$eq": [ "$props.n", "ipaddr" ] },
                                    "then": "$props.v",
                                    "else" : 0
                                }
                            }
                        }
                    }
                }
            },
            "_id": 1,
            "props" : 1
        }
    },
    { "$group": {
        "_id": "$_id",
        "_ipaddress": {
            "$min": {
                "$cond": [ { "$ne": [ "$_ipaddress", 0 ] }, "$_ipaddress", false ]
            }
        },
        "pro": {
            "$min": {
                "$cond": [ { "$eq": [ "$props.n", "pro" ] }, "$props.v", false ]
            }
        },
        "logtype": {
            "$min": {
                "$cond": [ { "$eq": [ "$props.n", "logtype" ] }, "$props.v", false ]
            }
        },
        "port": {
            "$min": {
                "$cond": [ { "$eq": [ "$props.n", "port" ] }, "$props.v", false ]
            }
        }
    } },
        { "$group": {
        "_id": {
            "_ipaddress": "$_ipaddress",
        },
        "count": { "$sum": 1 }
    }}
])

But I can't figure out how to combine the "loc" and "logtype" conditions.

【Question comments】:

    Tags: mongodb mongodb-query aggregation-framework


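    【Solution 1】:

    Here every document has a single array that stores all the related information together. This design is not well suited to the query you are attempting. As far as I understand, $unwind won't help much here, because it splits the array elements into separate documents.

    The solution I found is to turn your array elements into key:value pairs and build on that.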

    > db.final.aggregate([
     ... {"$project" : {"pro" : { "$setDifference" :[{"$map" : {"input" : "$props","as" : "array_elem", "in" : { "$cond" : [ { "$eq" : [ "$$array_elem.n", "pro" ] }, "$$array_elem.v", null]}}}, [null] ] },props:1}},
     ... {"$project" : {"loc" : { "$setDifference" :[{"$map" : {"input" : "$props","as" : "array_elem", "in" : { "$cond" : [ { "$eq" : [ "$$array_elem.n", "loc" ] }, "$$array_elem.v", null]}}}, [null] ] },props:1,pro:1}},
     ... {"$project" : {"port" : { "$setDifference" :[{"$map" : {"input" : "$props","as" : "array_elem", "in" : { "$cond" : [ { "$eq" : [ "$$array_elem.n", "port" ] }, "$$array_elem.v", null]}}}, [null] ] },props:1,pro:1,loc:1}},
     ... {"$project" : {"ip" : { "$setDifference" :[{"$map" : {"input" : "$props","as" : "array_elem", "in" : { "$cond" : [ { "$eq" : [ "$$array_elem.n", "ip" ] }, "$$array_elem.v", null]}}}, [null] ] },props:1,pro:1,loc:1,port:1}},
     ... {"$project" : {"ip_addr" : { "$setDifference" :[{"$map" : {"input" : "$props","as" : "array_elem", "in" : { "$cond" : [ { "$eq" : [ "$$array_elem.n", "ip_addr" ] }, "$$array_elem.v", null]}}}, [null] ] },props:1,pro:1,loc:1,port:1,ip:1}},
     ... {"$project" : {"ipaddr" : { "$setDifference" :[{"$map" : {"input" : "$props","as" : "array_elem", "in" : { "$cond" : [ { "$eq" : [ "$$array_elem.n", "ipaddr" ] }, "$$array_elem.v", null]}}}, [null] ] },pro:1,loc:1,port:1,ip:1,ip_addr:1}}
     ... ])
    
    // Result (using the data given in your question):
    { "_id" : 2, "pro" : [ "tcp" ], "loc" : [ "NW" ], "port" : [ "13438" ], "ip" : [ "1.1.1.1" ], "ip_addr" : [ ], "ipaddr" : [ ] }
    { "_id" : 5, "pro" : [ "tcp" ], "loc" : [ "NW" ], "port" : [ "53" ], "ip" : [ "1.1.1.1" ], "ip_addr" : [ ], "ipaddr" : [ ] }
    { "_id" : 1, "pro" : [ "udp" ], "loc" : [ "NW" ], "port" : [ "53" ], "ip" : [ ], "ip_addr" : [ "2.2.2.2" ], "ipaddr" : [ ] }
    { "_id" : 3, "pro" : [ "tcp" ], "loc" : [ "NW" ], "port" : [ "53" ], "ip" : [ ], "ip_addr" : [ ], "ipaddr" : [ "1.1.1.1" ] }
    { "_id" : 4, "pro" : [ "udp" ], "loc" : [ "LA" ], "port" : [ "53" ], "ip" : [ ], "ip_addr" : [ ], "ipaddr" : [ "1.1.1.1" ] }
    { "_id" : 6, "pro" : [ "udp" ], "loc" : [ "LA" ], "port" : [ "53" ], "ip" : [ "1.1.1.1" ], "ip_addr" : [ ], "ipaddr" : [ ] }
    

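    Here each element of the array has become a key with its value. You can apply your conditions to these documents and get the desired result. I have not given the complete answer here, since you have already done part of it.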
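
One way the remaining steps could look, as a sketch rather than a tested answer: reuse the same $map/$setDifference extraction for "logtype" and the three IP keys, pick the array that belongs to the document's logtype, then group. The helper name `valuesOf` and the constant `pipeline` are made up for this example, the filters use exact strings instead of the regexes in the question, and it has not been run against a live server.

```javascript
// Hypothetical helper: the $map/$setDifference expression used above, returning
// an array with the "v" of every props element whose "n" equals `name`.
function valuesOf(name) {
  return {
    $setDifference: [
      { $map: {
          input: "$props",
          as: "e",
          in: { $cond: [ { $eq: [ "$$e.n", name ] }, "$$e.v", null ] }
      } },
      [ null ]
    ]
  };
}

const pipeline = [
  // loc "NW" (needed by criteria 1-3) plus criteria 4 and 5, checked on the array itself.
  { $match: { $and: [
      { props: { $elemMatch: { n: "loc", v: "NW" } } },
      { props: { $elemMatch: { n: "port", v: "53" } } },
      { props: { $elemMatch: { n: "pro", v: { $in: [ "udp", "tcp" ] } } } }
  ] } },
  // Extract logtype and the three candidate IP keys as (possibly empty) arrays.
  { $project: {
      logtype: valuesOf("logtype"),
      ip: valuesOf("ip"),
      ip_addr: valuesOf("ip_addr"),
      ipaddr: valuesOf("ipaddr")
  } },
  // Turn the one-element logtype array into a scalar (documents without logtype drop out).
  { $unwind: "$logtype" },
  // Criteria 1-3: keep only the array that matches this document's logtype.
  { $project: {
      ipaddress: {
        $cond: [ { $eq: [ "$logtype", "1" ] }, "$ip",
          { $cond: [ { $eq: [ "$logtype", "2" ] }, "$ip_addr",
            { $cond: [ { $eq: [ "$logtype", "3" ] }, "$ipaddr", { $literal: [] } ] } ] } ]
      }
  } },
  // Empty arrays disappear here, so documents lacking the expected key are skipped.
  { $unwind: "$ipaddress" },
  // Criterion 6: count documents per IP address.
  { $group: { _id: "$ipaddress", count: { $sum: 1 } } },
  { $project: { _id: 0, ipaddress: "$_id", count: 1 } }
];
```

In the mongo shell this would be run as db.final.aggregate(pipeline).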

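    Notes:

    1. MongoDB does not allow $elemMatch as a projection in aggregation (https://jira.mongodb.org/browse/SERVER-14876); the workaround used here makes the aggregation pipeline longer.
    2. If the position of each element in the array is reliable, the array elements can simply be converted to key:value pairs by position.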
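
For note 2, a positional version might look like the sketch below. It assumes MongoDB 3.2 or later (for $arrayElemAt, which is newer than this answer) and that every document stores loc, logtype, the IP entry, pro and port at indexes 0 to 4, as in the sample data. The name `positionalPipeline` and the intermediate field `ipName` are made up for this example, and it has not been run against a live server.

```javascript
const positionalPipeline = [
  // "$props.v" / "$props.n" resolve to arrays of all v / n values, indexed by position.
  { $project: {
      loc:       { $arrayElemAt: [ "$props.v", 0 ] },
      logtype:   { $arrayElemAt: [ "$props.v", 1 ] },
      ipName:    { $arrayElemAt: [ "$props.n", 2 ] },  // "ip", "ip_addr" or "ipaddr"
      ipaddress: { $arrayElemAt: [ "$props.v", 2 ] },
      pro:       { $arrayElemAt: [ "$props.v", 3 ] },
      port:      { $arrayElemAt: [ "$props.v", 4 ] }
  } },
  // Criteria 1-5 expressed on the flattened fields.
  { $match: {
      loc: "NW",
      port: "53",
      pro: { $in: [ "udp", "tcp" ] },
      $or: [
        { logtype: "1", ipName: "ip" },
        { logtype: "2", ipName: "ip_addr" },
        { logtype: "3", ipName: "ipaddr" }
      ]
  } },
  // Criterion 6: count documents per IP address.
  { $group: { _id: "$ipaddress", count: { $sum: 1 } } }
];
```

In the mongo shell this would be run as db.final.aggregate(positionalPipeline). If the element order ever varies between documents, this version silently reads the wrong fields, so the name-based extraction is the safer default.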

    【Comments】:
