【Question Title】: How to group/aggregate data in MongoDB to make events
【Posted】: 2021-04-18 10:07:01
【Question Description】:

I have mongo data in this format:

[
  {
    _id: ObjectId("5f71890730a4421699b1fbff"),
    timestamp: ISODate("2020-01-12T03:07:52Z"),
    running_fig: "circle",
  },
  {
    _id: ObjectId("5f718ac330a4421699b1fc15"),
    timestamp: ISODate("2020-01-12T03:08:48Z"),
    running_fig: "circle",
  },
  {
    _id: ObjectId("5f718ac330a4421699b1fc16"),
    timestamp: ISODate("2020-01-12T03:09:32Z"),
    running_fig: "rombous",
  },
  {
    _id: ObjectId("5f718ac330a4421699b1fc14"),
    timestamp: ISODate("2020-01-12T03:10:11Z"),
    running_fig: "triangle",
  },
  {
    _id: ObjectId("5f718ac330a4421699b1fc13"),
    timestamp: ISODate("2020-01-12T03:11:52Z"),
    running_fig: "triangle",
  },
  {
    _id: ObjectId("5f718ac330a4421699b1fc12"),
    timestamp: ISODate("2020-01-12T03:15:22Z"),
    running_fig: "circle",
  },
  {
    _id: ObjectId("5f718ac330a4421699b1fc1e"),
    timestamp: ISODate("2020-01-12T03:20:52Z"),
    running_fig: "circle",
  }
]

**Now I want to build an event chart based on the running_fig timestamps, and I want the query result in the following form:**

[
  {
    running_fig: "circle",
    from: 2020-12-21T03:07:52Z,
    to: 2020-12-21T03:09:48Z,
    duration: 2 min.
  },
  {
    running_fig: "rombous",
    from: 2020-12-21T03:09:48Z,
    to: 2020-12-21T03:10:32Z,
    duration: 1 min.
  },
  {
    running_fig: "triangle",
    from: 2020-12-21T03:10:32Z,
    to: 2020-12-21T03:15:22Z,
    duration: 5 min.
  },
  {
    running_fig: "circle",
    from: 2020-12-21T03:15:22Z,
    to: 2020-12-21T03:25:52Z (current time),
    duration: 10 min.
  }
]

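I want the result data in this format so that I can build the chart from it. In this data, the start_time of the circle running_fig is its own timestamp, and its end_time is the timestamp of the next running_fig. In my case rombous comes next, so the duration of the circle running_fig is (3:09 - 3:07) = 2 minutes, and the consecutive records are combined into a single entry, as shown in my expected result. Could anyone please help me write this query? Thanks in advance.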
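The rule described above (an event starts at the timestamp of its first reading and ends at the timestamp of the first reading with a different running_fig, while the last event runs until now) can be checked in plain JavaScript before writing any pipeline. This is a minimal sketch, assuming the readings are already loaded into an array and that both triangle readings are spelled "triangle"; buildEvents and duration_min are hypothetical names, not part of MongoDB.

```javascript
// Sample readings from the question (both triangle readings spelled "triangle").
const readings = [
  { timestamp: new Date("2020-01-12T03:07:52Z"), running_fig: "circle" },
  { timestamp: new Date("2020-01-12T03:08:48Z"), running_fig: "circle" },
  { timestamp: new Date("2020-01-12T03:09:32Z"), running_fig: "rombous" },
  { timestamp: new Date("2020-01-12T03:10:11Z"), running_fig: "triangle" },
  { timestamp: new Date("2020-01-12T03:11:52Z"), running_fig: "triangle" },
  { timestamp: new Date("2020-01-12T03:15:22Z"), running_fig: "circle" },
  { timestamp: new Date("2020-01-12T03:20:52Z"), running_fig: "circle" },
];

// Collapse consecutive readings with the same running_fig into one event.
function buildEvents(docs, now = new Date()) {
  // Work on a chronologically sorted copy so the input order does not matter.
  const sorted = [...docs].sort((a, b) => a.timestamp - b.timestamp);
  const events = [];
  for (const d of sorted) {
    const current = events[events.length - 1];
    if (current && current.running_fig === d.running_fig) continue; // same figure: event goes on
    if (current) current.to = d.timestamp; // figure changed: close the previous event
    events.push({ running_fig: d.running_fig, from: d.timestamp, to: null });
  }
  // The last event is still running, so it ends "now".
  if (events.length > 0) events[events.length - 1].to = now;
  // Duration in minutes as a plain number; formatting is left to the chart code.
  for (const e of events) e.duration_min = (e.to - e.from) / 60000;
  return events;
}

// Fixing "now" makes the result reproducible.
console.log(buildEvents(readings, new Date("2020-01-12T03:25:52Z")));
```

With "now" fixed at 2020-01-12T03:25:52Z this yields four events: circle 03:07:52 to 03:09:32, rombous 03:09:32 to 03:10:11, triangle 03:10:11 to 03:15:22, and circle 03:15:22 to 03:25:52 (10.5 min).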

【Discussion】:

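  • Why does the first circle end at 2020-12-21T03:09:48Z? I cannot see where that time comes from. The same goes for the other times.
  • @WernfriedDomscheit Thanks for looking into this; in this case the data keeps coming in. When the running_fig value changes from circle to rombous, the circle lasts from circle's start_time (3:07) to rombous's start_time (3:09), which is when rombous enters running_fig, so its duration is 3 minutes. The same applies to all of them.
  • It is not clear what you mean. Please provide input data that exactly matches the desired output.
  • @WernfriedDomscheit Thanks for your reply. My sensor logs data to MongoDB whenever the data changes. I have about 10 parameters to log, and whenever any of them changes, all of the data is written to my database; that is why I get the same 2020-12-21T03:09:48Z.
  • I still don't understand your logic. Most likely you have to use $reduce. It will certainly contain the expression {$last: "$$value.timestamp"}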

Tags: node.js mongodb mongoose mongodb-query aggregation-framework


【Solution 1】:

This is a somewhat tedious solution.

Explanation

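  1. Push the data into an array (assuming you get the data from sensor 1).
  2. Sort by timestamp.
  3. Check the previous element in the array: if the previous element's running_fig equals the current element's running_fig, remove the current element from the array.
  4. Attach the next element's data.
  5. If the next element's timestamp is null, use the current time.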
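The five steps above can be mirrored in plain JavaScript, which may help when checking what the pipeline returns. This is a sketch under assumptions: eventsBySensor is a hypothetical helper, the documents are already in memory, and the optional sensor field is the one the pipeline groups on (it does not appear in the question's sample data). Sorting is done before grouping so that each per-sensor array is chronological.

```javascript
// Client-side sketch of the five steps, run per sensor.
// Each document has `timestamp` (a Date), `running_fig`, and optionally `sensor`;
// documents without `sensor` share one group, as with $group on a missing field.
function eventsBySensor(docs, now = new Date()) {
  // Step 2, done first: chronological order, so every per-sensor array is sorted.
  const sorted = [...docs].sort((a, b) => a.timestamp - b.timestamp);

  // Step 1: one array per sensor (Map keeps first-seen key order).
  const groups = new Map();
  for (const d of sorted) {
    const key = d.sensor ?? null;
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(d);
  }

  const result = [];
  for (const [sensor, arr] of groups) {
    // Step 3: drop an element whose running_fig equals the previous element's.
    const changes = arr.filter((d, i) => i === 0 || d.running_fig !== arr[i - 1].running_fig);
    // Steps 4 and 5: "to" is the next remaining element's timestamp, or `now` for the last one.
    changes.forEach((d, i) => {
      const next = changes[i + 1];
      result.push({ sensor, running_fig: d.running_fig, from: d.timestamp, to: next ? next.timestamp : now });
    });
  }
  return result;
}

// Hypothetical readings from two sensors, interleaved in time.
const sample = [
  { sensor: "a", timestamp: new Date("2020-01-12T03:00:00Z"), running_fig: "circle" },
  { sensor: "b", timestamp: new Date("2020-01-12T03:01:00Z"), running_fig: "circle" },
  { sensor: "a", timestamp: new Date("2020-01-12T03:02:00Z"), running_fig: "circle" },
  { sensor: "a", timestamp: new Date("2020-01-12T03:03:00Z"), running_fig: "triangle" },
];
console.log(eventsBySensor(sample, new Date("2020-01-12T03:10:00Z")));
```

Sensor "a" yields circle 03:00 to 03:03 and triangle 03:03 to "now"; sensor "b" yields a single circle event that is still running.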

Code

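db.collection.aggregate([
  /** sort first: $push keeps the input order, so every "docs" array below is chronological*/
  {
    $sort: {
      timestamp: 1
    }
  },
  /** group by sensor (documents without a "sensor" field all land in one group with _id: null)*/
  {
    "$group": {
      "_id": "$sensor",
      docs: {
        $push: "$$ROOT"
      }
    }
  },
  /** attach the previous element to each element as "pre_index"*/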
  {
    $project: {
      docs: {
        /** transform the "docs" field*/
        $map: {
          /** by mapping over every index of the array*/
          input: {
            $range: [
              0,
              {
                $size: "$docs"
              }
            ]
          },
          /** an array from 0 to n - 1 where n is the number of documents*/
          as: "this",
          /** which shall be accessible using "$$this"*/
          in: {
            $mergeObjects: [
              /** we join two documents*/
              {
                $arrayElemAt: [
                  "$docs",
                  "$$this"
                ]
              },
              /** one is the nth document in our "docs" array*/
              {
                "pre_index": {
                  $cond: [
                    {
                      "$gte": [
                        {
                          "$subtract": [
                            "$$this",
                            1
                          ]
                        },
                        0
                      ]
                    },
                    {
                      "$arrayElemAt": [
                        "$docs",
                        {
                          "$subtract": [
                            "$$this",
                            1
                          ]
                        },
                        
                      ]
                    },
                    null
                  ]
                },
                index: "$$this"
              }/** and the second one holds "pre_index" (the previous element, or null for the first) and "index"*/
              
            ]
          }
        }
      }
    }
  },
  /** remove consecutive duplicates: keep an element only if its running_fig differs from the previous one*/
  {
    $project: {
      _id: "$_id",
      noDuplicateArray: {
        $filter: {
          input: "$docs",
          as: "a",
          cond: {
            $ne: [
              "$$a.running_fig",
              "$$a.pre_index.running_fig"
            ]
          }
        }
      }
    }
  },
  /** attach the next remaining element as "to"; its timestamp ends the current event*/
  {
    $project: {
      docs: {
        /** rebuild the "docs" field from "noDuplicateArray"*/
        $map: {
          /** by mapping over every index of the array*/
          input: {
            $range: [
              0,
              {
                $size: "$noDuplicateArray"
              }
            ]
          },
          /** an array from 0 to n - 1 where n is the number of documents*/
          as: "this",
          /** which shall be accessible using "$$this"*/
          in: {
            $mergeObjects: [
              /** we join two documents*/
              {
                $arrayElemAt: [
                  "$noDuplicateArray",
                  "$$this"
                ]
              },
              /** one is the nth document in "noDuplicateArray"*/
              {
                "to": {
                  "$arrayElemAt": [
                    "$noDuplicateArray",
                    {
                      $add: [
                        "$$this",
                        1
                      ]
                    },
                    
                  ]
                },
                index: "$$this"
              }/** and the second one holds "to" (the next element, missing for the last) and "index"*/
              
            ]
          }
        }
      }
    }
  },
  {
    "$unwind": "$docs"
  },
  {
    "$project": {
      _id: "$docs._id",
      sensor: "$_id",
      from: "$docs.timestamp",
      to: {
        "$ifNull": [
          "$docs.to.timestamp",
          "$$NOW"
        ]
      },
      running_fig: "$docs.running_fig",
      duration: {
        $concat: [
          {
            $toString: {
              $round: [
                {
                  $divide: [
                    {
                      $subtract: [
                        {
                          "$ifNull": [
                            "$docs.to.timestamp",
                            "$$NOW"
                          ]
                        },
                        "$docs.timestamp"
                      ]
                    },
                    60000
                  ]
                },
                1
              ]
            }
          },
          " min"
        ]
      }
    }
  }
])

Mongo playground: https://mongoplayground.net/p/CxHNdO6vLop

【Discussion】:

  • I think the $sort has to come before the $group; besides, there is no date field anyway.
  • @WernfriedDomscheit What would the performance be like with millions of documents?
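The ordering point in the first comment can be illustrated without a database: $push appends documents in the order they reach $group, and a $sort placed after the $group sorts the group documents, not the pushed arrays. The sketch below simulates this in plain JavaScript; groupPush is a hypothetical stand-in for $group with $push, not a MongoDB API, and timestamps are plain numbers for brevity.

```javascript
// Hypothetical stand-in for "$group" + "$push" (not a MongoDB API): each group's
// array receives documents in the order they reach the stage.
function groupPush(docs, keyOf) {
  const groups = new Map();
  for (const d of docs) {
    const key = keyOf(d);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(d);
  }
  return [...groups].map(([_id, pushed]) => ({ _id, docs: pushed }));
}

// Documents arriving out of timestamp order (numbers instead of dates for brevity).
const arriving = [
  { sensor: "s1", timestamp: 3, running_fig: "circle" },
  { sensor: "s1", timestamp: 1, running_fig: "circle" },
  { sensor: "s1", timestamp: 2, running_fig: "rombous" },
];

// Grouping first: the array stays in arrival order. A later $sort on the group
// documents would reorder the groups, never the elements inside "docs".
const groupedFirst = groupPush(arriving, (d) => d.sensor);
// Sorting first: the pushed array comes out chronological.
const sortedFirst = groupPush([...arriving].sort((a, b) => a.timestamp - b.timestamp), (d) => d.sensor);

console.log(groupedFirst[0].docs.map((d) => d.timestamp)); // [ 3, 1, 2 ]
console.log(sortedFirst[0].docs.map((d) => d.timestamp)); // [ 1, 2, 3 ]
```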
【Solution 2】:

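As mentioned, your sample data does not match the expected result, so it is hard to understand the logic. But this aggregation should show the direction it could take.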

Version using $reduce:

db.collection.aggregate([
   { $sort: { timestamp: -1 } },
   // Transform documents to array
   { $group: { _id: null, data: { $push: "$$ROOT" } } },
   // documents are newest first: each one ends ("to") where the later one already in $$value starts, the newest ends at $$NOW
   {
      $set: {
         data: {
            $reduce: {
               input: "$data",
               initialValue: [],
               in: {
                  $concatArrays: ["$$value",
                     [{
                        running_fig: "$$this.running_fig",
                        from: "$$this.timestamp",
                        to: { $ifNull: [{ $last: "$$value.from" }, "$$NOW"] }
                     }]
                  ]
               }
            }
         }
      }
   },
   { $unwind: "$data" },
   { $sort: { "data.from": 1 } },
   { $group: { _id: null, data: { $push: "$$ROOT.data" } } },
   // replace consecutive repeats of the same running_fig with null
   {
      $set: {
         data: {
            $reduce: {
               input: "$data",
               initialValue: [],
               in: {
                  $concatArrays: ["$$value",
                     [{
                        $cond: {
                           if: { $ne: [{ $last: "$$value.running_fig" }, "$$this.running_fig"] },
                           then: "$$this",
                           else: null
                        }
                     }]
                  ]
               }
            }
         }
      }
   },
   // remove null values from array
   { $set: { data: { $filter: { input: "$data", cond: { $ne: ["$$this", null] } } } } },
   { $unwind: "$data" },
   { $replaceRoot: { newRoot: "$data" } }
])
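To see what the two $reduce passes compute, here is a plain-JavaScript trace of the stages. It is a sketch, assuming the documents fit in memory and that $last over "$$value.running_fig" ignores the null placeholders (a field path over an array collects values only from embedded documents); reduceVersion is a hypothetical name.

```javascript
// Sample readings from the question (both triangle readings spelled "triangle").
const readings = [
  { timestamp: new Date("2020-01-12T03:07:52Z"), running_fig: "circle" },
  { timestamp: new Date("2020-01-12T03:08:48Z"), running_fig: "circle" },
  { timestamp: new Date("2020-01-12T03:09:32Z"), running_fig: "rombous" },
  { timestamp: new Date("2020-01-12T03:10:11Z"), running_fig: "triangle" },
  { timestamp: new Date("2020-01-12T03:11:52Z"), running_fig: "triangle" },
  { timestamp: new Date("2020-01-12T03:15:22Z"), running_fig: "circle" },
  { timestamp: new Date("2020-01-12T03:20:52Z"), running_fig: "circle" },
];

// Hypothetical in-memory trace of the $reduce pipeline.
function reduceVersion(docs, now = new Date()) {
  // $sort { timestamp: -1 } + $group: newest reading first.
  const newestFirst = [...docs].sort((a, b) => b.timestamp - a.timestamp);

  // First $reduce: "to" is the "from" of the last span already accumulated
  // (the next later reading), or `now` for the newest reading.
  const spans = newestFirst.reduce((acc, d) => {
    const prev = acc[acc.length - 1];
    return acc.concat([{ running_fig: d.running_fig, from: d.timestamp, to: prev ? prev.from : now }]);
  }, []);

  // $unwind + $sort { "data.from": 1 } + $group: back to chronological order.
  spans.sort((a, b) => a.from - b.from);

  // Second $reduce: null when running_fig equals the last non-null one accumulated so far.
  const marked = spans.reduce((acc, s) => {
    const kept = acc.filter((x) => x !== null);
    const last = kept[kept.length - 1];
    return acc.concat([last && last.running_fig === s.running_fig ? null : s]);
  }, []);

  // $filter: drop the null placeholders.
  return marked.filter((x) => x !== null);
}

console.log(reduceVersion(readings, new Date("2020-01-12T03:25:52Z")));
```

If the trace is faithful, it keeps circle 03:07:52 to 03:08:48, rombous 03:09:32 to 03:10:11, triangle 03:10:11 to 03:11:52 and circle 03:15:22 to 03:20:52: each kept span starts at the first reading of its run but ends at the next reading, not at the next change of running_fig.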

Version using $map and $range:

db.collection.aggregate([
   { $sort: { timestamp: 1 } },
   { $group: { _id: null, data: { $push: "$$ROOT" } } },
   {
      $set: {
         data: {
            $map: {
               input: { $range: [0, { $size: "$data" }] },
               as: "idx",
               in:
                  {
                     $cond: {
                        if: {
                           $ne: [
                              { $arrayElemAt: ["$data.running_fig", "$$idx"] },
                              { $arrayElemAt: ["$data.running_fig", { $add: ["$$idx", 1] }] }
                           ]
                        },
                        then: {
                           running_fig: { $arrayElemAt: ["$data.running_fig", "$$idx"] },
                           from: { $arrayElemAt: ["$data.timestamp", "$$idx"] },
                           to: { $ifNull: [{ $arrayElemAt: ["$data.timestamp", { $add: ["$$idx", 1] }] }, "$$NOW"] }
                        },
                        else: null
                     }
                  }
            }
         }
      }
   },
   { $set: { data: { $filter: { input: "$data", cond: { $ne: ["$$this", null] } } } } },
   { $unwind: "$data" },
   { $replaceRoot: { newRoot: "$data" } }
]);
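The $map/$range pipeline can also be traced in plain JavaScript. This is a sketch, assuming the documents fit in memory; mapVersion is a hypothetical name, and an index past the end yields undefined here, standing in for the missing value $arrayElemAt returns for an out-of-range index. Because each element is compared with the one after it, the version keeps the last reading of each run.

```javascript
// Sample readings from the question (both triangle readings spelled "triangle").
const readings = [
  { timestamp: new Date("2020-01-12T03:07:52Z"), running_fig: "circle" },
  { timestamp: new Date("2020-01-12T03:08:48Z"), running_fig: "circle" },
  { timestamp: new Date("2020-01-12T03:09:32Z"), running_fig: "rombous" },
  { timestamp: new Date("2020-01-12T03:10:11Z"), running_fig: "triangle" },
  { timestamp: new Date("2020-01-12T03:11:52Z"), running_fig: "triangle" },
  { timestamp: new Date("2020-01-12T03:15:22Z"), running_fig: "circle" },
  { timestamp: new Date("2020-01-12T03:20:52Z"), running_fig: "circle" },
];

// Hypothetical in-memory trace of the $map/$range pipeline.
function mapVersion(docs, now = new Date()) {
  // $sort { timestamp: 1 } + $group: chronological array.
  const data = [...docs].sort((a, b) => a.timestamp - b.timestamp);
  // $map over $range [0, size): compare each element with the following one.
  const mapped = data.map((d, idx) => {
    const next = data[idx + 1]; // undefined past the end
    if (next && next.running_fig === d.running_fig) return null; // same figure follows: drop
    return { running_fig: d.running_fig, from: d.timestamp, to: next ? next.timestamp : now };
  });
  // $filter: drop the nulls.
  return mapped.filter((x) => x !== null);
}

console.log(mapVersion(readings, new Date("2020-01-12T03:25:52Z")));
```

With "now" fixed at 2020-01-12T03:25:52Z it keeps circle 03:08:48 to 03:09:32, rombous 03:09:32 to 03:10:11, triangle 03:11:52 to 03:15:22 and circle 03:20:52 to 03:25:52: the "to" values mark the changes, while each "from" is the last reading of its run rather than the first.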

【Discussion】:

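  • Thank you very, very much, this is what I was hoping for. One more question: if I have billions of documents, which approach is more efficient? @WernfriedDomscheit
  • Using $map is clearly faster, see jira.mongodb.org/browse/SERVER-53503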
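A back-of-the-envelope way to see why, under the assumption that every $concatArrays step inside $reduce builds a new array containing the whole accumulator (expression results are new values, not in-place appends): for n documents the $reduce pass copies 1 + 2 + ... + n array elements, whereas $map produces each output element once. This illustrates the cost model only and is not taken from the linked ticket.

```latex
C_{\mathrm{reduce}}(n) = \sum_{i=1}^{n} i = \frac{n(n+1)}{2} = O(n^2),
\qquad
C_{\mathrm{map}}(n) = n = O(n)

% e.g. n = 10^6:
C_{\mathrm{reduce}}(10^6) = \frac{10^6\,(10^6 + 1)}{2} = 500\,000\,500\,000 \approx 5 \times 10^{11},
\qquad
C_{\mathrm{map}}(10^6) = 10^6
```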