【Question title】: jq - group json objects by field value and output grouped values in one line
【Posted】: 2021-09-18 14:37:44
【Question】:

I have JSON output from AWS CloudWatch that contains metrics with their timestamps and values.

{
    "Messages": [],
    "MetricDataResults": [
        {
            "Timestamps": [
                "2021-07-07T13:26:00Z"
            ],
            "StatusCode": "Complete",
            "Values": [
                0.0
            ],
            "Id": "m19",
            "Label": "CPUSurplusCreditsCharged"
        },
        {
            "Timestamps": [
                "2021-07-07T13:28:00Z",
                "2021-07-07T13:27:00Z",
                "2021-07-07T13:26:00Z",
                "2021-07-07T13:25:00Z",
                "2021-07-07T13:24:00Z",
                "2021-07-07T13:23:00Z"
            ],
            "StatusCode": "Complete",
            "Values": [
                12.750425014167137,
                13.033116114731422,
                12.70812153130781,
                12.975,
                15.441924032067199,
                12.916451392476791
            ],
            "Id": "m20",
            "Label": "CPUUtilization"
        },
        {
            "Timestamps": [
                "2021-07-07T13:29:00Z",
                "2021-07-07T13:28:00Z",
                "2021-07-07T13:27:00Z",
                "2021-07-07T13:26:00Z",
                "2021-07-07T13:25:00Z",
                "2021-07-07T13:24:00Z",
                "2021-07-07T13:23:00Z"
            ],
            "StatusCode": "Complete",
            "Values": [
                0.7,
                0.6999533364442371,
                0.6998833527745376,
                0.6999416715273727,
                0.7,
                0.7001166861143524,
                0.6998950157476379
            ],
            "Id": "m21",
            "Label": "NetworkReceiveThroughput"
        }
    ]
}

I use the jq command to put these values into shell array variables,
and then print the contents of those arrays, as shown below.

jq -r '.MetricDataResults[] | "\(.Label) \(.Timestamps) \(.Values)"' test.json | while read Label timestamp value
do

  Label=`echo $Label | sed 's/\"//g; s/\[//g; s/\]//g; s/,/ /g'`
  timestamp=`echo $timestamp | sed 's/\"//g; s/\[//g; s/\]//g; s/,/ /g'`
  value=`echo $value | sed 's/\"//g; s/\[//g; s/\]//g; s/,/ /g'`

  arr_timestamp=($timestamp)
  arr_value=($value)

  echo $Label
  echo ${arr_timestamp[@]}
  echo ${arr_value[@]}
done



Evictions
2021-07-07T10:51:00Z 2021-07-07T10:50:00Z 2021-07-07T10:49:00Z 2021-07-07T10:48:00Z 2021-07-07T10:47:00Z 2021-07-07T10:46:00Z 2021-07-07T10:45:00Z
0 0 0 0 0 0 0

CPUUtilization
2021-07-07T10:50:00Z 2021-07-07T10:49:00Z 2021-07-07T10:48:00Z 2021-07-07T10:47:00Z 2021-07-07T10:46:00Z 2021-07-07T10:45:00Z
1.5333333333333332 1.4666666666666666 1.5833333333333333 1.5333333333333332 1.4916666666666665 1.4916666666666665

IsMaster
2021-07-07T10:51:00Z 2021-07-07T10:50:00Z 2021-07-07T10:49:00Z 2021-07-07T10:48:00Z 2021-07-07T10:47:00Z 2021-07-07T10:46:00Z 2021-07-07T10:45:00Z
1 1 1 1 1 1 1

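The timestamp arrays have a different length for each metric.
I just want to display the values that share the same timestamp as a single string.

For example: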

"2021-07-07T10:51:00Z Evictions = 0\nIsMaster = 1"
"2021-07-07T10:50:00Z Evictions = 0\nCPUUtilization = 1.5333333333333332\n IsMaster = 1"
...

I can't think of a good way to do this.
If you know one, please tell me.
I don't have much time, so I'd appreciate any help here on Stack Overflow.

  • Addendum
    What I mean is grouping by timestamp, like this:
{
    "MetricDataResults": [
        {
            "Timestamps": "2021-07-07T13:28:00Z",
            "Label" : [
               "CPUUtilization",
               "NetworkReceiveThroughput"
            ],
            "Values" : [
               12.750425014167137,
               0.7
            ]
         },
         {
            "Timestamps": "2021-07-07T13:27:00Z",
            "Label" : [
               "CPUUtilization",
               "NetworkReceiveThroughput"
            ],
            "Values" : [
               13.033116114731422,
               0.6999533364442371
            ]
         },
         {
            "Timestamps": "2021-07-07T13:26:00Z",
            "Label" : [
               "CPUUtilization",
               "NetworkReceiveThroughput",
               "CPUSurplusCreditsCharged"
            ],
            "Values" : [
               12.70812153130781,
               0.6998833527745376,
               0.0
            ]
        }
    ]
}

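【Comments on the question】:

  • Why not use a general-purpose programming language?
  • I only use shell scripts.
  • Can you recommend another programming language that would be useful?
  • Personally I prefer Python or Perl (Perl is said to be a very powerful text-extraction tool).
  • The task can easily be done with jq alone, without shell variables, shell arrays or sed, but your exact requirements aren't clear to me. It would help if you showed the complete expected output.

Tags: json linux group-by jq amazon-cloudwatch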


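【Solution 1】:

You can achieve this with jq alone; no further processing in the shell script is needed. The following shell script gives you two options:

  • output as plain text
  • output as JSON
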
#!/bin/bash

INPUT='
{
    "Messages": [],
    "MetricDataResults": [
        {
            "Timestamps": [
                "2021-07-07T13:26:00Z"
            ],
            "StatusCode": "Complete",
            "Values": [
                0.0
            ],
            "Id": "m19",
            "Label": "CPUSurplusCreditsCharged"
        },
        {
            "Timestamps": [
                "2021-07-07T13:28:00Z",
                "2021-07-07T13:27:00Z",
                "2021-07-07T13:26:00Z",
                "2021-07-07T13:25:00Z",
                "2021-07-07T13:24:00Z",
                "2021-07-07T13:23:00Z"
            ],
            "StatusCode": "Complete",
            "Values": [
                12.750425014167137,
                13.033116114731422,
                12.70812153130781,
                12.975,
                15.441924032067199,
                12.916451392476791
            ],
            "Id": "m20",
            "Label": "CPUUtilization"
        },
        {
            "Timestamps": [
                "2021-07-07T13:29:00Z",
                "2021-07-07T13:28:00Z",
                "2021-07-07T13:27:00Z",
                "2021-07-07T13:26:00Z",
                "2021-07-07T13:25:00Z",
                "2021-07-07T13:24:00Z",
                "2021-07-07T13:23:00Z"
            ],
            "StatusCode": "Complete",
            "Values": [
                0.7,
                0.6999533364442371,
                0.6998833527745376,
                0.6999416715273727,
                0.7,
                0.7001166861143524,
                0.6998950157476379
            ],
            "Id": "m21",
            "Label": "NetworkReceiveThroughput"
        }
    ]
}
'

# output as plain text
jq -r '
  .MetricDataResults
  | map(.Values as $values | .Timestamps as $timestamps
        | {Label} +
          foreach range(.Timestamps | length) as $idx
                  (null; {"Timestamp": $timestamps[$idx], "Value": $values[$idx]}; .))
  | group_by(.Timestamp)[]
  | [.[0].Timestamp]
    + map("\(.Label)=\(.Value)")
    | join("\n") + "\n"
' <<< "$INPUT"

# output as json
jq -r '
  .MetricDataResults
  |= (map(.Values as $values | .Timestamps as $timestamps
          | {Id, Label, StatusCode} +
            foreach range(.Timestamps | length) as $idx
                    (null; {"Timestamp": $timestamps[$idx], "Value": $values[$idx]}; .))
     | group_by(.Timestamp)
     | map({Timestamp: .[0].Timestamp,
            Events: del(.[].Timestamp)}))
' <<< "$INPUT"

The first jq command in the shell script produces:

2021-07-07T13:23:00Z
CPUUtilization=12.916451392476791
NetworkReceiveThroughput=0.6998950157476379

2021-07-07T13:24:00Z
CPUUtilization=15.441924032067199
NetworkReceiveThroughput=0.7001166861143524

2021-07-07T13:25:00Z
CPUUtilization=12.975
NetworkReceiveThroughput=0.7

2021-07-07T13:26:00Z
CPUSurplusCreditsCharged=0
CPUUtilization=12.70812153130781
NetworkReceiveThroughput=0.6999416715273727

2021-07-07T13:27:00Z
CPUUtilization=13.033116114731422
NetworkReceiveThroughput=0.6998833527745376

2021-07-07T13:28:00Z
CPUUtilization=12.750425014167137
NetworkReceiveThroughput=0.6999533364442371

2021-07-07T13:29:00Z
NetworkReceiveThroughput=0.7

The second jq command in the shell script produces:

{
  "Messages": [],
  "MetricDataResults": [
    {
      "Timestamp": "2021-07-07T13:23:00Z",
      "Events": [
        {
          "Id": "m20",
          "Label": "CPUUtilization",
          "StatusCode": "Complete",
          "Value": 12.916451392476791
        },
        {
          "Id": "m21",
          "Label": "NetworkReceiveThroughput",
          "StatusCode": "Complete",
          "Value": 0.6998950157476379
        }
      ]
    },
    {
      "Timestamp": "2021-07-07T13:24:00Z",
      "Events": [
        {
          "Id": "m20",
          "Label": "CPUUtilization",
          "StatusCode": "Complete",
          "Value": 15.441924032067199
        },
        {
          "Id": "m21",
          "Label": "NetworkReceiveThroughput",
          "StatusCode": "Complete",
          "Value": 0.7001166861143524
        }
      ]
    },
    {
      "Timestamp": "2021-07-07T13:25:00Z",
      "Events": [
        {
          "Id": "m20",
          "Label": "CPUUtilization",
          "StatusCode": "Complete",
          "Value": 12.975
        },
        {
          "Id": "m21",
          "Label": "NetworkReceiveThroughput",
          "StatusCode": "Complete",
          "Value": 0.7
        }
      ]
    },
    {
      "Timestamp": "2021-07-07T13:26:00Z",
      "Events": [
        {
          "Id": "m19",
          "Label": "CPUSurplusCreditsCharged",
          "StatusCode": "Complete",
          "Value": 0
        },
        {
          "Id": "m20",
          "Label": "CPUUtilization",
          "StatusCode": "Complete",
          "Value": 12.70812153130781
        },
        {
          "Id": "m21",
          "Label": "NetworkReceiveThroughput",
          "StatusCode": "Complete",
          "Value": 0.6999416715273727
        }
      ]
    },
    {
      "Timestamp": "2021-07-07T13:27:00Z",
      "Events": [
        {
          "Id": "m20",
          "Label": "CPUUtilization",
          "StatusCode": "Complete",
          "Value": 13.033116114731422
        },
        {
          "Id": "m21",
          "Label": "NetworkReceiveThroughput",
          "StatusCode": "Complete",
          "Value": 0.6998833527745376
        }
      ]
    },
    {
      "Timestamp": "2021-07-07T13:28:00Z",
      "Events": [
        {
          "Id": "m20",
          "Label": "CPUUtilization",
          "StatusCode": "Complete",
          "Value": 12.750425014167137
        },
        {
          "Id": "m21",
          "Label": "NetworkReceiveThroughput",
          "StatusCode": "Complete",
          "Value": 0.6999533364442371
        }
      ]
    },
    {
      "Timestamp": "2021-07-07T13:29:00Z",
      "Events": [
        {
          "Id": "m21",
          "Label": "NetworkReceiveThroughput",
          "StatusCode": "Complete",
          "Value": 0.7
        }
      ]
    }
  ]
}

【Comments】:

  • Thank you very much.
    I ran your first jq command and got an error. Did you run it from a shell script file?
    The error message was: error: syntax error, unexpected IDENT
  • I ran this in a shell file: jq -r ' .MetricDataResults |= (map(.Values as $values | .Timestamps as $timestamps | {Id, Label, StatusCode} + foreach range(.Timestamps | length) as $idx (null; {"Timestamp": $timestamps[$idx], "Value": $values[$idx]}; .)) | group_by(.Timestamp) | map({Timestamp: .[0].Timestamp, Events:. | del(.[] | .Timestamp)})) ' sh-test-001.json
  • And also this: jq -r '.MetricDataResults | map(.Values as '$values' | .Timestamps as '$timestamps' | {Id, Label, StatusCode} + foreach range(.Timestamps | length) as '$idx' (null; {"Timestamp": '$timestamps[$idx]', "Value": '$values[$idx]'}; .)) | group_by(.Timestamp)[] | [.[0].Timestamp] + map("\(.Label)=\(.Value)") | join("\n") + "\n"' sh-test-001.json
  • Oh, sorry. My jq version was 1.4; I upgraded to 1.6 and it works. Thank you very much.
  • I have posted the complete shell script in the answer.
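
The error reported in the comments fits the version difference: foreach first shipped in jq 1.5, so a filter that uses it fails to parse on jq 1.4. Below is a small sketch of a guard a script could run before calling such a filter. The function name jq_has_foreach is hypothetical, and the sketch assumes that jq --version prints a string of the form jq-MAJOR.MINOR, as the official releases do.

```shell
#!/bin/bash

# Succeeds when a version string such as "jq-1.6" is 1.5 or newer,
# the first release that provides foreach.
jq_has_foreach() {
  # Extract the major and minor numbers; fail on anything unrecognised.
  [[ "$1" =~ ^jq-([0-9]+)\.([0-9]+) ]] || return 1
  local major="${BASH_REMATCH[1]}" minor="${BASH_REMATCH[2]}"
  (( major > 1 || (major == 1 && minor >= 5) ))
}

if jq_has_foreach "$(jq --version 2>/dev/null)"; then
  echo "foreach is available"
else
  echo "jq is missing or older than 1.5: upgrade, or use a foreach-free filter"
fi
```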
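【Solution 2】:

For the text-output case, here is a simple, idiomatic solution; it works with any version of jq from 1.3 onward. Note in particular that it does not rely on foreach, which is needlessly complicated here: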

< input.json jq -r '
  .MetricDataResults
  | map(.Values as $values
        | .Timestamps as $timestamps
        | {Label} +
           (range(0; .Timestamps|length) as $idx
            | {Timestamp: $timestamps[$idx], 
               Value:     $values[$idx]} ))
  | group_by(.Timestamp)[]
  | .[0].Timestamp, (.[]|"\(.Label)=\(.Value)"), ""
'


【Comments】:

  • I had thought about a foreach-free solution but couldn't come up with one. Thanks for showing how to do it!
  • @jpseng - Thanks for making sense of the question :-)