Title: Streaming without truncating
Posted: 2019-09-10 09:16:23
Question:

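I have JSON data of the following form. I want to transform it, in a streaming fashion, so that each record's key becomes a field of that record. My problem: I don't know how to do this without truncating the key and losing it. I have worked out the stream structure I need; see the bottom.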

Question: How can I convert the input data to a stream without losing the key?

Data:

{
  "foo" : {
    "a" : 1,
    "b" : 2
  },
  "bar" : {
    "a" : 1,
    "b" : 2
  }
}

The non-streaming transformation uses:

jq 'with_entries(.value += {key}) | .[]'

which yields:

{
  "a": 1,
  "b": 2,
  "key": "foo"
}
{
  "a": 1,
  "b": 2,
  "key": "bar"
}
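To make the target shape concrete, here is a minimal Python sketch of what the non-streaming filter computes. It is a model of the jq filter, not jq itself; the name `lift_keys` is hypothetical, and, like the jq version, it needs the whole document in memory.

```python
import json

def lift_keys(doc):
    """Model of jq's `with_entries(.value += {key}) | .[]` on a dict of dicts.

    Each top-level key is merged into its value under the field "key",
    and the resulting records are yielded in input order.
    """
    for key, value in doc.items():
        # `.value += {key}` merges {"key": key} into the nested object
        yield {**value, "key": key}

data = json.loads('{"foo": {"a": 1, "b": 2}, "bar": {"a": 1, "b": 2}}')
for record in lift_keys(data):
    print(json.dumps(record))
```

The streaming question is how to get the same records without ever holding `data` in full.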

Now, if my data file is very large, I would prefer to stream:

jq -ncr --stream 'fromstream(1|truncate_stream(inputs))'

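Problem: this truncates the keys "foo" and "bar". On the other hand, not truncating the stream and just calling fromstream(inputs) is pointless: fromstream then only emits once the whole top-level object has closed, which makes the --stream part a no-op, because jq assembles everything in memory anyway.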

Using . | tostream, the stream has the following structure:

[["foo","a"],1]
[["foo","b"],2]
[["foo","b"]]
[["bar","a"],1]
[["bar","b"],2]
[["bar","b"]]
[["bar"]]
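The event format above can be reproduced with a small Python model of jq's `tostream`, which may help when reasoning about where closing events appear. This is an illustrative sketch, not a library function: leaves are emitted as `[path, value]`, and each non-empty container is closed, after its children, by a one-element event holding the path of its last child. Note that no `[["foo"]]` event is produced; the only top-level closing event is `[["bar"]]`.

```python
import json

def tostream(value, path=()):
    """Model of jq's `tostream`: [path, leaf] events plus closing events.

    Children are visited before their parent (post-order). A scalar or an
    empty container yields [path, value]; a non-empty container yields a
    one-element closing event [path + last_child_key] after its children.
    """
    if isinstance(value, dict) and value:
        keys = list(value)
    elif isinstance(value, list) and value:
        keys = list(range(len(value)))
    else:
        yield [list(path), value]
        return
    for k in keys:
        yield from tostream(value[k], path + (k,))
    # closing event: path to the last child, with no value
    yield [list(path) + [keys[-1]]]

data = {"foo": {"a": 1, "b": 2}, "bar": {"a": 1, "b": 2}}
for event in tostream(data):
    print(json.dumps(event, separators=(",", ":")))
```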

With truncation, . as $dot | (1|truncate_stream($dot | tostream)), the structure is:

[["a"],1]
[["b"],2]
[["b"]]
[["a"],1]
[["b"],2]
[["b"]]
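The effect of `1|truncate_stream(...)` can be modelled in Python as well. This sketch is assumed to mirror jq's builtin definition but is not jq itself: events whose path has at most `depth` elements are dropped, and the first `depth` path elements are stripped from the rest. The top-level key is exactly what gets stripped, so nothing downstream can recover it.

```python
import json

def truncate_stream(depth, events):
    """Model of jq's `depth|truncate_stream(events)`.

    Events whose path is no longer than `depth` are dropped; the rest keep
    their value (if any) but lose the first `depth` path elements.
    """
    for event in events:
        path = event[0]
        if len(path) > depth:
            yield [path[depth:]] + event[1:]

events = [
    [["foo", "a"], 1], [["foo", "b"], 2], [["foo", "b"]],
    [["bar", "a"], 1], [["bar", "b"], 2], [["bar", "b"]],
    [["bar"]],
]
for event in truncate_stream(1, events):
    print(json.dumps(event, separators=(",", ":")))
```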

So it seems that, to build the stream I need, I have to produce the following structure (I inserted a [["foo"]] after the first record is complete):

[["foo","a"],1]
[["foo","b"],2]
[["foo","b"]]
[["foo"]]
[["bar","a"],1]
[["bar","b"],2]
[["bar","b"]]
[["bar"]]

When I write this out as a literal stream that jq can consume, I do get what I need (see also the snippet here: https://jqplay.org/s/iEkMfm_u92):

fromstream([ [ "foo", "a" ], 1 ],[ [ "foo", "b" ], 2 ],[ [ "foo", "b" ] ],[["foo"]],[ [ "bar", "a" ], 1 ],[ [ "bar", "b" ], 2 ],[ [ "bar", "b" ] ],[ [ "bar" ] ])

which yields:

{
  "foo": {
    "a": 1,
    "b": 2
  }
}
{
  "bar": {
    "a": 1,
    "b": 2
  }
}
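Why the hand-built stream works can be seen from how `fromstream` decides that a value is complete. The following Python model is an illustrative sketch modelled on jq's definition, assuming object keys only (no arrays): it emits a value when a closing event has a path of length 1, or when a leaf event has an empty path. Fed the plain `tostream` events, it emits a single merged object at `[["bar"]]`; with the extra `[["foo"]]`, it emits one object per top-level key.

```python
import json

def setpath(root, path, value):
    """Return `root` with `value` stored at `path`, creating dicts as needed.

    Simplification: only object (string) keys are handled, no array indices.
    """
    if not path:
        return value
    if root is None:
        root = {}
    root[path[0]] = setpath(root.get(path[0]), path[1:], value)
    return root


def fromstream(events):
    """Model of jq's `fromstream`: rebuild top-level values from events."""
    x, done = None, False
    for event in events:
        if done:  # previous value was emitted; start a fresh one
            x, done = None, False
        path = event[0]
        if len(event) == 2:  # [path, leaf]: complete only if path is empty
            done = len(path) == 0
            x = setpath(x, path, event[1])
        else:  # closing event: complete when a top-level child closes
            done = len(path) == 1
        if done:
            yield x


# Plain `tostream` output: the only top-level closing event is [["bar"]].
original = [
    [["foo", "a"], 1], [["foo", "b"], 2], [["foo", "b"]],
    [["bar", "a"], 1], [["bar", "b"], 2], [["bar", "b"]],
    [["bar"]],
]
# The same events with a synthetic [["foo"]] after the first record.
with_closer = original[:3] + [[["foo"]]] + original[3:]

for value in fromstream(original):
    print(json.dumps(value))
for value in fromstream(with_closer):
    print(json.dumps(value))
```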

The final result (see https://jqplay.org/s/-UgbEC4BN8) would be:

fromstream([ [ "foo", "a" ], 1 ],[ [ "foo", "b" ], 2 ],[ [ "foo", "b" ] ],[["foo"]],[ [ "bar", "a" ], 1 ],[ [ "bar", "b" ], 2 ],[ [ "bar", "b" ] ],[ [ "bar" ] ]) | with_entries(.value += {key}) | .[]

which yields:

{
  "a": 1,
  "b": 2,
  "key": "foo"
}
{
  "a": 1,
  "b": 2,
  "key": "bar"
}


    Tags: json object stream jq


    Answer 1:

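    The jq Cookbook provides a generic function, atomize(s), that converts an object presented in streaming form into a stream of single-key objects. With it, the solution to the problem here is simple: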

    atomize(inputs) | to_entries[] | .value + {key}
    

    ({key} is shorthand for {key: .key}.)

    For reference, here is the def:

    atomize

    # Convert an object (presented in streaming form as the stream s) into
    # a stream of single-key objects
    # Example:
    #   atomize(inputs) (used in conjunction with "jq -n --stream")
    def atomize(s):
      fromstream(foreach s as $in ( {previous:null, emit: null};
          if ($in | length == 2) and ($in|.[0][0]) != .previous and .previous != null
          then {emit: [[.previous]], previous: ($in|.[0][0])}
          else { previous: ($in|.[0][0]), emit: null}
          end;
          (.emit // empty), $in
          ) ) ;
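The core of `atomize` is the `foreach` that inserts the missing top-level closing events. Here is that step modelled in Python, as a sketch for illustration only; `insert_closers` is a hypothetical name, not part of the Cookbook. Before any `[path, leaf]` event whose top-level key differs from the previous event's, it emits a synthetic `[[previous_key]]`, which turns the plain `tostream` events into exactly the hand-built stream from the question.

```python
import json

def insert_closers(events):
    """Model of the `foreach` inside the jq Cookbook's `atomize`.

    Emits a synthetic closing event [[previous_key]] before a [path, leaf]
    event that starts a new top-level key, so that `fromstream` completes
    one single-key object per top-level key.
    """
    previous = None
    for event in events:
        path = event[0]
        top = path[0] if path else None  # jq's `.[0][0]` is null for []
        if len(event) == 2 and top != previous and previous is not None:
            yield [[previous]]
        previous = top
        yield event


events = [
    [["foo", "a"], 1], [["foo", "b"], 2], [["foo", "b"]],
    [["bar", "a"], 1], [["bar", "b"], 2], [["bar", "b"]],
    [["bar"]],
]
for event in insert_closers(events):
    print(json.dumps(event, separators=(",", ":")))
```

Passing this augmented stream to fromstream yields one single-key object per top-level key, and `to_entries[] | .value + {key}` then produces the final records.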
    
