[Posted]: 2021-07-17 08:46:43
[Question]:
My table has a column named "Data" that contains JSON like this:
{"tt":"452.95","records":[{"r":"IN184366","t":"812812819910","s":"129.37","d":"982.7","c":"83"},{"r":"IN183714","t":"8028028029093","s":"33.9","d":"892","c":"38"}]}
I have written code to unnest it into separate columns such as tr, r, s. The code is below:
with raw as (
    -- pull the "records" array (as text) out of each JSON document
    SELECT json_extract_path_text(B.Data, 'records', true) as items
    FROM tableB as B
    where B.date::timestamp between
        to_timestamp('2019-01-01 00:00:00','YYYY-MM-DD HH24:MI:SS') AND
        to_timestamp('2022-12-31 23:59:59','YYYY-MM-DD HH24:MI:SS')
    UNION ALL
    SELECT json_extract_path_text(C.Data, 'records', true) as items
    FROM tableC as C
    where C.date-5 between
        to_timestamp('2019-01-01 00:00:00','YYYY-MM-DD HH24:MI:SS') AND
        to_timestamp('2022-12-31 23:59:59','YYYY-MM-DD HH24:MI:SS')
),
numbers as (
    -- 0-based ordinals 0..999 used as array indexes
    SELECT ROW_NUMBER() OVER (ORDER BY TRUE)::integer - 1 as ordinal
    FROM <any_random_table> limit 1000
),
joined as (
    -- one row per (document, array index) pair
    select raw.*,
        json_array_length(raw.items, true) as number_of_items,
        json_extract_array_element_text(
            raw.items,
            numbers.ordinal::int,
            true
        ) as item
    from raw
    cross join numbers
    where numbers.ordinal <
        json_array_length(raw.items, true)
),
parsed as (
    -- extract individual fields from each array element
    SELECT J.*,
        json_extract_path_text(J.item, 'tr', true) as tr,
        json_extract_path_text(J.item, 'r', true) as r,
        json_extract_path_text(J.item, 's', true)::float8 as s
    from joined J
)
select * from parsed
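The code above works when there are only a few records. With a wide date range (for example, the past two years) or a large number of records, it takes more than a day to run, Redshift CPU utilization reaches 100%, and even disk space usage reaches 100%.
Can anyone suggest an alternative way to unnest JSON objects like the one above in Redshift?
My query plan shows:
Nested Loop Join in the query plan - review the join predicates to avoid Cartesian products
Goal: unnest without using any cross join. Input: a Data column containing JSON such as
{"tt":"452.95","records":[{"r":"IN184366","t":"812812819910","s":"129.37","d":"982.7","c":"83"},{"r":"IN183714","t":"8028028029093","s":"33.9","d":"892","c":"38"}]}
The output should be, for example, the tr, r, s columns from the JSON above.
[Comments]: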
-
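It's not clear to me what you want to do. An example input is given, although it is not easy to read (it is all on one line), and the output is not defined ("unnest it into separate columns such as tr, r, s"). I would have to decipher the code to figure out your intent, which is an expensive and time-consuming task.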
-
@MaxGanzII I have edited the question.
-
Can we unnest JSON in Redshift without using any cross join or Cartesian-product query, so as to reduce time and space complexity?
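-
One possible direction, offered only as an untested sketch: Redshift's SUPER data type can hold parsed JSON, and its PartiQL-style navigation in the FROM clause can iterate over an array without a separate numbers table. The sketch assumes the cluster supports SUPER and JSON_PARSE and that every Data value is valid JSON. It reuses tableB and Data from the question, while the names parsed, doc, p and rec are made up for illustration. It reads the keys r, t and s that actually appear in the sample JSON, and it leaves out the date filter and the UNION ALL with tableC for brevity.

```sql
-- Sketch only: assumes Redshift SUPER support and that Data always holds valid JSON.
WITH parsed AS (
    -- JSON_PARSE converts the JSON text into a SUPER value (illustrative alias: doc)
    SELECT JSON_PARSE(B.Data) AS doc
    FROM tableB AS B
)
SELECT
    rec.r::varchar         AS r,
    rec.t::varchar         AS t,
    rec.s::varchar::float8 AS s  -- values are JSON strings, so cast via varchar first
FROM parsed AS p,
     p.doc.records AS rec;       -- yields one row per element of the "records" array
```

Whether this actually runs faster than the numbers-table approach on the real data would have to be measured.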
Tags: sql json amazon-redshift query-optimization unnest