【Posted】: 2020-01-15 22:12:34
【Question】:
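I have about 50 tables under one BigQuery project, created by ETL tools from different platforms. The tables have no primary keys, and few of them have more than 100 columns. Each platform has different fields and different data types, and the team's requirement is to create one big master table that combines all of them.
I did this manually: I listed every column of every table in Excel, filled in NULL for the fields a table does not have, built one query per table, and combined the queries with UNION ALL to create the master table. I then scheduled it in BigQuery to refresh daily.
For example: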
Tbl A    Tbl B    Tbl C
char1    char1    char1
num1     num3     num1
char2    char2    char2
         char5    num2
         num2     num3
                  char3
                  char4
Manual query:
with mast_tbl as (
-- TblA: has char1, num1, char2; every other column is padded with NULL
select concat(cast(row_number() over (partition by date) as string), ' | ', 'TblA') as pk_client,
char1,
num1,
char2,
null as num3,
cast(null as string) as char5,
null as num2,
cast(null as string) as char3,
cast(null as string) as char4
from `bigquery-project-XXXX.export_TblA.all_data_view`
union all
-- TblB: has char1, char2, num3, char5, num2
select concat(cast(row_number() over (partition by date) as string), ' | ', 'TblB') as pk_client,
char1,
null as num1,
char2,
num3,
char5,
num2,
cast(null as string) as char3,
cast(null as string) as char4
from `bigquery-project-XXXX.export_TblB.all_data_view`
union all
-- TblC: has every column except char5
select concat(cast(row_number() over (partition by date) as string), ' | ', 'TblC') as pk_client,
char1,
num1,
char2,
num3,
cast(null as string) as char5,
num2,
char3,
char4
from `bigquery-project-XXXX.export_TblC.all_data_view`
)
select * from mast_tbl
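This works, but it is a lot of manual work. In particular, whenever we add a new table, add a new column, or drop a column, I have to change the query for every table to keep the UNION ALL working. So I would like to know whether there is a way to automate this with a script, or whether we should approach this task some other way.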
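To make the kind of automation being asked about concrete, here is a minimal sketch in Python that builds the same UNION ALL query from a description of each table's columns. Everything in it is an assumption made for illustration: the `SCHEMAS` dictionary is hard-coded (it could instead be filled from each dataset's `INFORMATION_SCHEMA.COLUMNS` view), the `num*` columns are assumed to be `INT64`, the helper names `master_columns`, `select_for` and `build_master_query` are hypothetical, and each view is assumed to be named `all_data_view` and to expose a `date` column as in the manual query.

```python
# Hypothetical schema description: table name -> {column name: BigQuery type}.
# The INT64 type for the num* columns is an assumption; the question does not
# state the real types.
SCHEMAS = {
    "TblA": {"char1": "STRING", "num1": "INT64", "char2": "STRING"},
    "TblB": {"char1": "STRING", "char2": "STRING", "num3": "INT64",
             "char5": "STRING", "num2": "INT64"},
    "TblC": {"char1": "STRING", "num1": "INT64", "char2": "STRING",
             "num3": "INT64", "num2": "INT64", "char3": "STRING",
             "char4": "STRING"},
}


def master_columns(schemas):
    """Return every column across all tables, in first-seen order, with its type."""
    columns = {}
    for cols in schemas.values():
        for name, col_type in cols.items():
            # The same column must have the same type everywhere for UNION ALL.
            if columns.setdefault(name, col_type) != col_type:
                raise ValueError(f"column {name!r} has conflicting types")
    return columns


def select_for(table, cols, columns):
    """Build one SELECT that pads the columns missing from `table` with typed NULLs."""
    lines = [
        "  CONCAT(CAST(ROW_NUMBER() OVER (PARTITION BY date) AS STRING), "
        f"' | ', '{table}') AS pk_client"
    ]
    for name, col_type in columns.items():
        if name in cols:
            lines.append(f"  {name}")
        else:
            lines.append(f"  CAST(NULL AS {col_type}) AS {name}")
    source = f"`bigquery-project-XXXX.export_{table}.all_data_view`"
    return "SELECT\n" + ",\n".join(lines) + f"\nFROM {source}"


def build_master_query(schemas):
    """Join the per-table SELECT statements with UNION ALL."""
    columns = master_columns(schemas)
    return "\nUNION ALL\n".join(
        select_for(table, cols, columns) for table, cols in schemas.items()
    )


if __name__ == "__main__":
    print(build_master_query(SCHEMAS))
```

With this approach, adding a table or a column only means changing the schema description. The generated text could then be pasted into the scheduled query or submitted by whatever client already runs it. Columns that have different types on different platforms raise an error here and would still need an explicit casting rule.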
【Comments】:
- "The team's requirement is to combine all the tables into one big master table." I have to ask: why?
Tags: sql google-bigquery union union-all