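This article by Craig Warman of Snowflake describes how to automatically create views from a JSON document/file. It can be adapted so that you read the JSON once and then automatically build both a table and a view from the contents of that file. You will need a custom JavaScript stored procedure.
Once the procedure is created, you can simply call: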
call create_view_over_json(
'@EXT_STAGE/sample.json',
'DB_NAME.MY_SCHEMA.sample',
'DB_NAME.MY_SCHEMA.sample_vw');
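This results in only one call against the external stage, followed by two more calls: one queries the table that holds the raw JSON, and the other creates the view over it.
After that you have the table containing the data, plus a view that you can query directly: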
SELECT * FROM DB_NAME.MY_SCHEMA.sample_vw;
And here is the stored procedure you need:
create or replace procedure create_view_over_json (STAGE_FILE_NAME varchar, TABLE_NAME varchar, VIEW_NAME varchar)
returns varchar
language javascript
as
$$
// Attribution:
// Craig Warman for the original code who also leveraged code developed by Alan Eldridge.
// This generates paths with levels enclosed by double quotes (ex: "path"."to"."element"). It also strips any bracket-enclosed array element references (like "[0]")
var path_name = "regexp_replace(regexp_replace(f.path,'\\\\[(.+)\\\\]',''), '(\\\\w+)','\"\\\\1\"')";
var attribute_type = "DECODE (substr(typeof(f.value),1,1),'A','ARRAY','B','BOOLEAN','I','FLOAT','D','FLOAT','STRING')"; // This generates column datatypes of ARRAY, BOOLEAN, FLOAT, and STRING only
var alias_name = "REGEXP_REPLACE(REGEXP_REPLACE(f.path, '\\\\[(.+)\\\\]'),'[^a-zA-Z0-9]','_')" ; // This generates column aliases based on the path
var col_list = "";
var json_col_name = "json_data";
// Create or replace the table based on the file
var table_ddl = "CREATE OR REPLACE TABLE " + TABLE_NAME + " AS \n" +
"SELECT \n" +
" parse_json($1) AS " + json_col_name + " \n" +
"FROM " + STAGE_FILE_NAME + "; \n"
var table_stmt = snowflake.createStatement({sqlText:table_ddl});
var table_res = table_stmt.execute();
// Build a query that returns a list of elements which will be used to build the column list for the CREATE VIEW statement
var element_query = "SELECT DISTINCT \n" +
path_name + " AS path_name, \n" +
attribute_type + " AS attribute_type, \n" +
alias_name + " AS alias_name \n" +
"FROM \n" +
TABLE_NAME + ", \n" +
"LATERAL FLATTEN(" + json_col_name + ", RECURSIVE=>true) f \n" +
"WHERE TYPEOF(f.value) != 'OBJECT' \n" +
"AND NOT contains(f.path,'[') "; // This prevents traversal down into arrays;
// Run the query...
var element_stmt = snowflake.createStatement({sqlText:element_query});
var element_res = element_stmt.execute();
// ...And loop through the list that was returned
while (element_res.next()) {
// Add elements and datatypes to the column list
// They will look something like this when added:
// col_name:"name"."first"::STRING as name_first,
// col_name:"name"."last"::STRING as name_last
if (col_list != "") {
col_list += ", \n";}
col_list += "\t" + json_col_name + ":" + element_res.getColumnValue(1); // Start with the element path name
col_list += "::" + element_res.getColumnValue(2); // Add the datatype
col_list += " as " + element_res.getColumnValue(3); // And finally the element alias
}
// Now build the CREATE VIEW statement
var view_ddl = "CREATE OR REPLACE VIEW " + VIEW_NAME + " AS \n" +
"SELECT \n" + col_list + "\n" +
"FROM " + TABLE_NAME;
var view_stmt = snowflake.createStatement({sqlText:view_ddl});
var view_res = view_stmt.execute();
view_res.next();
// Return the generated view DDL followed by the element query
return view_ddl + "\n\n" + element_query;
$$;
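To make the column-list loop easier to follow outside Snowflake, here is a minimal plain-JavaScript sketch of the same string-building steps. The function names (quotePath, aliasFor, sqlType, buildViewDdl) and the sample rows are hypothetical and only approximate what the SQL expressions in the procedure are meant to produce. It is not part of Craig Warman's code, it does not talk to Snowflake, and unlike the SELECT DISTINCT in the procedure it does not deduplicate rows.

```javascript
// Plain-JavaScript approximation of the SQL expressions used by the procedure.
// All names here are illustrative; nothing below calls Snowflake.

// Mirrors path_name: drop "[...]" array references, then double-quote each level.
function quotePath(path) {
  return path.replace(/\[(.+)\]/g, "").replace(/(\w+)/g, '"$1"');
}

// Mirrors alias_name: drop array references, then turn non-alphanumerics into "_".
function aliasFor(path) {
  return path.replace(/\[(.+)\]/g, "").replace(/[^a-zA-Z0-9]/g, "_");
}

// Mirrors attribute_type: map the first letter of TYPEOF(...) to a column type.
function sqlType(typeofValue) {
  const map = { A: "ARRAY", B: "BOOLEAN", I: "FLOAT", D: "FLOAT" };
  return map[typeofValue.charAt(0)] || "STRING";
}

// Mirrors the WHERE clause plus the while loop that assembles the column list.
function buildViewDdl(rows, tableName, viewName, jsonColName) {
  const colList = rows
    .filter((r) => r.type !== "OBJECT" && !r.path.includes("["))
    .map((r) =>
      "\t" + jsonColName + ":" + quotePath(r.path) +
      "::" + sqlType(r.type) + " as " + aliasFor(r.path))
    .join(", \n");
  return "CREATE OR REPLACE VIEW " + viewName + " AS \n" +
    "SELECT \n" + colList + "\n" +
    "FROM " + tableName;
}

// Hypothetical rows, shaped like the output of LATERAL FLATTEN(..., RECURSIVE=>true).
const sampleRows = [
  { path: "name", type: "OBJECT" },
  { path: "name.first", type: "VARCHAR" },
  { path: "age", type: "INTEGER" },
  { path: "tags", type: "ARRAY" },
  { path: "tags[0]", type: "VARCHAR" },
];

const ddl = buildViewDdl(
  sampleRows,
  "DB_NAME.MY_SCHEMA.sample",
  "DB_NAME.MY_SCHEMA.sample_vw",
  "json_data"
);
console.log(ddl);
```

Objects are skipped because their leaf elements appear as separate rows, and array elements are skipped so that the array itself becomes a single ARRAY column.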
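A few notes:
- If the JSON is complex, you will need to adapt the stored procedure to handle it. Again, Craig comes to the rescue in part 2.
- If the JSON changes significantly from time to time, think about how to keep your schema from breaking. There is some light reading on Schema Evolution here.
- This example uses a JSON file, but the same approach can be applied to AVRO, Parquet, CSV, or even XML by changing how the CREATE TABLE statement is generated and how the later queries are constructed for the respective format.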
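As a rough illustration of the last note, the table DDL generation could be factored into a helper that picks the SELECT expression per format. This is only a sketch under assumptions: buildTableDdl and its parameters are hypothetical, the named file format (here my_parquet_format) is assumed to already exist in your account, and only JSON, AVRO, and Parquet are covered because staged files in those formats are read as a single semi-structured column. CSV and XML would need different handling in both the table and the element query.

```javascript
// Hypothetical helper: builds the CREATE TABLE statement for a staged file.
// Only string building happens here; nothing is sent to Snowflake.
function buildTableDdl(stageFileName, tableName, jsonColName, format, fileFormatName) {
  // Assumption: AVRO and PARQUET files arrive as a single VARIANT column ($1),
  // while the JSON case keeps the parse_json($1) call used by the procedure.
  const selectExpr = {
    JSON: "parse_json($1)",
    AVRO: "$1",
    PARQUET: "$1",
  }[format.toUpperCase()];

  if (!selectExpr) {
    throw new Error("No sketch for format: " + format);
  }

  // Optional named file format, passed inline in the FROM clause.
  const formatClause = fileFormatName
    ? " (FILE_FORMAT => '" + fileFormatName + "')"
    : "";

  return "CREATE OR REPLACE TABLE " + tableName + " AS \n" +
    "SELECT \n" +
    "  " + selectExpr + " AS " + jsonColName + " \n" +
    "FROM " + stageFileName + formatClause;
}

// Example with hypothetical stage, table, and file format names.
const parquetDdl = buildTableDdl(
  "@EXT_STAGE/sample.parquet",
  "DB_NAME.MY_SCHEMA.sample",
  "json_data",
  "parquet",
  "my_parquet_format"
);
console.log(parquetDdl);
```

Inside the procedure, the resulting string would take the place of table_ddl. The rest of the procedure can stay as it is only if the flattened paths look like those of the JSON case.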