[Posted]: 2022-09-29 19:01:17
[Question]:
Starting from this question:
Check if table exists in hive metastore using Pyspark
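I would like to achieve the same result in an AWS Glue PySpark job, using a try-except block instead of an if-else statement.
Then, if the table exists, I want to perform an incremental data ingestion; otherwise I will create it and perform a full ingestion.
The script might look like the snippet below, but I am not sure which exception to catch: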
source_table = glueContext.create_dynamic_frame.from_catalog(
    database="source_db", table_name="source_table"
)
# Register the source data as a temp view so it can be queried with Spark SQL
source_table.toDF().createOrReplaceTempView("source_table")
try:  # perform incremental ingestion if the table exists
    target_table = glueContext.create_dynamic_frame.from_catalog(
        database="my_db", table_name="target_table"
    )
    target_table.toDF().createOrReplaceTempView("target_table")
    # Only pick up rows newer than the latest event already in the target
    query = """
        SELECT id
             , date_event
        FROM source_table
        WHERE date(date_event) > (SELECT max(date_event) AS max_value FROM target_table)
    """
except <WHAT EXCEPTION? SOMETHING LIKE tableNotFound>:  # perform full ingestion if the table is not found
    query = """
        SELECT id
             , date_event
        FROM source_table
    """
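For comparison, the same try-except branching can be sketched against the Data Catalog directly, since the boto3 Glue client documents the exception that `get_table` raises for a missing table (`EntityNotFoundException`). This is only a sketch under assumptions, not a confirmed answer to which exception `from_catalog` itself raises: `table_exists` and `build_query` are hypothetical helper names, the client is assumed to come from `boto3.client("glue")`, and the job role is assumed to be allowed to call `glue:GetTable`.

```python
def table_exists(glue_client, database: str, table_name: str) -> bool:
    """Return True if the Data Catalog has the table, False if Glue reports it missing.

    `glue_client` is assumed to be a boto3 Glue client, e.g. boto3.client("glue").
    """
    try:
        glue_client.get_table(DatabaseName=database, Name=table_name)
        return True
    except glue_client.exceptions.EntityNotFoundException:
        # Raised by get_table when the table (or its database) is not in the catalog
        return False


def build_query(incremental: bool) -> str:
    """Build the ingestion query: incremental if the target exists, full otherwise."""
    base = "SELECT id, date_event FROM source_table"
    if incremental:
        # Only rows newer than the latest event already stored in the target
        return base + " WHERE date(date_event) > (SELECT max(date_event) FROM target_table)"
    return base


# Hypothetical usage inside the Glue job:
#   import boto3
#   glue = boto3.client("glue")
#   query = build_query(table_exists(glue, "my_db", "target_table"))
```

Passing the client in as a parameter keeps the check easy to exercise with a stub outside of AWS. The incremental query still needs the `target_table` temp view to be registered before it runs.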
Thanks!
Tags: python pyspark etl aws-glue aws-glue-spark