Check whether a table exists in a Glue database using a try-except block

【Question title】: Check if table a exists in glue database with try-except block
【Posted】: 2022-09-29 19:01:17
【Question】:

Following on from this question:

Check if table exists in hive metastore using Pyspark

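I want to achieve the same result in an AWS Glue PySpark job, using a try-except block instead of an if-else statement. If the table exists, I want to perform an incremental data ingestion; otherwise I will create it and perform a full ingestion.

The script might look like the snippet below, but I am not sure which exception to catch: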

source_table = glueContext.create_dynamic_frame.from_catalog(
    database="source_db", table_name="source_table"
)

source_table.toDF().createOrReplaceTempView("source_table")


try:  # perform incremental ingestion if the table exists
    target_table = glueContext.create_dynamic_frame.from_catalog(
        database="my_db", table_name="target_table"
    )
    target_table.toDF().createOrReplaceTempView("target_table")

    query = """
    SELECT id
           , date_event
    FROM source_table
    WHERE date(date_event) > (SELECT max(date_event) AS max_value FROM target_table)
    """

except <WHAT EXCEPTION? SOMETHING LIKE tableNotFound>:  # perform full ingestion if the table is not found

    query = """
    SELECT id
           , date_event
    FROM source_table
    """

Thanks!

Tags: python pyspark etl aws-glue aws-glue-spark

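【Solution 1】:

The best approach is to use job bookmarks to run the incremental data ingestion. If you would rather write your own script with boto3, it is advisable to create an EMR cluster and run the Python script that performs the incremental ingestion from there.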
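
For the existence check itself, one option that keeps the try-except shape from the question is to ask the Glue Data Catalog directly through boto3. What follows is only a minimal sketch under a few assumptions: the helper name `pick_query` and the two query constants are hypothetical and simply mirror the question's snippet, and the client is passed in as a parameter so the function can be exercised without AWS credentials. The boto3 Glue client's `get_table` call raises `EntityNotFoundException` when the table (or its database) is not in the catalog, and that exception class is reachable through `client.exceptions`.

```python
# Hypothetical query constants that mirror the snippet in the question.
INCREMENTAL_QUERY = """
SELECT id
       , date_event
FROM source_table
WHERE date(date_event) > (SELECT max(date_event) FROM target_table)
"""

FULL_QUERY = """
SELECT id
       , date_event
FROM source_table
"""


def pick_query(glue_client, database, table):
    """Return the incremental query if the catalog table exists, else the full one.

    `glue_client` is expected to behave like boto3.client("glue").
    """
    try:
        # get_table raises EntityNotFoundException when the table is missing.
        glue_client.get_table(DatabaseName=database, Name=table)
    except glue_client.exceptions.EntityNotFoundException:
        # Table not found in the catalog: fall back to a full ingestion.
        return FULL_QUERY
    # Table found: only ingest rows newer than what the target already holds.
    return INCREMENTAL_QUERY


# Usage inside a Glue job (assumes the job role may call glue:GetTable):
#   import boto3
#   query = pick_query(boto3.client("glue"), "my_db", "target_table")
#   result = spark.sql(query)
```

Catching only `EntityNotFoundException` means that other failures, such as missing permissions or throttling, still surface as errors instead of silently triggering a full ingestion. In the incremental branch, `target_table` still has to be registered as a temp view before the query runs, as in the question's snippet.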

【Discussion】: