【Posted】: 2021-02-23 20:02:14
【Question】:
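We have many Databricks Delta tables created on ADLS Gen1. In addition, external tables have been built on top of each of those tables in one of our Databricks workspaces.
Similarly, I am trying to create the same kind of external tables over the same Delta-format files, but in a different workspace.
I only have read-only access to ADLS Gen1, through a service principal. With that, I can read the Delta files into a Spark DataFrame like this: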
read_data_df = spark.read.format("delta").load('dbfs:/mnt/data/<foldername>')
I can even create a Hive external table, but I get the following error when reading data from that table:
Error in SQL statement: AnalysisException: Incompatible format detected.
A transaction log for Databricks Delta was found at `dbfs:/mnt/data/<foldername>/_delta_log`,
but you are trying to read from `dbfs:/mnt/data/<foldername>` using format("hive"). You must use
'format("delta")' when reading and writing to a delta table.
To disable this check, SET spark.databricks.delta.formatCheck.enabled=false
To learn more about Delta, see https://docs.microsoft.com/azure/databricks/delta/index
;
If I create the external table with `USING DELTA` instead, I get a different error, this time about access:
Caused by: org.apache.hadoop.security.AccessControlException:
OPEN failed with error 0x83090aa2 (Forbidden. ACL verification failed.
Either the resource does not exist or the user is not authorized to perform the requested operation.).
failed with error 0x83090aa2 (Forbidden. ACL verification failed.
Either the resource does not exist or the user is not authorized to perform the requested operation.).
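For illustration, a `USING DELTA` external table over an existing path is typically declared along the lines below. This is a minimal sketch, not the exact statement that produced the error above: the table name `my_db.my_table` and the helper `build_delta_table_ddl` are hypothetical, and only the mount path comes from this post.

```python
def build_delta_table_ddl(table_name: str, location: str) -> str:
    """Return a CREATE TABLE statement for an unmanaged Delta table at `location`."""
    # USING DELTA makes Spark read the table through the Delta transaction log
    # in <location>/_delta_log instead of treating the folder as plain Hive files.
    return (
        f"CREATE TABLE IF NOT EXISTS {table_name} "
        f"USING DELTA "
        f"LOCATION '{location}'"
    )


# Hypothetical table name; the path is the mount point used earlier in this post.
ddl = build_delta_table_ddl("my_db.my_table", "dbfs:/mnt/data/<foldername>")
print(ddl)

# In a Databricks notebook, where `spark` is predefined, this would be run as:
# spark.sql(ddl)
```

The sketch only builds the SQL text, so it can be inspected without a cluster; whether the statement succeeds still depends on the permissions the service principal has on the underlying folder.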
Does this mean I need full access, rather than read-only access, to those files under the file system?
Thanks
【Comments】:
Tags: pyspark azure-databricks delta-lake