【Question title】: Access Data from Azure Data Lake Store using Polybase with Azure Data Warehouse
【Posted】: 2019-10-28 16:12:20
【Question】:

I get an error when creating an external table.

https://exoticbaryon.anset.org/2017/06/26/access-data-from-azure-data-lake-store-using-polybase-with-azure-data-warehouse/#comment-157

CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'xxxxx';

CREATE DATABASE SCOPED CREDENTIAL ADLUser 
WITH IDENTITY = 'xxxxx@/https://login.microsoftonline.com/xxxxx/oauth2/v2.0/token',
SECRET = 'xxxxx';

CREATE EXTERNAL DATA SOURCE AzureDataLakeStore
WITH (TYPE = HADOOP,
  CREDENTIAL = ADLUser,
  LOCATION = N'adl://xxxxx.azuredatalakestore.net'
)


CREATE EXTERNAL FILE FORMAT TextFileFormat 
WITH ( 
   FORMAT_TYPE = DELIMITEDTEXT, 
   FORMAT_OPTIONS (FIELD_TERMINATOR =',',
                   STRING_DELIMITER = '"', 
                   USE_TYPE_DEFAULT = TRUE)
);


CREATE EXTERNAL TABLE [dbo].[xxxxx_external](
[EventMonth] [nvarchar](10) NULL,
[UserCount] [bigint] NULL,
[UserType] [nchar](8) NULL,
[StageType] [bigint] NULL,
[StageName] [nvarchar](9) NULL) 
WITH
(
LOCATION=N'/test/xxxxx.csv', 
DATA_SOURCE = AzureDataLakeStore , 
FILE_FORMAT = TextFileFormat 
) ;

CREATE TABLE [dbo].[xxxxx] 
WITH (DISTRIBUTION = HASH([EventMonth] ) ) 
AS SELECT * FROM 
[dbo].[xxxxx_external] ; 

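Running CREATE EXTERNAL TABLE fails with: Failed to execute query. Error: External file access failed due to internal error: 'Error occurred while accessing HDFS: Java exception raised on call to HdfsBridge_IsDirExist. Java exception message:

HdfsBridge::isDirExist - Unexpected error encountered checking whether directory exists or not: MalformedURLException: no protocol: /https://login.microsoftonline.com/72f988bf-86f1-41af-91ab-2d7cd011db47/oauth2/v2.0/token'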

【Comments】:

  • Can you confirm that you are using Azure Data Lake STORE and not Azure Data Lake STORAGE? The names are similar and easy to confuse.

Tags: azure-sqldw external-tables polybase


【Solution 1】:

You need to change the external data source so that it follows this format:

CREATE EXTERNAL DATA SOURCE <data_source_name>
WITH
(    LOCATION                  = '<prefix>://<path>[:<port>]'
[,   CONNECTION_OPTIONS        = '<name_value_pairs>']
[,   CREDENTIAL                = <credential_name> ]
[,   PUSHDOWN                  = ON | OFF]
[,   TYPE                      = HADOOP | BLOB_STORAGE ]
[,   RESOURCE_MANAGER_LOCATION = '<resource_manager>[:<port>]' ]
)
[;]

You can find more information at the following link: https://docs.microsoft.com/en-us/sql/t-sql/statements/create-external-data-source-transact-sql?view=sql-server-ver15
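
As an illustration, the template might be filled in for the question's setup roughly as follows. This is only a sketch under assumptions: `<client_id>`, `<tenant_id>`, `<key>`, `<strong_password>` and `<account_name>` are hypothetical placeholders for values the question masks, while the object names `ADLUser` and `AzureDataLakeStore`, the `adl://` prefix and the token endpoint path are copied from the question. The credential is assumed to take the form `<client_id>@<OAuth 2.0 token endpoint>`, with the endpoint starting directly at `https://`. The `no protocol: /https://...` text in the question's error suggests that the extra `/` after `@` may be what makes the endpoint unparseable.

```sql
-- Sketch only: values in angle brackets are placeholders, not values from the question.
-- A master key must exist before a database scoped credential can be created.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong_password>';

-- IDENTITY is '<client_id>@<token endpoint>'.
-- Note that there is no '/' between '@' and 'https://'.
CREATE DATABASE SCOPED CREDENTIAL ADLUser
WITH
    IDENTITY = '<client_id>@https://login.microsoftonline.com/<tenant_id>/oauth2/v2.0/token',
    SECRET   = '<key>';

-- Prefix and host pattern kept from the question's data source definition.
CREATE EXTERNAL DATA SOURCE AzureDataLakeStore
WITH (
    TYPE       = HADOOP,
    LOCATION   = 'adl://<account_name>.azuredatalakestore.net',
    CREDENTIAL = ADLUser
);
```

If a credential or data source with these names already exists, it would have to be dropped first (dependent external tables, then the data source, then the credential) before the corrected definitions can be created.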

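When you access Azure Data Lake, you need to specify your prefix using "wasbs". For a first attempt, upload a single file to a folder in the container and load it into the external table without mentioning any .csv file name in LOCATION. Later you can specify your particular file name and test your code again.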
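
The folder-first test described here could look roughly like the sketch below. It reuses the question's data source (`AzureDataLakeStore`), file format (`TextFileFormat`) and column list. The table name `[dbo].[folder_test_external]` is hypothetical, and `/test/` is assumed to be a folder holding only files that match the file format. Which prefix the data source needs depends on where the file actually lives: the linked documentation lists `wasb[s]://` for Azure Blob Storage and `adl://` for Azure Data Lake Store Gen1.

```sql
-- Hypothetical table name; the column list is copied from the question.
CREATE EXTERNAL TABLE [dbo].[folder_test_external] (
    [EventMonth] nvarchar(10) NULL,
    [UserCount]  bigint       NULL,
    [UserType]   nchar(8)     NULL,
    [StageType]  bigint       NULL,
    [StageName]  nvarchar(9)  NULL
)
WITH (
    LOCATION    = N'/test/',             -- folder only, no .csv file name yet
    DATA_SOURCE = AzureDataLakeStore,
    FILE_FORMAT = TextFileFormat
);

-- Quick read to confirm that the external table can reach the files.
SELECT TOP 10 * FROM [dbo].[folder_test_external];
```

Once the folder-level read works, LOCATION can be narrowed to a single file, such as the question's `/test/xxxxx.csv`, and the CREATE TABLE AS SELECT from the question can be retried.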

