【Question title】: DataStax Enterprise 3.2 - Hive S3 NoSuchBucket
【Posted】: 2023-03-20 21:13:01
【Question】:

I am running DSE 3.2.4 with analytics enabled. I am trying to offload one of my tables to S3 for long-term storage. I created the following table in Hive:

CREATE EXTERNAL TABLE events_archive (
    event_id string,
    time string,
    type string,
    source string,
    value string
)
PARTITIONED BY (year string, month string, day string, hour string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION 's3n://com.mydomain.events/';

Then I tried to load some sample data into it with this query:

CREATE TEMPORARY FUNCTION c_to_string AS 'org.apache.hadoop.hive.cassandra.ql.udf.UDFCassandraBinaryToString';
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.dynamic.partition=true;


INSERT OVERWRITE TABLE events_archive
PARTITION (year, month, day, hour)
SELECT c_to_string(column4, 'uuid') AS event_id,
       from_unixtime(CAST(column3/1000 AS int)) AS time,
       CASE column1
         WHEN 'pageviews-push' THEN 'page_view'
         WHEN 'score_realtime-internal' THEN 'realtime_score'
         ELSE 'social_data'
       END AS type,
       CASE column1
         WHEN 'pageviews-push' THEN 'internal'
         WHEN 'score_realtime-internal' THEN 'internal'
         ELSE split(column1, '-')[0]
       END AS source,
       value,
       year(from_unixtime(CAST(column3/1000 AS int))) AS year,
       month(from_unixtime(CAST(column3/1000 AS int))) AS month,
       day(from_unixtime(CAST(column3/1000 AS int))) AS day,
       hour(from_unixtime(CAST(column3/1000 AS int))) AS hour,
       c_to_string(key2, 'blob') AS content_id
  FROM events
 WHERE column2 = 'data'
   AND value IS NOT NULL
   AND value != ''
LIMIT 10;

I ended up with this exception:

2014-02-11 20:23:55,810 ERROR ql.Driver (SessionState.java:printError(400)) - FAILED: Hive Internal Error: org.apache.hadoop.fs.s3.S3Exception(org.jets3t.service.S3ServiceException: S3 GET failed for '/10.226.118.113/%2F' XML Error Message: <?xml version="1.0" encoding="UTF-8"?><Error><Code>NoSuchBucket</Code><Message>The specified bucket does not exist</Message><BucketName>10.226.118.113</BucketName><RequestId>FFFFBCE9711A91AE</RequestId><HostId>kXu2oMblsYKD+Jx9O5fTbjosOtTNNtyM+lbE2pmCC63Wm3abJxMvanHdSCYnUyaC</HostId></Error>)
org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.S3ServiceException: S3 GET failed for '/10.226.118.113/%2F' XML Error Message: <?xml version="1.0" encoding="UTF-8"?><Error><Code>NoSuchBucket</Code><Message>The specified bucket does not exist</Message><BucketName>10.226.118.113</BucketName><RequestId>FFFFBCE9711A91AE</RequestId><HostId>kXu2oMblsYKD+Jx9O5fTbjosOtTNNtyM+lbE2pmCC63Wm3abJxMvanHdSCYnUyaC</HostId></Error>
    at org.apache.hadoop.fs.s3.Jets3tFileSystemStore.get(Jets3tFileSystemStore.java:156)
    at org.apache.hadoop.fs.s3.Jets3tFileSystemStore.retrieveINode(Jets3tFileSystemStore.java:195)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
    at $Proxy14.retrieveINode(Unknown Source)
    at org.apache.hadoop.fs.s3.S3FileSystem.mkdir(S3FileSystem.java:148)
    at org.apache.hadoop.fs.s3.S3FileSystem.mkdirs(S3FileSystem.java:141)
    at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1126)
    at org.apache.hadoop.hive.ql.Context.getScratchDir(Context.java:165)
    at org.apache.hadoop.hive.ql.Context.getExternalScratchDir(Context.java:222)
    at org.apache.hadoop.hive.ql.Context.getExternalTmpFileURI(Context.java:315)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFileSinkPlan(SemanticAnalyzer.java:4049)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:6205)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:6136)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:6762)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:7531)
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:243)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:431)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:336)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:909)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:215)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:406)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:689)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:557)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: org.jets3t.service.S3ServiceException: S3 GET failed for '/10.226.118.113/%2F' XML Error Message: <?xml version="1.0" encoding="UTF-8"?><Error><Code>NoSuchBucket</Code><Message>The specified bucket does not exist</Message><BucketName>10.226.118.113</BucketName><RequestId>FFFFBCE9711A91AE</RequestId><HostId>kXu2oMblsYKD+Jx9O5fTbjosOtTNNtyM+lbE2pmCC63Wm3abJxMvanHdSCYnUyaC</HostId></Error>
    at org.jets3t.service.impl.rest.httpclient.RestS3Service.performRequest(RestS3Service.java:416)
    at org.jets3t.service.impl.rest.httpclient.RestS3Service.performRestGet(RestS3Service.java:752)
    at org.jets3t.service.impl.rest.httpclient.RestS3Service.getObjectImpl(RestS3Service.java:1601)
    at org.jets3t.service.impl.rest.httpclient.RestS3Service.getObjectImpl(RestS3Service.java:1544)
    at org.jets3t.service.S3Service.getObject(S3Service.java:2072)
    at org.jets3t.service.S3Service.getObject(S3Service.java:1310)
    at org.apache.hadoop.fs.s3.Jets3tFileSystemStore.get(Jets3tFileSystemStore.java:144)
... 33 more

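Does the latest DSE support the Hive S3 connector? Or am I doing something wrong?

【Comments】:

  • Is your bucket name actually 10.226.118.113?
  • No. As shown in the CREATE TABLE statement, the bucket name is com.mydomain.events. 10.226.118.113 is the IP address of the node I am running the command from.
  • Don't you need to specify the bucket in the query? It is clearly using your IP address as the bucket name.
  • I couldn't find any example that specifies the bucket name in the query. From what I have found, the bucket name is only specified when the table is created.
  • Check my answer; hopefully it works.

Tags: hadoop amazon-s3 cassandra hive datastax-enterprise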


【Answer 1】:

Try the following in your Hive installation:

hive-site.xml

<property>
  <name>fs.default.name</name>
  <value>s3n://your-bucket</value>
</property>

core-site.xml

<property>
  <name>fs.s3n.awsAccessKeyId</name>
  <value>Your AWS Key</value>
</property>

<property>
  <name>fs.s3n.awsSecretAccessKey</name>
  <value>Your AWS Secret Key</value>
</property>

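This is from the 3.1 documentation, http://www.datastax.com/docs/datastax_enterprise3.1/solutions/about_hive, under the section "Using an external file system in Hive".

I don't see it in the 3.2 documentation. I'm not sure why it was omitted, but it looks essential for running Hive on S3.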
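
To check whether the settings above took effect, the current values can be inspected from the Hive CLI. The following is only a minimal sketch, assuming the property name from the configuration above and the events_archive table from the question; the values printed will depend on your installation.

```sql
-- Print the default filesystem the session is currently using.
-- SET with a property name and no value only displays it; nothing is changed.
SET fs.default.name;

-- Show the table metadata. The Location field should contain the
-- s3n:// URI given in the CREATE EXTERNAL TABLE statement.
DESCRIBE FORMATTED events_archive;
```

If the first command still prints a cfs:// URI containing the node's IP address, the hive-site.xml change has probably not been picked up by the session.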

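【Comments】:

  • This seems to work for me. I already had the access keys in place; I just hadn't set the default name to s3n://my-bucket/. It was set to cfs://local-ip/, so I wonder whether changing it will cause problems that haven't surfaced yet.
【Answer 2】:

The Hadoop implementation of the S3 filesystem is outdated, so writing data from Hive to S3 does not work well. We have fixed the problem with reading: DSE can now read S3 files, but writing still has problems. We will look into it and see whether we can fix it soon.

【Comments】:

  • What exactly are the problems with writing data to S3?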