【发布时间】:2016-04-15 12:23:31
【问题描述】:
有人在法兰克福使用 s3 使用 hadoop/spark 1.6.0 吗?
我正在尝试将作业的结果存储在 s3 上,我的依赖项声明如下:
"org.apache.spark" %% "spark-core" % "1.6.0" exclude("org.apache.hadoop", "hadoop-client"),
"org.apache.spark" %% "spark-sql" % "1.6.0",
"org.apache.hadoop" % "hadoop-client" % "2.7.2",
"org.apache.hadoop" % "hadoop-aws" % "2.7.2"
我设置了以下配置:
System.setProperty("com.amazonaws.services.s3.enableV4", "true")
sc.hadoopConfiguration.set("fs.s3a.endpoint", ""s3.eu-central-1.amazonaws.com")
在我的 RDD 上调用 saveAsTextFile 时,它开始正常,将所有内容保存在 S3 上。然而,当它从_temporary 传输到最终输出结果一段时间后,它会产生错误:
Exception in thread "main" com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 403, AWS Service: Amazon S3, AWS Request ID: XXXXXXXXXXXXXXXX, AWS Error Code: SignatureDoesNotMatch, AWS Error Message: The request signature we calculated does not match the signature you provided. Check your key and signing method., S3 Extended Request ID: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX=
at com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:798)
at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:421)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:232)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3528)
at com.amazonaws.services.s3.AmazonS3Client.copyObject(AmazonS3Client.java:1507)
at com.amazonaws.services.s3.transfer.internal.CopyCallable.copyInOneChunk(CopyCallable.java:143)
at com.amazonaws.services.s3.transfer.internal.CopyCallable.call(CopyCallable.java:131)
at com.amazonaws.services.s3.transfer.internal.CopyMonitor.copy(CopyMonitor.java:189)
at com.amazonaws.services.s3.transfer.internal.CopyMonitor.call(CopyMonitor.java:134)
at com.amazonaws.services.s3.transfer.internal.CopyMonitor.call(CopyMonitor.java:46)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
如果我使用 spark 包中的hadoop-client,它甚至不会开始传输。错误随机发生,有时有效,有时无效。
【问题讨论】:
-
您的 ssh 密钥似乎有问题。您能检查一下您使用的密钥是否正确吗?
-
数据开始在s3上保存,一段时间后出现错误。
-
@flaviotruzzi 你解决了这个问题吗?
-
@pangpang 使用自定义函数来保存数据。
-
@flaviotruzzi - 你说这是一个 dup 并链接回这个页面......?
标签: scala hadoop amazon-s3 apache-spark