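【Posted】: 2018-11-14 19:36:42
【Question】:
I wrote a Spark job that copies a folder into a standalone Amazon S3 bucket. That process works fine, but now I am trying to use the same process against an S3 bucket served by Scality. This is my configuration.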
spark-submit --name "Backup S3 Test" --master yarn-cluster --executor-memory 2048m --num-executors 6 --executor-cores 2 --driver-memory 1024m --keytab /home/bigdata/userbcks3.keytab \
--principal XXXXXXX@XXXXXXXX \
--deploy-mode cluster \
--conf spark.file.replicate.exclusion.regexps="" \
--conf spark.hadoop.fs.s3a.access.key=XXXXXXXXXX \
--conf spark.hadoop.fs.s3a.secret.key=XXXXXXXXXX \
--class com.keedio.hadoop.FileReplicator hdfs-file-processors-1.1.6-SNAPSHOT.jar /pre/mydata/ s3a://mybucket/
And this is the exception I get now:
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667)
at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:513)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4221)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4168)
at com.amazonaws.services.s3.AmazonS3Client.headBucket(AmazonS3Client.java:1306)
at com.amazonaws.services.s3.AmazonS3Client.doesBucketExist(AmazonS3Client.java:1263)
at org.apache.hadoop.fs.s3a.S3AFileSystem.verifyBucketExists(S3AFileSystem.java:323)
... 20 more
Caused by: com.amazonaws.SdkClientException: The requested metadata is not found at http://169.254.169.254/latest/meta-data/iam/security-credentials/
at com.amazonaws.internal.EC2CredentialsUtils.readResource(EC2CredentialsUtils.java:115)
at com.amazonaws.internal.EC2CredentialsUtils.readResource(EC2CredentialsUtils.java:77)
at com.amazonaws.auth.InstanceProfileCredentialsProvider$InstanceMetadataCredentialsEndpointProvider.getCredentialsEndpoint(InstanceProfileCredentialsProvider.java:156)
at com.amazonaws.auth.EC2CredentialsFetcher.fetchCredentials(EC2CredentialsFetcher.java:121)
at com.amazonaws.auth.EC2CredentialsFetcher.getCredentials(EC2CredentialsFetcher.java:82)
at com.amazonaws.auth.InstanceProfileCredentialsProvider.getCredentials(InstanceProfileCredentialsProvider.java:141)
at org.apache.hadoop.fs.s3a.AWSCredentialProviderList.getCredentials(AWSCredentialProviderList.java:129)
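To make the copy, I am just using Apache FileUtils, which lets me move files between DistributedFileSystem and S3AFileSystem. Is there any way to make this work with the same process? Am I missing a configuration parameter?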
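For reference, these are the S3A settings that I suspect may matter when the bucket is not hosted on AWS itself. This is only a sketch of extra flags for the spark-submit command above: the endpoint URL is a made-up placeholder, and I have not confirmed that any of these settings resolves the exception. The property names are standard Hadoop S3A options, but some of them (path-style access, the credentials provider) are only available in newer Hadoop releases, so whether they apply depends on the Hadoop version on the cluster.

```shell
# Assumption: placeholder endpoint; replace with the real Scality S3 endpoint.
--conf spark.hadoop.fs.s3a.endpoint=https://scality.example.internal
# Assumption: the Scality endpoint expects path-style requests (host/bucket/key).
--conf spark.hadoop.fs.s3a.path.style.access=true
# Assumption: restricting S3A to the access/secret key pair keeps it from
# falling back to the EC2 instance metadata lookup seen in the stack trace.
--conf spark.hadoop.fs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider
```

The stack trace ends in InstanceProfileCredentialsProvider, which queries http://169.254.169.254/ and only works on EC2, so my guess is that the access key and secret key are not being picked up, or that the request never reaches the Scality endpoint.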
【Comments】:
Tags: apache-spark hadoop amazon-s3