【发布时间】:2018-11-02 12:58:35
【问题描述】:
我无法从 Flink 作业访问 S3。
如果我为我的工作提交组装的 jar,我会收到拒绝访问错误:
Caused by:
com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.model.AmazonS3Exception:
Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: ...; S3 Extended Request ID: ...), S3 Extended Request ID: ...
这是我的设置: 使用“高级配置”创建的 EMR 集群, Flink 1.4.0。和 Hadoop 2.8.3。作为应用程序。 1 个主节点,2 个节点
实例具有EMR_EC2_DefaultRole,具有策略 AmazonElasticMapReduceforEC2Role,具有 S3 完全访问权限。
事实上,我可以在主节点和从节点上成功发出这个命令:
aws s3api list-buckets
hdfs dfs -ls s3://bucketA
我连接到主服务器并启动一个集群:
/usr/lib/flink/bin/yarn-session.sh -n 2 -d
Flink 作业从作为源的存储桶中读取:
object TestS3 {
def main(args: Array[String]): Unit = {
val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
val input: DataSet[String] = env.readTextFile("s3://bucketA/source/file")
input.writeAsText("s3://bucketB/delete/me/later")
env.execute()
}
}
这是我的简单 build.sbt:
name := "TestS3"
scalaVersion := "2.11.11"
version := "0.1"
val flinkVersion = "1.4.0"
libraryDependencies ++= Seq(
"org.apache.flink" % "flink-core" % flinkVersion % "provided",
"org.apache.flink" %% "flink-scala" % flinkVersion % "provided"
)
存储桶没有拒绝读取访问的策略。它有拒绝删除的策略,但这不应该影响 Flink 作业。
EMR_EC2_Default_Role 授予对 S3 的完全访问权限。
与往常一样,非常感谢任何关于我做错的提示。或者也许我的期望这应该起作用是错误的?!
这是完整的堆栈跟踪:
java.io.IOException: Error opening the Input Split s3://bucketA/source/file [0,-1]: com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: xyz_request_id; S3 Extended Request ID: xyz_ext_request_id), S3 Extended Request ID: xyz_ext_request_id
at org.apache.flink.api.common.io.FileInputFormat.open(FileInputFormat.java:705)
at org.apache.flink.api.common.io.DelimitedInputFormat.open(DelimitedInputFormat.java:477)
at org.apache.flink.api.common.io.DelimitedInputFormat.open(DelimitedInputFormat.java:48)
at org.apache.flink.runtime.operators.DataSourceTask.invoke(DataSourceTask.java:145)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:718)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: xyz_request_id; S3 Extended Request ID: xyz_ext_request_id), S3 Extended Request ID: xyz_ext_request_id
at com.amazon.ws.emr.hadoop.fs.s3n.Jets3tNativeFileSystemStore.handleAmazonServiceException(Jets3tNativeFileSystemStore.java:434)
at com.amazon.ws.emr.hadoop.fs.s3n.Jets3tNativeFileSystemStore.retrievePair(Jets3tNativeFileSystemStore.java:461)
at com.amazon.ws.emr.hadoop.fs.s3n.Jets3tNativeFileSystemStore.retrievePair(Jets3tNativeFileSystemStore.java:439)
at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.open(S3NativeFileSystem.java:1097)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:790)
at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.open(EmrFileSystem.java:166)
at org.apache.flink.runtime.fs.hdfs.HadoopFileSystem.open(HadoopFileSystem.java:119)
at org.apache.flink.runtime.fs.hdfs.HadoopFileSystem.open(HadoopFileSystem.java:36)
at org.apache.flink.api.common.io.FileInputFormat$InputSplitOpenThread.run(FileInputFormat.java:865)
Caused by: com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: xyz_request_id; S3 Extended Request ID: xyz_ext_request_id), S3 Extended Request ID: xyz_ext_request_id
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1639)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1304)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1056)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:743)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:717)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:513)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4325)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4272)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:1409)
at com.amazon.ws.emr.hadoop.fs.s3.lite.call.GetObjectCall.perform(GetObjectCall.java:22)
at com.amazon.ws.emr.hadoop.fs.s3.lite.call.GetObjectCall.perform(GetObjectCall.java:9)
at com.amazon.ws.emr.hadoop.fs.s3.lite.executor.GlobalS3Executor.execute(GlobalS3Executor.java:91)
at com.amazon.ws.emr.hadoop.fs.s3.lite.AmazonS3LiteClient.invoke(AmazonS3LiteClient.java:176)
at com.amazon.ws.emr.hadoop.fs.s3.lite.AmazonS3LiteClient.getObject(AmazonS3LiteClient.java:99)
at com.amazon.ws.emr.hadoop.fs.s3n.Jets3tNativeFileSystemStore.retrievePair(Jets3tNativeFileSystemStore.java:452)
... 7 more
【问题讨论】:
标签: amazon-s3 apache-flink amazon-emr