【发布时间】:2021-05-27 00:39:18
【问题描述】:
我正在尝试单节点 yarn cluster 中 flink docs 中解释的 flink 示例。
正如这里提到的discussion HADOOP_CONF_DIR 在执行纱线命令之前也设置如下。
export HADOOP_CONF_DIR=/etc/hadoop/conf
在执行以下命令时
ubuntu@vrni-platform:~/build-target/flink$ ./bin/flink run-application -t yarn-application ./examples/streaming/TopSpeedWindowing.jar
由于以下错误而失败
The program finished with the following exception:
org.apache.flink.client.deployment.ClusterDeploymentException: Couldn't deploy Yarn Application Cluster
at org.apache.flink.yarn.YarnClusterDescriptor.deployApplicationCluster(YarnClusterDescriptor.java:465)
at org.apache.flink.client.deployment.application.cli.ApplicationClusterDeployer.run(ApplicationClusterDeployer.java:67)
at org.apache.flink.client.cli.CliFrontend.runApplication(CliFrontend.java:213)
at org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1061)
at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1136)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
at org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1136)
Caused by: org.apache.flink.yarn.YarnClusterDescriptor$YarnDeploymentException: The YARN application unexpectedly switched to state FAILED during deployment.
Diagnostics from YARN: Application application_1614159836384_0045 failed 1 times (global limit =2; local limit is =1) due to AM Container for appattempt_1614159836384_0045_000001 exited with exitCode: -1000
Failing this attempt.Diagnostics: [2021-02-24 16:19:39.409]File file:/home/ubuntu/.flink/application_1614159836384_0045/flink-dist_2.12-1.12.1.jar does not exist
java.io.FileNotFoundException: File file:/home/ubuntu/.flink/application_1614159836384_0045/flink-dist_2.12-1.12.1.jar does not exist
at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:641)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:867)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:631)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:442)
at org.apache.hadoop.yarn.util.FSDownload.verifyAndCopy(FSDownload.java:269)
at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:67)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:414)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:411)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:411)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.doDownloadCall(ContainerLocalizer.java:242)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:235)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:223)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
我已将日志级别设为 DEBUG,并且确实看到 flink-dist_2.12-1.12.1.jar 被复制到 /home/ubuntu/.flink/application_1614159836384_0045。
2021-02-24 16:19:37,768 DEBUG org.apache.flink.yarn.YarnApplicationFileUploader [] - Got modification time 1614183577000 from remote path file:/home/ubuntu/.flink/application_1614159836384_0045/TopSpeedWindowing.jar
2021-02-24 16:19:37,769 DEBUG org.apache.flink.yarn.YarnApplicationFileUploader [] - Copying from file:/home/ubuntu/build-target/flink/lib/flink-dist_2.12-1.12.1.jar to file:/home/ubuntu/.flink/application_1614159836384_0045/flink-dist_2.12-1.12.1.jar with replication factor 1
我已将整个 DEBUG 日志放在here。
Nodemanger 日志有如下警告
2021-02-24 16:36:34,219 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got event CONTAINER_INIT for appId application_1614159836384_0047
2021-02-24 16:36:34,220 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Created localizer for container_1614159836384_0047_01_000001
2021-02-24 16:36:34,222 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Writing credentials to the nmPrivate file /var/lib/hadoop-yarn/cache/yarn/nm-local-dir/nmPrivate/container_1614159836384_0047_01_000001.tokens
2021-02-24 16:36:34,222 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Initializing user ubuntu
2021-02-24 16:36:34,224 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Copying from /var/lib/hadoop-yarn/cache/yarn/nm-local-dir/nmPrivate/container_1614159836384_0047_01_000001.tokens to /var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/ubuntu/appcache/application_1614159836384_0047/container_1614159836384_0047_01_000001.tokens
2021-02-24 16:36:34,224 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Localizer CWD set to /var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/ubuntu/appcache/application_1614159836384_0047 = file:/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/ubuntu/appcache/application_1614159836384_0047
2021-02-24 16:36:34,247 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer: Disk Validator: yarn.nodemanager.disk-validator is loaded.
2021-02-24 16:36:34,268 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: { file:/home/ubuntu/.flink/application_1614159836384_0047/flink-dist_2.12-1.12.1.jar, 1614184593000, FILE, null } failed: File file:/home/ubuntu/.flink/application_1614159836384_0047/flink-dist_2.12-1.12.1.jar does not exist
java.io.FileNotFoundException: File file:/home/ubuntu/.flink/application_1614159836384_0047/flink-dist_2.12-1.12.1.jar does not exist
at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:641)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:867)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:631)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:442)
at org.apache.hadoop.yarn.util.FSDownload.verifyAndCopy(FSDownload.java:269)
at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:67)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:414)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:411)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:411)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.doDownloadCall(ContainerLocalizer.java:242)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:235)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:223)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
整个节点管理器日志为here。
有人可以告诉我出了什么问题吗? flink不支持单节点yarn集群开发吗?
- Flink 版本 1.12.1
【问题讨论】:
-
您是否尝试过在不使用纱线的情况下运行测试 - 以验证您的 Flink 应用程序配置是否正确?
-
@moosehead42 我正在使用 flink 发行版中示例 (
TopSpeedWindowing.jar) 文件夹下提供的应用程序。是的,只有单节点纱线有问题。 -
那么这意味着
flink run \path\to\TopSpeedWindowing.jar执行正确? -
application_1614159836384_0045 != application_1614159836384_0047: 是你的问题中的错字还是找到根本原因的线索? -
@GuillaumeVauvert 是的,这是一个错字。