【Question title】: Application failed 2 times due to AM Container, exited with exitcode -104
【Posted】: 2019-09-25 14:12:08
【Question description】:

I am running a Spark application that takes two input files and a jar file from an Amazon S3 bucket. I create the cluster with the AWS CLI, using instance type m5.12xlarge and instance count 11, with the following Spark properties:

--deploy-mode cluster
--num-executors 10
--executor-cores 45
--executor-memory 155g

My Spark job runs for a while, fails and restarts automatically, runs for a while again, and then reports the following diagnostics (extracted from the logs):

diagnostics: Application application_1557259242251_0001 failed 2 times due to AM Container for appattempt_1557259242251_0001_000002 exited with  exitCode: -104
Failing this attempt.Diagnostics: Container [pid=11779,containerID=container_1557259242251_0001_02_000001] is running beyond physical memory limits. Current usage: 1.4 GB of 1.4 GB physical memory used; 3.5 GB of 6.9 GB virtual memory used. Killing container.
Dump of the process-tree for container_1557259242251_0001_02_000001 :
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143

Exception in thread "main" org.apache.spark.SparkException: Application application_1557259242251_0001 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1165)
at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1520)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:894)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
19/05/07 20:03:35 INFO ShutdownHookManager: Shutdown hook called
19/05/07 20:03:35 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-3deea823-45e5-4a11-a5ff-833b01e6ae79
19/05/07 20:03:35 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-d6c3f8b2-34c6-422b-b946-ad03b1ee77d6
Command exiting with ret '1'
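
The "1.4 GB of 1.4 GB physical memory" figure in the diagnostics can be cross-checked with a little arithmetic. The killed container (`..._02_000001`) is the application master, which in cluster deploy mode also hosts the driver. The sketch below assumes Spark's documented defaults, none of which are confirmed by the post: a memory overhead of max(384 MiB, 10% of the heap) and a driver heap of 1g when `--driver-memory` is not passed. The function name `container_request_mib` is made up for illustration.

```python
# Sketch: estimate the YARN container size Spark requests for a given heap.
# Assumptions (Spark defaults, not taken from the post):
#   overhead = max(384 MiB, 10% of heap); driver heap = 1g if not set.

MIN_OVERHEAD_MIB = 384
OVERHEAD_PERCENT = 10


def container_request_mib(heap_mib: int) -> int:
    """Return heap plus memory overhead, in MiB."""
    overhead = max(MIN_OVERHEAD_MIB, heap_mib * OVERHEAD_PERCENT // 100)
    return heap_mib + overhead


# Driver / application master with the assumed 1g default heap.
driver_mib = container_request_mib(1024)
# Executor with --executor-memory 155g from the question.
executor_mib = container_request_mib(155 * 1024)

print(f"driver/AM container: {driver_mib} MiB ({driver_mib / 1024:.1f} GiB)")
print(f"executor container:  {executor_mib} MiB ({executor_mib / 1024:.1f} GiB)")
```

Under these assumptions the driver container comes to 1408 MiB, which rounds to the 1.4 GB limit shown in the log, while each executor container comes to 174592 MiB (170.5 GiB). If that reading is right, the limit being hit belongs to the driver rather than to the executors, which would be consistent with lower executor settings not changing the outcome.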

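I cannot figure out what the problem is.

I have tried changing the instance type and lowering the executor memory and executor cores, but the same problem still occurs. Sometimes the same configuration succeeds, producing results and terminating the cluster normally, but often it produces these errors.

Can anyone help?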

【Comments】:

    Tags: apache-spark amazon-ec2 hadoop-yarn amazon-emr


    【Solution 1】:

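    If you supply more than one input file to the Spark job, bundle them into a single archive (a zip file in the steps that follow) and then submit the job.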

    Step 1: Create the zip file

    zip abc.zip file1.py file2.py
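
The same archive can be built from Python with the standard `zipfile` module, which may be convenient when the job is launched from a script. This is a minimal sketch; the helper name `make_py_archive` is hypothetical, and the file names are the ones used in the command above.

```python
import zipfile
from pathlib import Path


def make_py_archive(archive_path, sources):
    """Write each source file at the root of a new zip archive."""
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for src in sources:
            # arcname drops directory components so modules sit at the
            # archive root, as they do with `zip abc.zip file1.py file2.py`.
            zf.write(src, arcname=Path(src).name)
    return archive_path


# Example call (expects file1.py and file2.py in the working directory):
# make_py_archive("abc.zip", ["file1.py", "file2.py"])
```

Keeping the modules at the archive root matters because the archive is placed on the Python path, so `import file1` only resolves if `file1.py` is not nested inside a subdirectory.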
    

    Step 2: Run the job with the zip file

    spark2-submit --master yarn --deploy-mode cluster --py-files /home/abc.zip /home/main_program_file.py
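
When the submission is scripted, the command above can be assembled as an argument list instead of a hand-edited string. The sketch below only builds and prints the command; the helper name `build_submit_command` is hypothetical, and `spark2-submit` is taken from the answer (other installations may call the binary `spark-submit`).

```python
import shlex


def build_submit_command(main_program, py_archive, submit_binary="spark2-submit"):
    """Return the submit command from the answer as an argument list."""
    return [
        submit_binary,
        "--master", "yarn",
        "--deploy-mode", "cluster",
        "--py-files", py_archive,  # zip holding the extra Python modules
        main_program,
    ]


cmd = build_submit_command("/home/main_program_file.py", "/home/abc.zip")
print(shlex.join(cmd))
```

The list form can be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell quoting problems if a path ever contains spaces.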
    

    【Discussion】:
