【问题标题】:Running Spark on m4 instead of m3 on AWS在 m4 上运行 Spark 而不是在 AWS 上运行 m3
【发布时间】:2016-12-28 18:27:42
【问题描述】:

我有一个小脚本,用于通过 AWS 提交作业。我已将实例类型从 m3xlarge 更改为 m4.xlarge,但我突然收到一条错误消息,集群在未完成所有步骤的情况下终止。脚本是:

aws emr create-cluster --name “XXXXXX”  --ami-version 3.7 --applications Name=Hive --use-default-roles --ec2-attributes KeyName=gattami,SubnetId=subnet-xxxxxxx \
--instance-type=m4.xlarge --instance-count 3 \
--log-uri s3://pythonpicode/ --bootstrap-actions Path=s3://eu-central-1.support.elasticmapreduce/spark/install-spark,Name=Spark,Args=[-x] --steps Name=“PythonPi”,Jar=s3://eu-central-1.elasticmapreduce/libs/script-runner/script-runner.jar,Args=[/home/hadoop/spark/bin/spark-submit,--deploy-mode,cluster,--master,yarn,--class,s3://pythonpicode/,s3://pythonpicode/PythonPi.py],ActionOnFailure=CONTINUE --auto-terminate

我得到的错误信息是

Exception in thread "main" java.lang.IllegalArgumentException: Unknown/unsupported param List(--executor-cores, , --files, s3://pythonpicode/PythonPi.py, --primary-py-file, PythonPi.py, --class, org.apache.spark.deploy.PythonRunner)

Usage: org.apache.spark.deploy.yarn.Client [options]
Options:
  --jar JAR_PATH           Path to your application's JAR file (required in yarn-cluster
                           mode)
  --class CLASS_NAME       Name of your application's main class (required)
  --primary-py-file        A main Python file
  --arg ARG                Argument to be passed to your application's main class.
                           Multiple invocations are possible, each will be passed in order.
  --num-executors NUM      Number of executors to start (Default: 2)
  --executor-cores NUM     Number of cores per executor (Default: 1).
  --driver-memory MEM      Memory for driver (e.g. 1000M, 2G) (Default: 512 Mb)
  --driver-cores NUM       Number of cores used by the driver (Default: 1).
  --executor-memory MEM    Memory per executor (e.g. 1000M, 2G) (Default: 1G)
  --name NAME              The name of your application (Default: Spark)
  --queue QUEUE            The hadoop queue to use for allocation requests (Default:
                           'default')
  --addJars jars           Comma separated list of local jars that want SparkContext.addJar
                           to work with.
  --py-files PY_FILES      Comma-separated list of .zip, .egg, or .py files to
                           place on the PYTHONPATH for Python apps.
  --files files            Comma separated list of files to be distributed with the job.
  --archives archives      Comma separated list of archives to be distributed with the job.

    at org.apache.spark.deploy.yarn.ClientArguments.parseArgs(ClientArguments.scala:228)
    at org.apache.spark.deploy.yarn.ClientArguments.<init>(ClientArguments.scala:56)
    at org.apache.spark.deploy.yarn.Client$.main(Client.scala:646)
    at org.apache.spark.deploy.yarn.Client.main(Client.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:569)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) 
Command exiting with ret ‘1'

我也尝试过以下替代方法

aws emr create-cluster --name "XXXXXXX"  --release-label emr-4.7.2 --applications Name=Spark --ec2-attributes KeyName=xxxxxxx,SubnetId=subnet-xxxxxxxx \
--instance-type=m4.xlarge  --instance-count 3 \
--log-uri s3://pythonpicode/ --steps Type=CUSTOM_JAR,Name="PythonPi",Jar="command-runner.jar",ActionOnFailure=CONTINUE,Args=[spark-submit,--master,yarn,--deploy-mode,cluster,s3://pythonpicode/PythonPi.py] --use-default-roles --auto-terminate

我从这些步骤中得到的(部分)错误消息如下

16/08/24 11:57:39 INFO Client: Application report for application_1472039667248_0001 (state: RUNNING)
16/08/24 11:57:40 INFO Client: Application report for application_1472039667248_0001 (state: RUNNING)
16/08/24 11:57:41 INFO Client: Application report for application_1472039667248_0001 (state: RUNNING)
16/08/24 11:57:42 INFO Client: Application report for application_1472039667248_0001 (state: RUNNING)
16/08/24 11:57:43 INFO Client: Application report for application_1472039667248_0001 (state: RUNNING)
16/08/24 11:57:44 INFO Client: Application report for application_1472039667248_0001 (state: RUNNING)
16/08/24 11:57:45 INFO Client: Application report for application_1472039667248_0001 (state: RUNNING)
16/08/24 11:57:46 INFO Client: Application report for application_1472039667248_0001 (state: RUNNING)
16/08/24 11:57:47 INFO Client: Application report for application_1472039667248_0001 (state: RUNNING)
16/08/24 11:57:48 INFO Client: Application report for application_1472039667248_0001 (state: RUNNING)
16/08/24 11:57:49 INFO Client: Application report for application_1472039667248_0001 (state: RUNNING)
16/08/24 11:57:50 INFO Client: Application report for application_1472039667248_0001 (state: RUNNING)
16/08/24 11:57:51 INFO Client: Application report for application_1472039667248_0001 (state: RUNNING)
16/08/24 11:57:52 INFO Client: Application report for application_1472039667248_0001 (state: RUNNING)
16/08/24 11:57:53 INFO Client: Application report for application_1472039667248_0001 (state: RUNNING)
16/08/24 11:57:54 INFO Client: Application report for application_1472039667248_0001 (state: RUNNING)
16/08/24 11:57:55 INFO Client: Application report for application_1472039667248_0001 (state: RUNNING)
16/08/24 11:57:56 INFO Client: Application report for application_1472039667248_0001 (state: RUNNING)
16/08/24 11:57:57 INFO Client: Application report for application_1472039667248_0001 (state: RUNNING)
16/08/24 11:57:58 INFO Client: Application report for application_1472039667248_0001 (state: RUNNING)
16/08/24 11:57:59 INFO Client: Application report for application_1472039667248_0001 (state: RUNNING)
16/08/24 11:58:00 INFO Client: Application report for application_1472039667248_0001 (state: RUNNING)
16/08/24 11:58:01 INFO Client: Application report for application_1472039667248_0001 (state: FAILED)
16/08/24 11:58:01 INFO Client: 
     client token: N/A
     diagnostics: Application application_1472039667248_0001 failed 2 times due to AM Container for appattempt_1472039667248_0001_000002 exited with  exitCode: -104
For more detailed output, check application tracking page:http://ip-172-31-21-32.eu-central-1.compute.internal:8088/cluster/app/application_1472039667248_0001Then, click on links to logs of each attempt.
Diagnostics: Container [pid=5713,containerID=container_1472039667248_0001_02_000001] is running beyond physical memory limits. Current usage: 2.0 GB of 1.4 GB physical memory used; 3.3 GB of 6.9 GB virtual memory used. Killing container.
Dump of the process-tree for container_1472039667248_0001_02_000001 :
    |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
    |- 5748 5721 5713 5713 (python) 301 29 1343983616 246463 python PythonPi.py 
    |- 5721 5713 5713 5713 (java) 1594 93 2031308800 265175 /usr/lib/jvm/java-openjdk/bin/java -server -Xmx1024m -Djava.io.tmpdir=/mnt/yarn/usercache/hadoop/appcache/application_1472039667248_0001/container_1472039667248_0001_02_000001/tmp -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70 -XX:+CMSClassUnloadingEnabled -XX:OnOutOfMemoryError=kill -9 %p -Dspark.yarn.app.container.log.dir=/var/log/hadoop-yarn/containers/application_1472039667248_0001/container_1472039667248_0001_02_000001 -XX:MaxPermSize=256m org.apache.spark.deploy.yarn.ApplicationMaster --class org.apache.spark.deploy.PythonRunner --primary-py-file PythonPi.py --executor-memory 5120m --executor-cores 4 --properties-file /mnt/yarn/usercache/hadoop/appcache/application_1472039667248_0001/container_1472039667248_0001_02_000001/__spark_conf__/__spark_conf__.properties 
    |- 5713 5711 5713 5713 (bash) 0 0 115810304 715 /bin/bash -c LD_LIBRARY_PATH=/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:::/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native::/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native /usr/lib/jvm/java-openjdk/bin/java -server -Xmx1024m -Djava.io.tmpdir=/mnt/yarn/usercache/hadoop/appcache/application_1472039667248_0001/container_1472039667248_0001_02_000001/tmp '-XX:+UseConcMarkSweepGC' '-XX:CMSInitiatingOccupancyFraction=70' '-XX:MaxHeapFreeRatio=70' '-XX:+CMSClassUnloadingEnabled' '-XX:OnOutOfMemoryError=kill -9 %p' -Dspark.yarn.app.container.log.dir=/var/log/hadoop-yarn/containers/application_1472039667248_0001/container_1472039667248_0001_02_000001 -XX:MaxPermSize=256m org.apache.spark.deploy.yarn.ApplicationMaster --class 'org.apache.spark.deploy.PythonRunner' --primary-py-file PythonPi.py --executor-memory 5120m --executor-cores 4 --properties-file /mnt/yarn/usercache/hadoop/appcache/application_1472039667248_0001/container_1472039667248_0001_02_000001/__spark_conf__/__spark_conf__.properties 1> /var/log/hadoop-yarn/containers/application_1472039667248_0001/container_1472039667248_0001_02_000001/stdout 2> /var/log/hadoop-yarn/containers/application_1472039667248_0001/container_1472039667248_0001_02_000001/stderr 

Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Failing this attempt. Failing the application.
     ApplicationMaster host: N/A
     ApplicationMaster RPC port: -1
     queue: default
     start time: 1472039815698
     final status: FAILED
     tracking URL: http://ip-172-31-21-32.eu-central-1.compute.internal:8088/cluster/app/application_1472039667248_0001
     user: hadoop
Exception in thread "main" org.apache.spark.SparkException: Application application_1472039667248_0001 finished with failed status
    at org.apache.spark.deploy.yarn.Client.run(Client.scala:1034)
    at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1081)
    at org.apache.spark.deploy.yarn.Client.main(Client.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
16/08/24 11:58:01 INFO ShutdownHookManager: Shutdown hook called
16/08/24 11:58:01 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-7adbbd9f-2f68-49e3-85e6-9fdf960af87e
Command exiting with ret '1'

【问题讨论】:

  • 您的steps 定义中没有--executor-cores,这是您的错误消息所抱怨的。你发这个问题的时候把它拿出来了吗?
  • 您认为缺少什么?我在 m3.xlarge 中使用了相同的脚本,它运行良好。
  • 您确定您的 了吗?它们看起来不对。

标签: amazon-web-services amazon-s3 amazon-ec2 pyspark


【解决方案1】:

您需要检查您的 Spark 版本。您很可能安装了不支持这些参数的旧版本(例如 1.5)。

(--executor-cores, , --files, s3://pythonpicode/PythonPi.py, --primary-py-file, PythonPi.py, --class, org.apache.spark.deploy.PythonRunner)

我建议您尝试使用稳定的 AMI 4.7.2,并将 Spark 1.6 作为标准应用程序提供。

【讨论】:

  • 您好帅远,感谢您的快速回复。您确定有 AMI 版本 4.7.2 吗? 3.11 是我可以检查的最新版本。当我将 AMI 版本设置为 4.7.2 时,我收到以下错误消息:“调用 RunJobFlow 操作时发生错误 (ValidationException):提供的 ami 版本无效。”
  • 这太奇怪了。检查this
  • 嗯,有一个 EMR 版本 4.7.2,您可能提到过。但是,这不支持 Spark。这是我得到的脚本和错误消息:
  • aws emr create-cluster --name "XXXXX" --release-label emr-4.7.2 --applications Name=Hive --use-default-roles --ec2-attributes KeyName=gattami ,SubnetId=subnet-xxxxxxx \ etc...
  • 调用 RunJobFlow 操作时发生错误 (ValidationException):提供的引导操作:“emr-4.7.2”不支持“Spark”。
猜你喜欢
  • 2023-03-16
  • 1970-01-01
  • 2019-07-27
  • 1970-01-01
  • 2016-05-14
  • 2021-08-18
  • 2015-07-15
  • 1970-01-01
  • 1970-01-01
相关资源
最近更新 更多