Unable to run Spark 1.0 SparkPi on HDP 2.0: answers

【Question title】: Unable to run Spark 1.0 SparkPi on HDP 2.0
【Posted】: 2014-08-25 20:10:59
【Question】:

I am having trouble running the Spark Pi example on HDP 2.0.

I downloaded the Spark 1.0 pre-built package (for HDP2) from http://spark.apache.org/downloads.html and ran the example from the Spark website:

./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster --num-executors 3 --driver-memory 2g --executor-memory 2g --executor-cores 1 ./lib/spark-examples-1.0.0-hadoop2.2.0.jar 2

I get this error:

Application application_1404470405736_0044 failed 3 times due to AM Container for appattempt_1404470405736_0044_000003 exited with exitCode: 1 due to: Exception from container-launch:
org.apache.hadoop.util.Shell$ExitCodeException:
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
    at org.apache.hadoop.util.Shell.run(Shell.java:379)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
.Failing this attempt.. Failing the application.

Unknown/unsupported param List(--executor-memory, 2048, --executor-cores, 1, --num-executors, 3)
Usage: org.apache.spark.deploy.yarn.ApplicationMaster [options]
Options:
  --jar JAR_PATH             Path to your application's JAR file (required)
  --class CLASS_NAME         Name of your application's main class (required)
  ...bla-bla-bla

Any ideas? How can I get this to work?

【Comments】:

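I think it is fairly clear that the arguments are not being passed correctly: Unknown/unsupported param List(--executor-memory, 2048, --executor-cores, 1, --num-executors, 3). I suggest looking at the Options list that you shortened to ...bla-bla-bla.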

Tags: hadoop apache-spark hortonworks-data-platform

【Solution 1】:

I had the same problem. The cause was that the spark-assembly.jar stored in HDFS was a different version from the Spark version currently installed.

For example, here is the option list of org.apache.spark.deploy.yarn.Client in the version stored in HDFS:

$ hadoop jar ./spark-assembly.jar org.apache.spark.deploy.yarn.Client --help
Usage: org.apache.spark.deploy.yarn.Client [options] 
Options:
  --jar JAR_PATH             Path to your application's JAR file (required in yarn-cluster mode)
  --class CLASS_NAME         Name of your application's main class (required)
  --args ARGS                Arguments to be passed to your application's main class.
                             Mutliple invocations are possible, each will be passed in order.
  --num-workers NUM          Number of workers to start (Default: 2)
  --worker-cores NUM         Number of cores for the workers (Default: 1). This is unsused right now.
  --master-class CLASS_NAME  Class Name for Master (Default: spark.deploy.yarn.ApplicationMaster)
  --master-memory MEM        Memory for Master (e.g. 1000M, 2G) (Default: 512 Mb)
  --worker-memory MEM        Memory per Worker (e.g. 1000M, 2G) (Default: 1G)
  --name NAME                The name of your application (Default: Spark)
  --queue QUEUE              The hadoop queue to use for allocation requests (Default: 'default')
  --addJars jars             Comma separated list of local jars that want SparkContext.addJar to work with.
  --files files              Comma separated list of files to be distributed with the job.
  --archives archives        Comma separated list of archives to be distributed with the job.

And here is the same help output for the newly installed spark-assembly jar:

$ hadoop jar ./spark-assembly-1.0.0-cdh5.1.0-hadoop2.3.0-cdh5.1.0.jar org.apache.spark.deploy.yarn.Client
Usage: org.apache.spark.deploy.yarn.Client [options] 
Options:
  --jar JAR_PATH             Path to your application's JAR file (required in yarn-cluster mode)
  --class CLASS_NAME         Name of your application's main class (required)
  --arg ARGS                 Argument to be passed to your application's main class.
                             Multiple invocations are possible, each will be passed in order.
  --num-executors NUM        Number of executors to start (Default: 2)
  --executor-cores NUM       Number of cores for the executors (Default: 1).
  --driver-memory MEM        Memory for driver (e.g. 1000M, 2G) (Default: 512 Mb)
  --executor-memory MEM      Memory per executor (e.g. 1000M, 2G) (Default: 1G)
  --name NAME                The name of your application (Default: Spark)
  --queue QUEUE              The hadoop queue to use for allocation requests (Default: 'default')
  --addJars jars             Comma separated list of local jars that want SparkContext.addJar to work with.
  --files files              Comma separated list of files to be distributed with the job.
  --archives archives        Comma separated list of archives to be distributed with the job.

So I updated the spark-assembly.jar in HDFS to match, and Spark started working properly.
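
To make the mismatch explicit, here is a minimal Python sketch (not part of the original answer) that compares the option names from the two `Client` usage listings above. The option lines are copied from those listings with the descriptions trimmed; the helper name `parse_flags` and the variable names are hypothetical and chosen only for illustration. It simply checks that the flags named in the "Unknown/unsupported param" error are missing from the older listing.

```python
import re

# Option lines copied from the usage listing of the jar stored in HDFS
# (descriptions trimmed; only the flag names matter here).
OLD_USAGE = """
  --jar JAR_PATH
  --class CLASS_NAME
  --args ARGS
  --num-workers NUM
  --worker-cores NUM
  --master-class CLASS_NAME
  --master-memory MEM
  --worker-memory MEM
  --name NAME
  --queue QUEUE
  --addJars jars
  --files files
  --archives archives
"""

# Option lines copied from the usage listing of the newly installed jar.
NEW_USAGE = """
  --jar JAR_PATH
  --class CLASS_NAME
  --arg ARGS
  --num-executors NUM
  --executor-cores NUM
  --driver-memory MEM
  --executor-memory MEM
  --name NAME
  --queue QUEUE
  --addJars jars
  --files files
  --archives archives
"""


def parse_flags(usage_text):
    """Return the set of --flag names that begin a line of a usage listing."""
    pattern = r"^\s*(--[A-Za-z][A-Za-z-]*)"
    return set(re.findall(pattern, usage_text, flags=re.MULTILINE))


old_flags = parse_flags(OLD_USAGE)
new_flags = parse_flags(NEW_USAGE)

# Flags named in the "Unknown/unsupported param" error from the question.
rejected = {"--executor-memory", "--executor-cores", "--num-executors"}

only_in_old = sorted(old_flags - new_flags)
only_in_new = sorted(new_flags - old_flags)
missing_from_old = sorted(rejected - old_flags)

print("only in the HDFS (old) jar: ", only_in_old)
print("only in the local (new) jar:", only_in_new)
print("rejected flags the old jar does not know:", missing_from_old)
```

All three rejected flags appear only in the newer listing, which fits the explanation that the newer spark-submit passed executor-style options to older code loaded from the jar in HDFS.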
