【Question title】: Unable to run Spark 1.0 SparkPi on HDP 2.0
【Posted】: 2014-08-25 20:10:59
【Question】:

I am having trouble running the Spark Pi example on HDP 2.0.

I downloaded the Spark 1.0 pre-built package (for HDP2) from http://spark.apache.org/downloads.html and ran the example from the Spark website:

 ./bin/spark-submit --class org.apache.spark.examples.SparkPi     --master yarn-cluster --num-executors 3 --driver-memory 2g --executor-memory 2g --executor-cores 1 ./lib/spark-examples-1.0.0-hadoop2.2.0.jar 2

I get this error:

Application application_1404470405736_0044 failed 3 times due to AM Container for appattempt_1404470405736_0044_000003 exited with exitCode: 1 due to: Exception from container-launch: org.apache.hadoop.util.Shell$ExitCodeException:
  at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
  at org.apache.hadoop.util.Shell.run(Shell.java:379)
  at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
  at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
  at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
  at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:744)
.Failing this attempt.. Failing the application.

Unknown/unsupported param List(--executor-memory, 2048, --executor-cores, 1, --num-executors, 3)
Usage: org.apache.spark.deploy.yarn.ApplicationMaster [options]
Options:
  --jar JAR_PATH       Path to your application's JAR file (required)
  --class CLASS_NAME   Name of your application's main class (required)
  ...bla-bla-bla

Any ideas? How can I get this to work?

【Comments】:

  • I think it is clear that you are not passing the arguments correctly: Unknown/unsupported param List(--executor-memory, 2048, --executor-cores, 1, --num-executors, 3). I suggest looking at the Options you shortened with ...bla-bla-bla

Tags: hadoop apache-spark hortonworks-data-platform


【Solution 1】:

I had the same problem. The cause is that the spark-assembly.jar on HDFS is a different version from the Spark you are currently running.

For example, here is the parameter list of org.apache.spark.deploy.yarn.Client in the version stored on HDFS:

  $ hadoop jar ./spark-assembly.jar  org.apache.spark.deploy.yarn.Client --help
Usage: org.apache.spark.deploy.yarn.Client [options] 
Options:
  --jar JAR_PATH             Path to your application's JAR file (required in yarn-cluster mode)
  --class CLASS_NAME         Name of your application's main class (required)
  --args ARGS                Arguments to be passed to your application's main class.
                             Mutliple invocations are possible, each will be passed in order.
  --num-workers NUM          Number of workers to start (Default: 2)
  --worker-cores NUM         Number of cores for the workers (Default: 1). This is unsused right now.
  --master-class CLASS_NAME  Class Name for Master (Default: spark.deploy.yarn.ApplicationMaster)
  --master-memory MEM        Memory for Master (e.g. 1000M, 2G) (Default: 512 Mb)
  --worker-memory MEM        Memory per Worker (e.g. 1000M, 2G) (Default: 1G)
  --name NAME                The name of your application (Default: Spark)
  --queue QUEUE              The hadoop queue to use for allocation requests (Default: 'default')
  --addJars jars             Comma separated list of local jars that want SparkContext.addJar to work with.
  --files files              Comma separated list of files to be distributed with the job.
  --archives archives        Comma separated list of archives to be distributed with the job.

And here is the same help from the newly installed spark-assembly jar file:

$ hadoop jar ./spark-assembly-1.0.0-cdh5.1.0-hadoop2.3.0-cdh5.1.0.jar org.apache.spark.deploy.yarn.Client
Usage: org.apache.spark.deploy.yarn.Client [options] 
Options:
  --jar JAR_PATH             Path to your application's JAR file (required in yarn-cluster mode)
  --class CLASS_NAME         Name of your application's main class (required)
  --arg ARGS                 Argument to be passed to your application's main class.
                             Multiple invocations are possible, each will be passed in order.
  --num-executors NUM        Number of executors to start (Default: 2)
  --executor-cores NUM       Number of cores for the executors (Default: 1).
  --driver-memory MEM        Memory for driver (e.g. 1000M, 2G) (Default: 512 Mb)
  --executor-memory MEM      Memory per executor (e.g. 1000M, 2G) (Default: 1G)
  --name NAME                The name of your application (Default: Spark)
  --queue QUEUE              The hadoop queue to use for allocation requests (Default: 'default')
  --addJars jars             Comma separated list of local jars that want SparkContext.addJar to work with.
  --files files              Comma separated list of files to be distributed with the job.
  --archives archives        Comma separated list of archives to be distributed with the job.
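The two listings differ in several flag names: the older jar has --args, --num-workers, --worker-cores, --master-class, --master-memory and --worker-memory, while the newer one has --arg, --num-executors, --executor-cores, --driver-memory and --executor-memory. A minimal sketch for spotting such a mismatch mechanically follows. It assumes each help output was first saved to a text file; the function names `list_flags` and `compare_flags` are made up for this illustration and are not part of Spark or Hadoop.

```shell
#!/bin/sh
# Sketch only: compare the long option names in two saved --help listings.
# list_flags and compare_flags are hypothetical helper names.

# Read a help listing on stdin and print every long option that starts
# a line (for example --num-workers), one per line, sorted and unique.
list_flags() {
    sed -nE 's/^[[:space:]]*(--[A-Za-z-]+).*/\1/p' | LC_ALL=C sort -u
}

# compare_flags OLD_HELP_FILE NEW_HELP_FILE
# Print the options that appear in only one of the two listings.
compare_flags() {
    old_flags=$(mktemp)
    new_flags=$(mktemp)
    list_flags < "$1" > "$old_flags"
    list_flags < "$2" > "$new_flags"
    echo "only in $1:"
    LC_ALL=C comm -23 "$old_flags" "$new_flags"
    echo "only in $2:"
    LC_ALL=C comm -13 "$old_flags" "$new_flags"
    rm -f "$old_flags" "$new_flags"
}

# Possible usage (file names are placeholders; 2>&1 is there in case the
# usage text is written to stderr):
#   hadoop jar ./spark-assembly.jar org.apache.spark.deploy.yarn.Client --help > old_help.txt 2>&1
#   hadoop jar ./spark-assembly-1.0.0-cdh5.1.0-hadoop2.3.0-cdh5.1.0.jar org.apache.spark.deploy.yarn.Client > new_help.txt 2>&1
#   compare_flags old_help.txt new_help.txt
```

Any option reported as existing on only one side is a hint that the client and the assembly jar used by the ApplicationMaster come from different Spark versions.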

So I replaced the spark-assembly.jar on HDFS with the current one, and Spark started working correctly.
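The answer does not show the commands used, so the following is only a sketch of what replacing the jar might look like. The helper name `upload_assembly`, the `HADOOP_CMD` variable and both paths in the usage comment are assumptions made for this illustration; the real HDFS location depends on how your cluster is configured to find the assembly, which you should check before overwriting anything.

```shell
#!/bin/sh
# Sketch only: overwrite the Spark assembly jar stored on HDFS.
# HADOOP_CMD can be overridden (for example with "echo") for a dry run.
HADOOP_CMD="${HADOOP_CMD:-hadoop}"

# upload_assembly LOCAL_JAR HDFS_PATH
# Both arguments are supplied by the caller; nothing here knows where
# your cluster actually keeps the assembly.
upload_assembly() {
    local_jar=$1
    hdfs_path=$2
    # -f asks "fs -put" to overwrite an existing destination file
    # (assumed to be available in your Hadoop 2.x release).
    "$HADOOP_CMD" fs -put -f "$local_jar" "$hdfs_path" &&
    # List the destination so the new size and timestamp can be checked.
    "$HADOOP_CMD" fs -ls "$hdfs_path"
}

# Possible usage (both paths are hypothetical placeholders):
#   upload_assembly ./lib/spark-assembly.jar /path/on/hdfs/spark-assembly.jar
```

Running it once with `HADOOP_CMD=echo` prints the commands instead of executing them, which is a cheap way to confirm the paths before touching HDFS.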

