【Question title】: Spark on EMR and job (jar) submission from the master node
【Posted】: 2015-07-31 20:53:34
【Question】:

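I am trying to run a compiled Spark/Scala program (a fat jar) from the master node of an EMR cluster on AWS. I built the jar in my development environment with the same dependencies as my production environment. I am deploying it with the spark-submit script: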

SPARK_JAR=./spark/lib/spark-assembly-1.2.1-hadoop2.4.0.jar \
./spark-submit \
--deploy-mode cluster \
--verbose \
--master yarn-cluster \
--class sparkSQLProcessor \
--driver-memory 1g \
--executor-memory 1g \
--executor-cores 1 \
--num-executors 1 \
/home/hadoop/Spark-SQL-Job.jar args1 args2

The problem is that I run into what I believe is a configuration issue:

Exception in thread "main" java.io.FileNotFoundException: File file:/home/hadoop/.versions/spark-1.2.1.a/bin/spark/lib/spark-assembly-1.2.1-hadoop2.4.0.jar does not exist
    at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:516)
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:729)
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:506)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:407)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:337)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:289)
    at org.apache.spark.deploy.yarn.ClientBase$class.copyFileToRemote(ClientBase.scala:102)
    at org.apache.spark.deploy.yarn.Client.copyFileToRemote(Client.scala:35)
    at org.apache.spark.deploy.yarn.ClientBase$$anonfun$prepareLocalResources$3.apply(ClientBase.scala:182)
    at org.apache.spark.deploy.yarn.ClientBase$$anonfun$prepareLocalResources$3.apply(ClientBase.scala:176)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at org.apache.spark.deploy.yarn.ClientBase$class.prepareLocalResources(ClientBase.scala:176)
    at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:35)
    at org.apache.spark.deploy.yarn.ClientBase$class.createContainerLaunchContext(ClientBase.scala:308)
    at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:35)
    at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:80)
    at org.apache.spark.deploy.yarn.ClientBase$class.run(ClientBase.scala:501)
    at org.apache.spark.deploy.yarn.Client.run(Client.scala:35)
    at org.apache.spark.deploy.yarn.Client$.main(Client.scala:139)
    at org.apache.spark.deploy.yarn.Client.main(Client.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

【Comments】:

    Tags: java scala hadoop apache-spark


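    【Solution 1】:

    I run Spark jobs on EMR all the time and have never hit this error. Did you install Spark with the EMR bootstrap action, or are you using the newer EMR 4.0 release?

    Either way, you should try not setting the SPARK_JAR environment variable.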
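    As a rough sketch of that suggestion: the path in the stack trace (`.../spark-1.2.1.a/bin/spark/lib/...`) looks like the relative `SPARK_JAR` value resolved against the `bin` directory the command was run from, which would explain why the file is not found. The snippet below repeats the command from the question without `SPARK_JAR`, on the assumption that spark-submit then falls back to the assembly jar of its own installation. The `run` helper and the `DRY_RUN` switch are hypothetical additions for illustration only; by default the command is just printed so it can be inspected first.

```shell
# Make sure SPARK_JAR is not inherited from the current shell session.
unset SPARK_JAR

# Hypothetical helper for this sketch: print the command unless DRY_RUN=0,
# in which case the command is actually executed.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Same arguments as in the question, minus the SPARK_JAR prefix.
run ./spark-submit \
  --deploy-mode cluster \
  --verbose \
  --master yarn-cluster \
  --class sparkSQLProcessor \
  --driver-memory 1g \
  --executor-memory 1g \
  --executor-cores 1 \
  --num-executors 1 \
  /home/hadoop/Spark-SQL-Job.jar args1 args2
```

    Run it from the directory that contains `spark-submit`, and set `DRY_RUN=0` to perform the real submission once the printed command looks right.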

    【Discussion】:
