Can't connect from application to the standalone cluster: answers

【Question title】: Can't connect from application to the standalone cluster
【Posted】: 2014-11-01 20:02:44
【Question description】:

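I am trying to connect from an application to a Spark standalone cluster, with everything running on a single machine. I start the standalone master with the command: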

bash start-master.sh

Then I start one worker with the command:

bash spark-class org.apache.spark.deploy.worker.Worker spark://PC:7077 -m 512m

(I allocated 512 MB to it.)

In the master's web UI:

http://localhost:8080

I can see that both the master and the worker are running.

Then I try to connect to the cluster from the application with:

JavaSparkContext sc = new JavaSparkContext("spark://PC:7077", "myapplication");

When I run the application, it crashes with the following error output:

    14/11/01 22:53:26 INFO client.AppClient$ClientActor: Connecting to master spark://PC:7077...
    14/11/01 22:53:26 INFO spark.SparkContext: Starting job: collect at App.java:115
    14/11/01 22:53:26 INFO scheduler.DAGScheduler: Got job 0 (collect at App.java:115) with 2 output partitions (allowLocal=false)
    14/11/01 22:53:26 INFO scheduler.DAGScheduler: Final stage: Stage 0(collect at App.java:115)
    14/11/01 22:53:26 INFO scheduler.DAGScheduler: Parents of final stage: List()
    14/11/01 22:53:26 INFO scheduler.DAGScheduler: Missing parents: List()
    14/11/01 22:53:26 INFO scheduler.DAGScheduler: Submitting Stage 0 (ParallelCollectionRDD[0] at parallelize at App.java:109), which has no missing parents
    14/11/01 22:53:27 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from Stage 0 (ParallelCollectionRDD[0] at parallelize at App.java:109)
    14/11/01 22:53:27 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
    14/11/01 22:53:42 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
    14/11/01 22:53:46 INFO client.AppClient$ClientActor: Connecting to master spark://PC:7077...
    14/11/01 22:53:57 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
    14/11/01 22:54:06 INFO client.AppClient$ClientActor: Connecting to master spark://PC:7077...
    14/11/01 22:54:12 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
    14/11/01 22:54:26 ERROR cluster.SparkDeploySchedulerBackend: Application has been killed. Reason: All masters are unresponsive! Giving up.
    14/11/01 22:54:26 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
    14/11/01 22:54:26 INFO scheduler.DAGScheduler: Failed to run collect at App.java:115
    Exception in thread "main" 14/11/01 22:54:26 INFO scheduler.TaskSchedulerImpl: Cancelling stage 0
    org.apache.spark.SparkException: Job aborted due to stage failure: All masters are unresponsive! Giving up.
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1033)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1017)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1015)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1015)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:633)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:633)
        at scala.Option.foreach(Option.scala:236)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:633)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1207)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
        at akka.actor.ActorCell.invoke(ActorCell.scala:456)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
        at akka.dispatch.Mailbox.run(Mailbox.scala:219)
        at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
    14/11/01 22:54:26 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/metrics/json,null}
    14/11/01 22:54:26 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage/kill,null}
    14/11/01 22:54:26 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/,null}
    14/11/01 22:54:26 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/static,null}
    14/11/01 22:54:26 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/executors/json,null}
    14/11/01 22:54:26 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/executors,null}
    14/11/01 22:54:26 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/environment/json,null}

Any ideas?

P.S. I am using the prebuilt version of Spark: spark-1.1.0-bin-hadoop2.4.

Thanks.

【Question discussion】:

Tags: apache-spark

【Solution 1】:

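Make sure that both the standalone worker and the Spark driver connect to the Spark master at the exact address listed in the master's web UI (it is also printed in the master's startup log). Spark uses Akka for some of its control-plane communication, and Akka is very picky about hostnames, so the addresses need to match exactly.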
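
As a rough illustration of what "match exactly" means, here is a minimal sketch in plain Java (no Spark dependency). The class `MasterUrlCheck` and the helper `sameMaster` are hypothetical names made up for this example, and the advertised URL is assumed to have been copied by hand from the master's web UI. The comparison is deliberately strict: anything short of an identical host string and port is treated as a mismatch.

```java
import java.net.URI;

class MasterUrlCheck {
    // Hypothetical helper: returns true only when host and port are identical.
    // The host comparison is case-sensitive on purpose, to be conservative.
    static boolean sameMaster(String configured, String advertised) {
        URI a = URI.create(configured.trim());
        URI b = URI.create(advertised.trim());
        return a.getHost() != null
                && a.getHost().equals(b.getHost())
                && a.getPort() == b.getPort();
    }

    public static void main(String[] args) {
        // Assumption: this is the URL shown at the top of the master web UI.
        String advertised = "spark://PC:7077";

        System.out.println(sameMaster("spark://PC:7077", advertised));        // same host and port
        System.out.println(sameMaster("spark://localhost:7077", advertised)); // different host string
        System.out.println(sameMaster("spark://PC:7078", advertised));        // different port
    }
}
```

If the check fails for the URL passed to `JavaSparkContext`, copying the URL verbatim from the web UI is the first thing to try.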

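There are several options for controlling which hostnames / network interfaces the driver and the master bind to. Probably the simplest is to set the SPARK_LOCAL_IP environment variable, which controls the address that the Master / Driver will bind to. For an overview of the other settings that affect network address binding, see http://databricks.gitbooks.io/databricks-spark-knowledge-base/content/troubleshooting/connectivity_issues.html.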
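
To see what the JVM itself resolves for the local machine, and whether SPARK_LOCAL_IP is visible to the process, a small diagnostic like the one below may help. It is only a sketch using the standard library: `BindAddressCheck` and `effectiveAddress` are hypothetical names, and the "explicit value wins, otherwise fall back" rule is a simplification for illustration, not Spark's actual address-resolution logic.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

class BindAddressCheck {
    // Hypothetical helper: prefer an explicitly configured address,
    // otherwise fall back to whatever address was resolved locally.
    static String effectiveAddress(String sparkLocalIp, String fallback) {
        if (sparkLocalIp == null || sparkLocalIp.trim().isEmpty()) {
            return fallback;
        }
        return sparkLocalIp.trim();
    }

    public static void main(String[] args) throws UnknownHostException {
        InetAddress local = InetAddress.getLocalHost();
        String env = System.getenv("SPARK_LOCAL_IP"); // null when the variable is unset

        System.out.println("JVM hostname:   " + local.getHostName());
        System.out.println("JVM address:    " + local.getHostAddress());
        System.out.println("SPARK_LOCAL_IP: " + env);
        System.out.println("Effective:      " + effectiveAddress(env, local.getHostAddress()));
    }
}
```

Running it in the same environment as the driver, and again in the environment that starts the master, shows whether the two would end up with different hostnames or addresses.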

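【Discussion】:

Thanks for the quick response! The problem turned out to be very similar to this post: http://stackoverflow.com/questions/25682836/standalone-spark-cluster-cant-submit-job-programmatically-java-io-invalidcl?rq=1, except that in my pom.xml I found that some of the Spark dependencies had different versions (maybe I was careless when adding another dependency to the pom). I fixed them, and then downloaded the prebuilt Spark package spark-1.1.0-bin-hadoop1, because the pom specifies the hadoop 1.* client.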