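【Posted】: 2015-03-04 14:54:09
【Question】:
I have a Spark cluster set up with one master and 3 workers. I use Vagrant and Docker to start the cluster.
I am trying to submit a Spark job from my local Eclipse that connects to the master and lets me execute it. Here is the Spark conf: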
SparkConf conf = new SparkConf().setAppName("Simple Application").setMaster("spark://scale1.docker:7077");
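When I run my application from Eclipse, I can see one running application in the Master's UI. All workers are ALIVE, 4 / 4 cores are in use, and 512 MB is allocated to the application.
The Eclipse console just keeps printing the same warning: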
15/03/04 15:39:27 INFO BlockManagerMaster: Updated info of block broadcast_1_piece0
15/03/04 15:39:27 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:838
15/03/04 15:39:27 INFO DAGScheduler: Submitting 2 missing tasks from Stage 0 (MappedRDD[2] at mapToPair at CountLines.java:35)
15/03/04 15:39:27 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
15/03/04 15:39:42 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
15/03/04 15:39:57 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
15/03/04 15:40:04 INFO AppClient$ClientActor: Executor updated: app-20150304143926-0001/1 is now EXITED (Command exited with code 1)
15/03/04 15:40:04 INFO SparkDeploySchedulerBackend: Executor app-20150304143926-0001/1 removed: Command exited with code 1
15/03/04 15:40:04 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 1
15/03/04 15:40:04 INFO AppClient$ClientActor: Executor added: app-20150304143926-0001/2 on worker-20150304140319-scale3.docker-55425 (scale3.docker:55425) with 4 cores
15/03/04 15:40:04 INFO SparkDeploySchedulerBackend: Granted executor ID app-20150304143926-0001/2 on hostPort scale3.docker:55425 with 4 cores, 512.0 MB RAM
15/03/04 15:40:04 INFO AppClient$ClientActor: Executor updated: app-20150304143926-0001/2 is now RUNNING
15/03/04 15:40:04 INFO AppClient$ClientActor: Executor updated: app-20150304143926-0001/2 is now LOADING
15/03/04 15:40:04 INFO AppClient$ClientActor: Executor updated: app-20150304143926-0001/0 is now EXITED (Command exited with code 1)
15/03/04 15:40:04 INFO SparkDeploySchedulerBackend: Executor app-20150304143926-0001/0 removed: Command exited with code 1
15/03/04 15:40:04 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 0
15/03/04 15:40:04 INFO AppClient$ClientActor: Executor added: app-20150304143926-0001/3 on worker-20150304140317-scale2.docker-60646 (scale2.docker:60646) with 4 cores
15/03/04 15:40:04 INFO SparkDeploySchedulerBackend: Granted executor ID app-20150304143926-0001/3 on hostPort scale2.docker:60646 with 4 cores, 512.0 MB RAM
15/03/04 15:40:04 INFO AppClient$ClientActor: Executor updated: app-20150304143926-0001/3 is now RUNNING
15/03/04 15:40:04 INFO AppClient$ClientActor: Executor updated: app-20150304143926-0001/3 is now LOADING
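Reading the Spark documentation, I found this:
Because the driver schedules tasks on the cluster, it should be run close to the worker nodes, preferably on the same local area network. If you'd like to send requests to the cluster remotely, it's better to open an RPC to the driver and have it submit operations from nearby than to run a driver far away from the worker nodes.
I think the problem is that the driver is running locally on my machine.
I am using Spark 1.2.0.
Is it possible to run the application in Eclipse and submit it to a remote cluster using a local driver? If so, what do I need to do?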
【Comments】:
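- I think the problem is with your Vagrant/Docker network setup. When you start the driver application, it connects to the Master, the Master picks the Slaves, and the Slaves connect back to the driver application to report results. So your Spark Master/Slaves must be able to communicate with the driver application. Check whether you can ping the host from inside the containers. You can adjust the driver application's port with the spark.driver.port setting.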
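One way to test the "Slaves connect back to the driver" step is to try a plain TCP connection from inside a worker container to the machine running Eclipse. The following is a minimal sketch using only the Java standard library. The class name `DriverReachabilityCheck` and the method `canConnect` are hypothetical names made up for this illustration, and the sketch assumes the driver port has been pinned to a known value with spark.driver.port so that there is a fixed port to probe.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not part of Spark: probes whether a TCP port is reachable.
public class DriverReachabilityCheck {

    /**
     * Returns true if a TCP connection to host:port can be opened within
     * timeoutMillis, false if it is refused, times out, or the host name
     * cannot be resolved.
     */
    public static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers connection refused, timeouts, and unknown hosts.
            return false;
        }
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("usage: DriverReachabilityCheck <driver-host> <driver-port>");
            System.exit(2);
        }
        String host = args[0];
        int port = Integer.parseInt(args[1]);
        boolean reachable = canConnect(host, port, 3000);
        System.out.println(host + ":" + port + (reachable ? " is reachable" : " is NOT reachable"));
        System.exit(reachable ? 0 : 1);
    }
}
```

Run it inside a worker container while the Eclipse application is running, passing the address of the development machine as seen from the container and the value chosen for spark.driver.port. If it reports that the port is not reachable, the executors probably cannot reach the driver either, which would be consistent with the "Initial job has not accepted any resources" warnings and the executors exiting with code 1.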
Tags: java eclipse apache-spark