【Question title】: Spark cluster Master IP address not binding to floating IP
【Posted】: 2016-09-08 12:36:11
【Question】:

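I am trying to configure a Spark cluster using OpenStack. Currently I have two servers named

  • spark-master (IP: 192.x.x.1, floating IP: 87.x.x.1)
  • spark-slave-1 (IP: 192.x.x.2, floating IP: 87.x.x.2)

I am running into problems when trying to use these floating IPs rather than the standard public IPs.

On the spark-master machine, the hostname is spark-master and /etc/hosts looks like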

127.0.0.1 localhost
127.0.1.1 spark-master

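The only change made to spark-env.sh is export SPARK_MASTER_IP='192.x.x.1'. If I run ./sbin/start-master.sh, I can view the web UI.

The catch is that I view the web UI through the floating IP 87.x.x.1, and there it lists the master URL as spark://192.x.x.1:7077.

From the slave I can run ./sbin/start-slave.sh spark://192.x.x.1:7077 and it connects successfully.

If I try to use the floating IP by changing spark-env.sh on the master to export SPARK_MASTER_IP='87.x.x.1', I get the following error log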

Spark Command: /usr/lib/jvm/java-7-openjdk-amd64/bin/java -cp /usr/local/spark-1.6.1-bin-hadoop2.6/conf/:/usr/local/spark-1.6.1-bin-hadoop2.6/lib/spark-assembly-1.6.1-hadoop2.6.0.jar:/usr/local/spark-1.6.1-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar:/usr/local/spark-1.6.1-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar:/usr/local/spark-1.6.1-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar -Xms1g -Xmx1g -XX:MaxPermSize=256m org.apache.spark.deploy.master.Master --ip 87.x.x.1 --port 7077 --webui-port 8080
========================================
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/05/12 15:05:33 INFO Master: Registered signal handlers for [TERM, HUP, INT]
16/05/12 15:05:33 WARN Utils: Your hostname, spark-master resolves to a loopback address: 127.0.1.1; using 192.x.x.1 instead (on interface eth0)
16/05/12 15:05:33 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
16/05/12 15:05:33 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/05/12 15:05:33 INFO SecurityManager: Changing view acls to: ubuntu
16/05/12 15:05:33 INFO SecurityManager: Changing modify acls to: ubuntu
16/05/12 15:05:33 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(ubuntu); users with modify permissions: Set(ubuntu)
16/05/12 15:05:33 WARN Utils: Service 'sparkMaster' could not bind on port 7077. Attempting port 7078.
16/05/12 15:05:33 WARN Utils: Service 'sparkMaster' could not bind on port 7078. Attempting port 7079.
16/05/12 15:05:33 WARN Utils: Service 'sparkMaster' could not bind on port 7079. Attempting port 7080.
16/05/12 15:05:33 WARN Utils: Service 'sparkMaster' could not bind on port 7080. Attempting port 7081.
16/05/12 15:05:33 WARN Utils: Service 'sparkMaster' could not bind on port 7081. Attempting port 7082.
16/05/12 15:05:33 WARN Utils: Service 'sparkMaster' could not bind on port 7082. Attempting port 7083.
16/05/12 15:05:33 WARN Utils: Service 'sparkMaster' could not bind on port 7083. Attempting port 7084.
16/05/12 15:05:33 WARN Utils: Service 'sparkMaster' could not bind on port 7084. Attempting port 7085.
16/05/12 15:05:33 WARN Utils: Service 'sparkMaster' could not bind on port 7085. Attempting port 7086.
16/05/12 15:05:33 WARN Utils: Service 'sparkMaster' could not bind on port 7086. Attempting port 7087.
16/05/12 15:05:33 WARN Utils: Service 'sparkMaster' could not bind on port 7087. Attempting port 7088.
16/05/12 15:05:33 WARN Utils: Service 'sparkMaster' could not bind on port 7088. Attempting port 7089.
16/05/12 15:05:33 WARN Utils: Service 'sparkMaster' could not bind on port 7089. Attempting port 7090.
16/05/12 15:05:33 WARN Utils: Service 'sparkMaster' could not bind on port 7090. Attempting port 7091.
16/05/12 15:05:33 WARN Utils: Service 'sparkMaster' could not bind on port 7091. Attempting port 7092.
16/05/12 15:05:33 WARN Utils: Service 'sparkMaster' could not bind on port 7092. Attempting port 7093.
Exception in thread "main" java.net.BindException: Cannot assign requested address: Service 'sparkMaster' failed after 16 retries!
  at sun.nio.ch.Net.bind0(Native Method)
  at sun.nio.ch.Net.bind(Net.java:463)
  at sun.nio.ch.Net.bind(Net.java:455)
  at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
  at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
  at io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:125)
  at io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:485)
  at io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1089)
  at io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:430)
  at io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:415)
  at io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:903)
  at io.netty.channel.AbstractChannel.bind(AbstractChannel.java:198)
  at io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:348)
  at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
  at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
  at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
  at java.lang.Thread.run(Thread.java:745)

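Obviously, the takeaway here for me is the line

Your hostname, spark-master resolves to a loopback address: 127.0.1.1; using 192.x.x.1 instead (on interface eth0) 16/05/12 15:05:33 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address

But no matter what approach I then try, I just run into more errors.

If I set both export SPARK_MASTER_IP='87.x.x.1' and export SPARK_LOCAL_IP='87.x.x.1' and try ./sbin/start-master.sh, I get the following error log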

16/05/17 11:00:55 WARN Utils: Service 'sparkMaster' could not bind on port 7077. Attempting port 7078.
16/05/17 11:00:55 WARN Utils: Service 'sparkMaster' could not bind on port 7078. Attempting port 7079.
16/05/17 11:00:55 WARN Utils: Service 'sparkMaster' could not bind on port 7079. Attempting port 7080.
16/05/17 11:00:55 WARN Utils: Service 'sparkMaster' could not bind on port 7080. Attempting port 7081.
16/05/17 11:00:55 WARN Utils: Service 'sparkMaster' could not bind on port 7081. Attempting port 7082.
16/05/17 11:00:55 WARN Utils: Service 'sparkMaster' could not bind on port 7082. Attempting port 7083.
16/05/17 11:00:55 WARN Utils: Service 'sparkMaster' could not bind on port 7083. Attempting port 7084.
16/05/17 11:00:55 WARN Utils: Service 'sparkMaster' could not bind on port 7084. Attempting port 7085.
16/05/17 11:00:55 WARN Utils: Service 'sparkMaster' could not bind on port 7085. Attempting port 7086.
16/05/17 11:00:55 WARN Utils: Service 'sparkMaster' could not bind on port 7086. Attempting port 7087.
16/05/17 11:00:55 WARN Utils: Service 'sparkMaster' could not bind on port 7087. Attempting port 7088.
16/05/17 11:00:55 WARN Utils: Service 'sparkMaster' could not bind on port 7088. Attempting port 7089.
16/05/17 11:00:55 WARN Utils: Service 'sparkMaster' could not bind on port 7089. Attempting port 7090.
16/05/17 11:00:55 WARN Utils: Service 'sparkMaster' could not bind on port 7090. Attempting port 7091.
16/05/17 11:00:55 WARN Utils: Service 'sparkMaster' could not bind on port 7091. Attempting port 7092.
16/05/17 11:00:55 WARN Utils: Service 'sparkMaster' could not bind on port 7092. Attempting port 7093.
Exception in thread "main" java.net.BindException: Cannot assign requested address: Service 'sparkMaster' failed after 16 retries!

This happens even though my security group appears to be correct

ALLOW IPv4 443/tcp from 0.0.0.0/0
ALLOW IPv4 80/tcp from 0.0.0.0/0
ALLOW IPv4 8081/tcp from 0.0.0.0/0
ALLOW IPv4 8080/tcp from 0.0.0.0/0
ALLOW IPv4 18080/tcp from 0.0.0.0/0
ALLOW IPv4 7077/tcp from 0.0.0.0/0
ALLOW IPv4 4040/tcp from 0.0.0.0/0
ALLOW IPv4 to 0.0.0.0/0
ALLOW IPv6 to ::/0
ALLOW IPv4 22/tcp from 0.0.0.0/0

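【Comments】:

  • Were you able to solve your problem? I can create a cluster from machines that share the same private network, but when I try to do something similar (assigning public IPs to the nodes) it does not work. Link to my question: stackoverflow.com/questions/48020657/…

Tags: apache-spark network-programming ip-address openstack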


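【Answer 1】:

As shown in the log,

Your hostname, spark-master resolves to a loopback address: 127.0.1.1; using 192.x.x.1 instead (on interface eth0)

Spark automatically tries to determine the host's IP, and it picks the other IP, 192.x.x.1, instead of the floating IP 87.x.x.1.

To resolve this, you should set SPARK_LOCAL_IP=87.x.x.1 (preferably in spark-env.sh) and restart your master.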

【Comments】:

  • So if I set SPARK_LOCAL_IP=87.x.x.1, do I also set SPARK_MASTER_IP=87.x.x.1 in the same spark-env.sh?
  • Yes, that is what I meant.
  • Can you force Spark to use IPv4? Add the following line to spark-env.sh: export SPARK_DAEMON_JAVA_OPTS="-Djava.net.preferIPv4Stack=true"
  • If I set both SPARK_LOCAL_IP and SPARK_MASTER_IP to 87.x.x.1, I get the same error as described above. That is with /etc/hosts containing 127.0.1.1 spark-master.
  • Forcing IPv4 as you suggested changed nothing for me.
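
The BindException in the question ("Cannot assign requested address") is the error the operating system returns when a process asks to bind a socket to an address that is not configured on any local interface. The startup warning mentions only 192.x.x.1 on eth0, and OpenStack floating IPs are usually NAT-mapped by the network layer rather than assigned inside the instance, so that may be what is happening here. Below is a minimal sketch for checking this, assuming a Linux host with the iproute2 ip command. The helper names local_ipv4_addrs and addr_in_list are made up for illustration.

```shell
# List the IPv4 addresses configured on local interfaces, one per line.
# Assumes the iproute2 "ip" command is installed.
local_ipv4_addrs() {
    ip -4 -o addr show 2>/dev/null | awk '{ split($4, a, "/"); print a[1] }'
}

# Succeed only if the address given as $1 appears, as a whole line,
# in the list read from standard input.
addr_in_list() {
    grep -qxF -- "$1"
}

# Address to check; 87.x.x.1 is the masked floating IP from the question.
candidate="${1:-87.x.x.1}"

if local_ipv4_addrs | addr_in_list "$candidate"; then
    echo "$candidate is configured on a local interface; binding to it can work"
else
    echo "$candidate is not configured on any local interface; binding to it will fail"
fi
```

If the floating IP is not listed, setting SPARK_MASTER_IP or SPARK_LOCAL_IP to it cannot succeed. The master would have to bind to the private address while the floating IP is used only to reach it from outside.
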
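【Answer 2】:

I set up a Spark cluster (standalone) on OpenStack myself, and in the /etc/hosts file on my master I have:

127.0.0.1 localhost

192.168.1.2 spark-master instead of 127.0.0.1

Since my master and slaves all sit on the same virtual private network, I only use the private IPs. The only time I use the floating IP is on my host computer, when I launch spark-submit --master spark://spark-master (there, spark-master resolves to the floating IP). I don't think you need to try to bind the floating IP. I hope this helps!

Bruno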
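
The /etc/hosts change described in this answer could be sketched as follows. This is only an illustration, not the answerer's actual procedure. The function name remap_hosts is hypothetical, and the demo works on a temporary copy rather than the real /etc/hosts.

```shell
# Print a copy of a hosts file in which HOSTNAME maps to PRIVATE_IP
# instead of whatever address (e.g. 127.0.1.1) it mapped to before.
# Usage: remap_hosts <hosts-file> <hostname> <private-ip>
remap_hosts() {
    # Drop every line that lists the hostname; keep all others (e.g. localhost).
    awk -v name="$2" '{
        keep = 1
        for (i = 2; i <= NF; i++) if ($i == name) keep = 0
        if (keep) print
    }' "$1"
    # Append the new mapping to the private address.
    printf '%s %s\n' "$3" "$2"
}

# Demo on a temporary file that mimics the /etc/hosts from the question.
tmp="$(mktemp)"
printf '127.0.0.1 localhost\n127.0.1.1 spark-master\n' > "$tmp"
remap_hosts "$tmp" spark-master 192.168.1.2
rm -f "$tmp"
```

After reviewing the output, it can be installed as /etc/hosts on the master. Workers on the private network would then connect to spark://spark-master:7077, as the answer describes.
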

【Comments】:

  • What do you set SPARK_LOCAL_IP and SPARK_MASTER_IP to in your conf file?
  • I did not set SPARK_LOCAL_IP, but in my conf file I have spark.master set to spark://spark-master:7077 (spark-master is 192.168.1.2). Hope it helps!