【Question title】: HBase master won't start, can't connect to hbase.rootdir
【Posted】: 2013-08-06 13:39:22
【Question】:

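I am trying to run HBase in pseudo-distributed mode following the setup instructions on the Apache website, but I cannot get hbase.rootdir configured correctly.

Here is what my configuration files look like:

In the Hadoop directory: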

conf/core-site.xml:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>

  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>

  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>

conf/hdfs-site.xml:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>

  <property>
    <name>dfs.support.append</name>
    <value>true</value>
  </property>

  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>4096</value>
  </property>

</configuration>

conf/mapred-site.xml:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>

In my HBase directory:

hbase-site.xml:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
/**
 * Copyright 2010 The Apache Software Foundation
 *
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
-->
<configuration>

  <property>
    <name>dfs.support.append</name>
    <value>true</value>
  </property>

  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:9000/hbase</value>
  </property>

  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>

  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>localhost</value>
  </property>

</configuration>

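When I run the start-hbase.sh script, it reports that it started ZooKeeper, the HBase master, and the region server, and I am able to log in to them. I can then open the HBase shell, but I cannot create tables or do anything else. I tried connecting to the master status UI with my web browser, but it could not connect. At first I thought this was because I am running on an Amazon instance and port 9000 had not been opened, but I found that it had. Ports 50030 and 50070 are opened in the same way, and I can reach the JobTracker and NameNode through them. I checked the logs and found this error: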

2013-08-05 18:00:35,613 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2013-08-05 18:00:35,616 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown.
java.net.ConnectException: Call to localhost/127.0.0.1:9000 failed on connection exception: java.net.ConnectException: Connection refused
    at org.apache.hadoop.ipc.Client.wrapException(Client.java:1136)
    at org.apache.hadoop.ipc.Client.call(Client.java:1112)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
    at com.sun.proxy.$Proxy10.getProtocolVersion(Unknown Source)
    at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:411)
    at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:135)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:276)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:241)
    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:100)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1411)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1429)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:187)
    at org.apache.hadoop.hbase.util.FSUtils.getRootDir(FSUtils.java:667)
    at org.apache.hadoop.hbase.master.MasterFileSystem.<init>(MasterFileSystem.java:112)
    at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:560)
    at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:419)
    at java.lang.Thread.run(Thread.java:724)
Caused by: java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:708)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:511)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:481)
    at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:453)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:579)
    at org.apache.hadoop.ipc.Client$Connection.access$2100(Client.java:202)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1243)
    at org.apache.hadoop.ipc.Client.call(Client.java:1087)
    ... 17 more

As you can see, it is trying to reach localhost/127.0.0.1:9000, which is obviously wrong:

Call to localhost/127.0.0.1:9000 failed on connection exception

Here is what my /etc/hosts file looks like:

127.0.0.1 localhost

# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts

Also: replacing localhost with the instance's public DNS name did not work either.

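【Comments】:

  • Which Hadoop and HBase versions? Why do you think trying to connect to localhost is wrong?
  • Hadoop 1.1.2 and HBase 0.94.10, which are compatible according to the chart on their website: hbase.apache.org/book/configuration.html#hadoop. I am not saying that connecting to localhost is wrong, but wouldn't localhost/127.0.0.1 be like connecting to www.stackoverflow/stackoverflow.com?
  • But that is what you have configured. I can see localhost in your configuration files. If you do not want to use it, you need to put the correct hostname in the configuration files.

Tags: apache hadoop amazon-ec2 localhost hbase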


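【Answer 1】:

A few suggestions first. You do not actually need dfs.replication and mapred.job.tracker in core-site.xml, or dfs.support.append in hbase-site.xml. They are not required there.

Make sure the NameNode (NN) is running properly and has left safe mode. It is also a good idea to turn off IPv6, add hbase.zookeeper.property.dataDir and hbase.zookeeper.property.clientPort to hbase-site.xml, and export HBASE_MANAGES_ZK set to true. Restart HBase after changing the configuration files.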
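
As a rough sketch of the ZooKeeper settings suggested above, the two properties could be added to hbase-site.xml roughly as follows. The dataDir path is a hypothetical placeholder, assumed here only for illustration; any directory the HBase user can write to should do. The port 2181 is ZooKeeper's usual client port. HBASE_MANAGES_ZK is an environment variable, normally exported in conf/hbase-env.sh, so it does not appear in this file.

```xml
<configuration>
  <!-- Directory where the HBase-managed ZooKeeper keeps its data.
       /home/hadoop/zookeeper is a hypothetical path; substitute your own. -->
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/home/hadoop/zookeeper</value>
  </property>

  <!-- Port on which clients connect to ZooKeeper. -->
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
  </property>
</configuration>
```

These entries would sit alongside the existing hbase.rootdir and hbase.zookeeper.quorum properties rather than replace them.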

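【Comments】:

  • The namenode is running, but I am not sure how to check whether it is in safe mode. What values should I set for hbase.zookeeper.property.dataDir and hbase.zookeeper.property.clientPort?
  • You can check through the HDFS web UI, or through the HDFS shell with bin/hadoop dfsadmin -safemode enter. Create a directory somewhere convenient, for example in your home directory, and use it for dataDir; use 2181 for clientPort.
  • Apologies. Use this instead: bin/hadoop dfsadmin -report. Please ignore bin/hadoop dfsadmin -safemode enter; that command puts the NameNode into safe mode. If safe mode is on, use bin/hadoop dfsadmin -safemode leave.
  • OK, that worked. Apparently I had changed Hadoop's configuration and forgotten to reformat the namenode. Thanks.
  • You are welcome. As a side note, it is better to set dfs.data.dir and dfs.name.dir in hdfs-site.xml. These properties default to locations under /tmp, which is emptied at every reboot. As a result you lose all data and metadata and have to reformat every time the machine restarts. You may want to visit these links, cloudfront.blogspot.in/2012/07/… ... cloudfront.blogspot.in/2012/06/… , in case you need help. I have tried to explain all of this in detail. HTH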
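
The advice in the last comment might look something like this in hdfs-site.xml on Hadoop 1.x. Both paths are hypothetical placeholders, assumed only for illustration; the point is simply to choose directories outside /tmp that the Hadoop user owns.

```xml
<configuration>
  <!-- Where the NameNode stores filesystem metadata.
       /home/hadoop/hdfs/name is a hypothetical path. -->
  <property>
    <name>dfs.name.dir</name>
    <value>/home/hadoop/hdfs/name</value>
  </property>

  <!-- Where the DataNode stores block data.
       /home/hadoop/hdfs/data is a hypothetical path. -->
  <property>
    <name>dfs.data.dir</name>
    <value>/home/hadoop/hdfs/data</value>
  </property>
</configuration>
```

If dfs.name.dir is pointed at a new, empty directory, the NameNode will most likely need to be formatted once more before it starts, which is the same step the asker reported having missed.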