【问题标题】:Implementing the Hadoop and MongoDB connector实现 Hadoop 和 MongoDB 连接器
【发布时间】:2014-03-11 18:07:40
【问题描述】:

我是第一次使用 Hadoop,因为我计划将它与 MongoDB 一起使用。安装 Hadoop 后,我尝试遵循本教程并实现其示例 http://docs.mongodb.org/ecosystem/tutorial/getting-started-with-hadoop/

在我调用这个命令之前一切正常

bash examples/treasury_yield/run_job.sh

然后我收到以下消息

14/03/11 17:52:45 INFO util.MongoTool: Created a conf: 'Configuration: core-defa
ult.xml, core-site.xml, src/examples/hadoop-local.xml, src/examples/mongo-defaul
ts.xml' on {class com.mongodb.hadoop.examples.treasury.TreasuryYieldXMLConfig} a
s job named '<unnamed MongoTool job>'
14/03/11 17:52:46 INFO util.MongoTool: Mapper Class: class com.mongodb.hadoop.ex
amples.treasury.TreasuryYieldMapper
14/03/11 17:52:46 INFO util.MongoTool: Setting up and running MapReduce job in f
oreground, will wait for results.  {Verbose? true}
14/03/11 17:52:47 WARN fs.FileSystem: "localhost:9100" is a deprecated filesyste
m name. Use "hdfs://localhost:9100/" instead.
14/03/11 17:52:47 WARN hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop
.ipc.RemoteException: java.io.IOException: File /tmp/hadoop-goncalopereira/mapre
d/staging/goncalopereira/.staging/job_201403111752_0001/job.jar could only be re
plicated to 0 nodes, instead of 1
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBloc
k(FSNamesystem.java:1639)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.jav
a:736)
        at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAcces
sorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:578)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1393)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1389)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInforma
tion.java:1149)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1387)

        at org.apache.hadoop.ipc.Client.call(Client.java:1107)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
        at com.sun.proxy.$Proxy2.addBlock(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.
java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAcces
sorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryI
nvocationHandler.java:85)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocat
ionHandler.java:62)
        at com.sun.proxy.$Proxy2.addBlock(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock
(DFSClient.java:3686)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStrea
m(DFSClient.java:3546)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2600(DFSClien
t.java:2749)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFS
Client.java:2989)

14/03/11 17:52:47 WARN hdfs.DFSClient: Error Recovery for block null bad datanod
e[0] nodes == null
14/03/11 17:52:47 WARN hdfs.DFSClient: Could not get block locations. Source fil
e "/tmp/hadoop-goncalopereira/mapred/staging/goncalopereira/.staging/job_2014031
11752_0001/job.jar" - Aborting...
14/03/11 17:52:47 INFO mapred.JobClient: Cleaning up the staging area hdfs://loc
alhost:9100/tmp/hadoop-goncalopereira/mapred/staging/goncalopereira/.staging/job
_201403111752_0001
14/03/11 17:52:47 ERROR security.UserGroupInformation: PriviledgedActionExceptio
n as:goncalopereira cause:org.apache.hadoop.ipc.RemoteException: java.io.IOExcep
tion: File /tmp/hadoop-goncalopereira/mapred/staging/goncalopereira/.staging/job
_201403111752_0001/job.jar could only be replicated to 0 nodes, instead of 1
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBloc
k(FSNamesystem.java:1639)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.jav
a:736)
        at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAcces
sorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:578)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1393)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1389)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInforma
tion.java:1149)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1387)

14/03/11 17:52:47 ERROR util.MongoTool: Exception while executing job...
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /tmp/hadoop-gon
calopereira/mapred/staging/goncalopereira/.staging/job_201403111752_0001/job.jar
 could only be replicated to 0 nodes, instead of 1
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBloc
k(FSNamesystem.java:1639)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.jav
a:736)
        at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAcces
sorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:578)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1393)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1389)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInforma
tion.java:1149)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1387)

        at org.apache.hadoop.ipc.Client.call(Client.java:1107)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
        at com.sun.proxy.$Proxy2.addBlock(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.
java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAcces
sorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryI
nvocationHandler.java:85)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocat
ionHandler.java:62)
        at com.sun.proxy.$Proxy2.addBlock(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock
(DFSClient.java:3686)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStrea
m(DFSClient.java:3546)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2600(DFSClien
t.java:2749)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFS
Client.java:2989)
14/03/11 17:52:47 ERROR hdfs.DFSClient: Failed to close file /tmp/hadoop-goncalo
pereira/mapred/staging/goncalopereira/.staging/job_201403111752_0001/job.jar
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /tmp/hadoop-gon
calopereira/mapred/staging/goncalopereira/.staging/job_201403111752_0001/job.jar
 could only be replicated to 0 nodes, instead of 1
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBloc
k(FSNamesystem.java:1639)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.jav
a:736)
        at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAcces
sorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:578)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1393)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1389)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInforma
tion.java:1149)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1387)

        at org.apache.hadoop.ipc.Client.call(Client.java:1107)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
        at com.sun.proxy.$Proxy2.addBlock(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.
java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAcces
sorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryI
nvocationHandler.java:85)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocat
ionHandler.java:62)
        at com.sun.proxy.$Proxy2.addBlock(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock
(DFSClient.java:3686)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStrea
m(DFSClient.java:3546)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2600(DFSClien
t.java:2749)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFS
Client.java:2989)

您可以猜到,这对于像我这样的新手来说有点压倒性。我认为这是 Hadoop 的一些问题,但不完全确定是什么。我真希望这里有人能指出我正确的方向。

【问题讨论】:

  • 您可以尝试使用以下命令将文件放入 HDFS:hadoop fs -put local-file hdfs-path。如果这给出了错误,那么 HDFS 设置就会出现问题。还要检查您的 NameNode Web UI 以查看是否列出了数据节点。您还可以运行 hadoop -fsck 来检查文件系统的健康状况。
  • 我猜datanodes没有启动并运行,当你运行这个作业时jps命令的输出是什么?你在列表中看到数据节点了吗?
  • 这个问题似乎与我没有 livenode 有关。但这令人困惑,因为我正确执行了所有步骤并且没有收到任何消息,我试图重新启动集群并修改配置文件但没有任何反应。

标签: mongodb hadoop


【解决方案1】:

您好,我已使用 mongoDBConnector 使用此链接将 hadoop 与 mongodb 连接

hadoop connection with mongodb

【讨论】:

    【解决方案2】:

    你需要专注于这个错误:

    ERROR security.UserGroupInformation: PriviledgedActionExceptio n as:goncalopereira cause:org.apache.hadoop.ipc.RemoteException: java.io.IOExcep tion: File /tmp/hadoop-goncalopereira/mapred/staging/goncalopereira/.staging/job _201403111752_0001/job.jar could only be replicated to 0 nodes, instead of 1

    1. 检查路径上是否存在该 jar。

    2. 检查你是否正在启动你的d​​ataNode,因为它需要时间来启动。

    确保您的 hadoop 已正确安装,并尝试为 hadoop 运行示例数据集,而无需将 MangoDB 引入图片中。这将区分哪里出了问题。希望对您有所帮助。

    【讨论】:

    • 根据网页界面,我有 0 个活动节点。尝试再次格式化 HDFS 并重新启动集群,但同样的事情发生了。
    猜你喜欢
    • 1970-01-01
    • 1970-01-01
    • 1970-01-01
    • 1970-01-01
    • 1970-01-01
    • 1970-01-01
    • 2018-01-15
    • 1970-01-01
    • 1970-01-01
    相关资源
    最近更新 更多