【Question title】: hadoop + Writable interface + readFields throws an exception in reducer
【Posted】: 2011-04-14 09:06:11
【Question】:

I have a simple map-reduce program in which my map and reduce primitives look like this:

map(K,V) = (Text, OutputAggregator)
reduce(Text, OutputAggregator) = (Text,Text)

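The important point is that my map function emits an object of type OutputAggregator, which is my own class implementing the Writable interface. However, my reduce fails with the following exception. More specifically, the readFields() function is throwing the exception. Any clue why? I am using hadoop 0.18.3.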

10/09/19 04:04:59 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
10/09/19 04:04:59 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
10/09/19 04:04:59 INFO mapred.FileInputFormat: Total input paths to process : 1
10/09/19 04:04:59 INFO mapred.FileInputFormat: Total input paths to process : 1
10/09/19 04:04:59 INFO mapred.FileInputFormat: Total input paths to process : 1
10/09/19 04:04:59 INFO mapred.FileInputFormat: Total input paths to process : 1
10/09/19 04:04:59 INFO mapred.JobClient: Running job: job_local_0001
10/09/19 04:04:59 INFO mapred.MapTask: numReduceTasks: 1
10/09/19 04:04:59 INFO mapred.MapTask: io.sort.mb = 100
10/09/19 04:04:59 INFO mapred.MapTask: data buffer = 79691776/99614720
10/09/19 04:04:59 INFO mapred.MapTask: record buffer = 262144/327680
Length = 10
10
10/09/19 04:04:59 INFO mapred.MapTask: Starting flush of map output
10/09/19 04:04:59 INFO mapred.MapTask: bufstart = 0; bufend = 231; bufvoid = 99614720
10/09/19 04:04:59 INFO mapred.MapTask: kvstart = 0; kvend = 10; length = 327680
gl_books
10/09/19 04:04:59 WARN mapred.LocalJobRunner: job_local_0001
java.lang.NullPointerException
 at org.myorg.OutputAggregator.readFields(OutputAggregator.java:46)
 at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
 at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
 at org.apache.hadoop.mapred.Task$ValuesIterator.readNextValue(Task.java:751)
 at org.apache.hadoop.mapred.Task$ValuesIterator.next(Task.java:691)
 at org.apache.hadoop.mapred.Task$CombineValuesIterator.next(Task.java:770)
 at org.myorg.xxxParallelizer$Reduce.reduce(xxxParallelizer.java:117)
 at org.myorg.xxxParallelizer$Reduce.reduce(xxxParallelizer.java:1)
 at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.combineAndSpill(MapTask.java:904)
 at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:785)
 at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:698)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:228)
 at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:157)
java.io.IOException: Job failed!
 at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1113)
 at org.myorg.xxxParallelizer.main(xxxParallelizer.java:145)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
 at java.lang.reflect.Method.invoke(Unknown Source)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
 at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
 at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)

【Comments】:

  • Post the code of OutputAggregator.readFields(). What is on line 46?

Tags: hadoop writable


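【Answer 1】:

In addition to Niels Basjes' answer: just initialize your member variables in the empty constructor (which you must provide, otherwise Hadoop cannot instantiate your object), for example: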

public OutputAggregator() {
    this.member = new IntWritable();
    ...
}

This assumes this.member is of type IntWritable.
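A fuller sketch of the same idea follows. The real OutputAggregator was never posted, so the class body here is hypothetical. To keep the example runnable without Hadoop on the classpath, it also declares local stand-ins for `Writable` and `IntWritable` that mirror the method signatures of the Hadoop classes; in a real job you would import `org.apache.hadoop.io.Writable` and `org.apache.hadoop.io.IntWritable` instead. The `roundTrip` helper is likewise an assumption, used only to imitate the framework creating an instance through the no-arg constructor and then calling readFields().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Stand-in for org.apache.hadoop.io.Writable (same two methods), so the
// sketch compiles without Hadoop on the classpath.
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Stand-in for org.apache.hadoop.io.IntWritable, reduced to what is used here.
class IntWritable implements Writable {
    private int value;
    public IntWritable() { }
    public IntWritable(int value) { this.value = value; }
    public int get() { return value; }
    public void set(int value) { this.value = value; }
    public void write(DataOutput out) throws IOException { out.writeInt(value); }
    public void readFields(DataInput in) throws IOException { value = in.readInt(); }
}

// Hypothetical value class; the original OutputAggregator was not shown.
class OutputAggregator implements Writable {
    private IntWritable member;

    // The framework builds the instance through this constructor and then
    // calls readFields(), so every member must be non-null at this point.
    public OutputAggregator() {
        this.member = new IntWritable();
    }

    public OutputAggregator(int value) {
        this.member = new IntWritable(value);
    }

    public int getValue() { return member.get(); }

    @Override
    public void write(DataOutput out) throws IOException {
        member.write(out);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // Throws NullPointerException if member was left null by the constructor.
        member.readFields(in);
    }
}

class OutputAggregatorDemo {
    // Serializes src and reads it back into a fresh no-arg instance,
    // imitating what happens between map output and reduce input.
    static OutputAggregator roundTrip(OutputAggregator src) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        src.write(new DataOutputStream(bytes));
        OutputAggregator copy = new OutputAggregator();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        return copy;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTrip(new OutputAggregator(42)).getValue()); // prints 42
    }
}
```

If the no-arg constructor left `member` as null, the `member.readFields(in)` call would fail with the same kind of NullPointerException seen in the stack trace.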

【Comments】:

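    【Answer 2】:

    When posting a question about custom code: post the relevant code. The content of line 46, plus a few lines before and after it, would really help... :)

    But this may help:

    THE pitfall when writing your own Writable class is that Hadoop reuses the actual instance of the class over and over again. Between calls to readFields you do NOT get a shiny new instance.

    So at the start of your readFields method you MUST assume that the object you are in is filled with "garbage" and has to be cleared before continuing.

    My suggestion is to implement a "clear()" method that completely wipes the current instance and resets it to the state it had right after it was created and the constructor finished. And of course you call that method first thing in readFields, for both the key and the value.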
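A minimal sketch of the clear() pattern, under stated assumptions: `TermList` and `ReuseDemo` are hypothetical names invented for illustration, and the local `Writable` interface is a stand-in with the same two methods as `org.apache.hadoop.io.Writable`, so the example runs without Hadoop. It reads two serialized records into one reused instance, which is roughly what the framework does with values.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Stand-in for org.apache.hadoop.io.Writable (same two methods).
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical Writable holding a variable-length list of strings.
class TermList implements Writable {
    private final List<String> terms = new ArrayList<>();

    public TermList() { }

    public void add(String term) { terms.add(term); }

    public List<String> getTerms() { return terms; }

    // Resets the instance to the state it has right after construction.
    public void clear() { terms.clear(); }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(terms.size());
        for (String term : terms) {
            out.writeUTF(term);
        }
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        clear(); // the instance may still hold the previous record
        int count = in.readInt();
        for (int i = 0; i < count; i++) {
            terms.add(in.readUTF());
        }
    }
}

class ReuseDemo {
    // Serializes a Writable to a byte array.
    static byte[] serialize(Writable w) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        w.write(new DataOutputStream(bytes));
        return bytes.toByteArray();
    }

    // Deserializes into an existing instance, as a reused object would be.
    static void readInto(Writable target, byte[] data) throws IOException {
        target.readFields(new DataInputStream(new ByteArrayInputStream(data)));
    }

    public static void main(String[] args) throws IOException {
        TermList first = new TermList();
        first.add("a");
        first.add("b");
        TermList second = new TermList();
        second.add("c");

        TermList reused = new TermList();
        readInto(reused, serialize(first));
        readInto(reused, serialize(second));
        System.out.println(reused.getTerms()); // prints [c]
    }
}
```

Without the clear() call at the top of readFields, the reused instance would end up holding a, b and c after the second read, because the entries from the first record would still be in the list.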

    HTH

    【Comments】:
