First Hadoop program using map and reducer: answers

【Question title】: First Hadoop program using map and reducer
【Posted】: 2016-03-13 19:39:53
【Question】:

I am trying to run my first Hadoop program. My input file looks like this:

1 54875451 2015 LA89LP
2 47451451 2015 LA89LP
3 878451 2015 LA89LP
4 54875 2015 LA89LP
5 2212 2015 LA89LP

When I run it, I get map 100%, reduce 0%, and java.lang.Exception: java.util.NoSuchElementException, caused by a long chain of stack frames including:

java.util.NoSuchElementException

java.util.StringTokenizer.nextToken(StringTokenizer.java:349)

I really don't understand why. Any help is much appreciated.

My mapper and reducer look like this:

    import java.io.IOException;
    import java.util.HashSet;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class Draft {

        public static class TokenizerMapper extends Mapper<Object, Text, Text, Text> {

            private Text word = new Text();
            private Text word2 = new Text();

            @Override
            public void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {

                StringTokenizer itr = new StringTokenizer(value.toString());

                // Reads four tokens per iteration, so every line is assumed
                // to hold exactly: id, price, dateTransfer, postcode.
                while (itr.hasMoreTokens()) {
                    String id = itr.nextToken();
                    String price = itr.nextToken();
                    String dateTransfer = itr.nextToken();
                    String postcode = itr.nextToken();

                    // Emit (postcode, price)
                    word.set(postcode);
                    word2.set(price);
                    context.write(word, word2);
                }
            }
        }

        public static class MaxReducer extends Reducer<Text, Text, Text, Text> {

            private Text word = new Text();
            private Text word2 = new Text();

            @Override
            public void reduce(Text key, Iterable<Text> values, Context context)
                    throws IOException, InterruptedException {
                String max = "0";
                HashSet<String> S = new HashSet<String>();

                // The first value for a key initialises max; later values
                // replace it only when they are numerically larger.
                for (Text val : values) {
                    String d = key.toString();
                    String price = val.toString();
                    if (S.contains(d)) {
                        if (Integer.parseInt(price) > Integer.parseInt(max)) max = price;
                    } else {
                        S.add(d);
                        max = price;
                    }
                }

                // Emit (postcode, max price)
                word.set(key.toString());
                word2.set(max);
                context.write(word, word2);
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "Draft");
            job.setJarByClass(Draft.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setReducerClass(MaxReducer.class);
            job.setMapOutputKeyClass(Text.class);   // key type emitted by the mapper
            job.setMapOutputValueClass(Text.class); // value type emitted by the mapper
            job.setOutputKeyClass(Text.class);      // key type of the final job output
            job.setOutputValueClass(Text.class);    // value type of the final job output
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

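【Comments】:

Tags: hadoop dictionary nosuchelementexception reducers

【Solution 1】:

This error occurs when some of your records have fewer than 4 fields. The code in your mapper assumes that every record contains 4 fields: id, price, dateTransfer and postcode.

However, some records may not contain all 4 fields.

For example, if the record is: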

1 54875451 2015

then the following line throws the exception (java.util.NoSuchElementException):

String postcode = itr.nextToken();

You are trying to assign postcode (assumed to be the 4th field), but the input record has only 3 fields.
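To see the failure outside Hadoop, here is a minimal standalone sketch that repeats the four nextToken() calls from the mapper on a single line. The class name TokenizerDemo and the helper hasFourFields are made up for this illustration and are not part of the original job.

```java
import java.util.NoSuchElementException;
import java.util.StringTokenizer;

public class TokenizerDemo {

    // Repeats the mapper's four nextToken() calls and reports whether
    // the line could be read in full. (Hypothetical helper.)
    static boolean hasFourFields(String line) {
        StringTokenizer itr = new StringTokenizer(line);
        try {
            itr.nextToken(); // id
            itr.nextToken(); // price
            itr.nextToken(); // dateTransfer
            itr.nextToken(); // postcode: throws if only 3 fields exist
            return true;
        } catch (NoSuchElementException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(hasFourFields("1 54875451 2015 LA89LP")); // true
        System.out.println(hasFourFields("1 54875451 2015"));        // false
    }
}
```

Running a check like this over the real input file is one way to find the lines that do not have four fields.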

To fix this, you need to change the string tokenizer code in your map() method. Since you only emit postcode and price from map(), you can change your code as follows:

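// Split the line on single spaces and check the length before indexing
String[] tokens = value.toString().split(" ");

String price = "";
String postcode = "";

// price is the 2nd field
if(tokens.length >= 2)
    price = tokens[1];

// postcode is the 4th field; it stays empty for short records
if(tokens.length >= 4)
    postcode = tokens[3];

// Emit only when a price was found
if(!price.isEmpty())
{
    word.set(postcode);
    word2.set(price);
    context.write(word, word2);
}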
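One caveat with split(" ") is that it only splits on single spaces: a tab-separated line comes back as one token, and two spaces in a row produce an empty token. A possible variation, assuming fields may be separated by any run of whitespace and that short records should simply be skipped, is sketched below in plain Java. RecordParser and parse are hypothetical names used only for this example.

```java
import java.util.Arrays;

public class RecordParser {

    // Returns {postcode, price} for a record with at least four
    // whitespace-separated fields, or null for anything shorter.
    // (Hypothetical helper; skipping short records is an assumption.)
    static String[] parse(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return null;
        }
        // "\\s+" matches any run of spaces or tabs
        String[] tokens = trimmed.split("\\s+");
        if (tokens.length < 4) {
            return null;
        }
        return new String[] { tokens[3], tokens[1] };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parse("4\t54875  2015 LA89LP")));
        System.out.println(Arrays.toString(parse("1 54875451 2015")));
    }
}
```

Inside map(), the same check would wrap the word.set(...) and context.write(...) calls, so nothing is emitted when parse returns null.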

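【Comments】:

Thanks, I'll try it tomorrow. But the error shows up even with a simple txt file that I can check from the top, so there really aren't any records with fewer than 4 fields.
That exception (NoSuchElementException) definitely occurs when you try to access a field that doesn't exist.
Thank you very much. It seems to work now :) But now I get java.lang.NumberFormatException for this string: 4 54875 2015 LA89LP. If I remove that line, the program runs fine.
Probably because you are trying to convert "LA89LP" to an integer.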
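If a non-numeric value can reach the reducer, one way to keep Integer.parseInt from aborting the job is to guard the parse and skip values that are not numbers. The sketch below shows that idea on plain strings. MaxPrice and maxOf are hypothetical names, and silently skipping bad values is an assumption rather than something the thread settled on.

```java
import java.util.Arrays;

public class MaxPrice {

    // Returns the largest value that parses as a number,
    // or null when none of the values does. (Hypothetical helper.)
    static Long maxOf(Iterable<String> values) {
        Long max = null;
        for (String v : values) {
            try {
                long price = Long.parseLong(v.trim());
                if (max == null || price > max) {
                    max = price;
                }
            } catch (NumberFormatException e) {
                // Not a number (for example a postcode or an unsplit line): skip it
            }
        }
        return max;
    }

    public static void main(String[] args) {
        System.out.println(maxOf(Arrays.asList("54875451", "LA89LP", "2212")));
    }
}
```

In the reducer, the same loop would run over val.toString() for each Text value, and the result would be written only when it is not null.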