[Posted]: 2016-03-03 03:02:28
[Question]:
I am writing a mapper function that emits a user_id as the key, with a value that is also of type Text. Here is how I do it:
public static class UserMapper extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text userid = new Text();
    private Text catid = new Text();

    /* map method */
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString(), ","); /* separated by "," */
        int count = 0;
        userid.set(itr.nextToken());
        while (itr.hasMoreTokens()) {
            if (++count == 3) {
                catid.set(itr.nextToken());
                context.write(userid, catid);
            } else {
                itr.nextToken();
            }
        }
    }
}
Then, in the main program, I set the mapper's output classes as follows:
Job job = new Job(conf, "Customer Analyzer");
job.setJarByClass(popularCategories.class);
job.setMapperClass(UserMapper.class);
job.setCombinerClass(UserReducer.class);
job.setReducerClass(UserReducer.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Text.class);
So even though I set the output value class to Text.class, I still get the following error at compile time:
popularCategories.java:39: write(org.apache.hadoop.io.Text,org.apache.hadoop.io.IntWritable)
in org.apache.hadoop.mapreduce.TaskInputOutputContext<java.lang.Object,
org.apache.hadoop.io.Text,org.apache.hadoop.io.Text,
org.apache.hadoop.io.IntWritable>
cannot be applied to (org.apache.hadoop.io.Text,org.apache.hadoop.io.Text)
context.write(userid, catid);
^
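According to this error, the compiler still treats the mapper's write method as having this signature: write(org.apache.hadoop.io.Text,org.apache.hadoop.io.IntWritable)
When I changed the class definition as follows, the problem went away.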
public static class UserMapper extends Mapper<Object, Text, Text, Text> {
}
So I would like to understand the difference between the class definition and setting the mapper's output value class.
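To make the question concrete, here is a minimal sketch of the two kinds of check that seem to be involved, written in plain Java with no Hadoop dependency. SketchContext and emit are hypothetical stand-ins, not Hadoop classes: the generic parameters play the role of the type arguments in Mapper<Object, Text, Text, Text>, which the compiler checks, and the Class object passed to the constructor plays the role of job.setMapOutputValueClass(...), which can only be checked while the job runs.

```java
import java.util.ArrayList;
import java.util.List;

public class OutputTypeSketch {

    /** Hypothetical stand-in for a mapper context; K and V are fixed by the generic parameters. */
    static class SketchContext<K, V> {
        // Stand-in for the class configured on the job; known only at run time.
        private final Class<?> configuredValueClass;
        final List<String> written = new ArrayList<>();

        SketchContext(Class<?> configuredValueClass) {
            this.configuredValueClass = configuredValueClass;
        }

        // The compiler checks every call against K and V, regardless of configuredValueClass.
        void write(K key, V value) {
            // Run-time check: the emitted value must match the configured class.
            if (value.getClass() != configuredValueClass) {
                throw new IllegalStateException("Type mismatch in value: expected "
                        + configuredValueClass.getName()
                        + ", received " + value.getClass().getName());
            }
            written.add(key + "\t" + value);
        }
    }

    /** Emits one (String, String) pair through a context configured with the given value class. */
    static String emit(Class<?> configuredValueClass) {
        SketchContext<String, String> ctx = new SketchContext<>(configuredValueClass);
        try {
            ctx.write("user42", "cat7");
            return ctx.written.get(0);
        } catch (IllegalStateException e) {
            return "runtime error: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Generic type and configured class agree: the pair is written.
        System.out.println(emit(String.class));
        // Generic type is fine for the compiler, but the configured class disagrees at run time.
        System.out.println(emit(Integer.class));
        // With SketchContext<String, Integer>, the call write("user42", "cat7") would be
        // rejected by the compiler, no matter which Class object was passed in.
    }
}
```

In this sketch the two settings are independent: changing the Class argument never affects what the compiler accepts, which matches the behaviour described above, where setMapOutputValueClass(Text.class) did not silence the compile error.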
[Comments]:
Tags: java apache hadoop types mapreduce