【Question title】:Want to implement or read odd records using MapReduce
【Posted】:2016-06-23 20:47:51
【Question description】:

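I have a use case where I need to read only the odd-numbered lines of a file using Java MapReduce.

As far as I can tell, the InputFormat class only treats '\n' as the record terminator, so every line reaches the mapper. What I want is the following: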

Input:
sampat
kumar
hadoop
mapreduce

Output:
sampat
hadoop

【Comments】:

Tags: hadoop


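【Solution 1】:

You can also get the desired output from your input this way, without writing a custom input/output format. The mapper splits each record on spaces and keeps the 1st, 3rd, 5th, ... words:

Input: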

sampat1 kumar2 hadoop3 mapredue4 sampat1 kumar2 hadoop3 mapredue4 sampat1 kumar2 hadoop3 mapredue4 sampat1 kumar2 hadoop3 mapredue4 sampat1 kumar2 hadoop3 mapredue4

Output:

sampat1 hadoop3 sampat1 hadoop3 sampat1 hadoop3 sampat1 hadoop3 sampat1 hadoop3 

Code:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;


public class OddLine {

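    public static class OddLineMapper extends Mapper<Object, Text, Text, Text> {

        // Reused across calls so that no new Writables are allocated per record.
        private final Text outKey = new Text("");
        private final Text outValue = new Text();

        @Override
        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {

            // Build the result per record. A StringBuilder held as a field would
            // carry words from earlier records over into later output.
            StringBuilder sb = new StringBuilder();
            String[] words = value.toString().split(" ");

            // Indices 0, 2, 4, ... are the 1st, 3rd, 5th, ... words.
            for (int i = 0; i < words.length; i += 2)
                sb.append(words[i]).append(" ");

            outValue.set(sb.toString());
            context.write(outKey, outValue);
        }
    }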

    public static void main(String[] args) throws Exception {

        Configuration conf = new Configuration();

        Job job = Job.getInstance(conf, "Get odd words");
        job.setJarByClass(OddLine.class);
        job.setMapperClass(OddLineMapper.class);

        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        FileSystem fs = null;
        Path dstFilePath = new Path(args[1]);
        try {
            fs = dstFilePath.getFileSystem(conf);
            if (fs.exists(dstFilePath))
                fs.delete(dstFilePath, true);
        } catch (IOException e1) {
            e1.printStackTrace();
        }
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
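
The word-selection step can be tried on its own, without a Hadoop cluster. The following is only a minimal sketch and not part of the original answer: the class name `OddTokens` and the method `oddTokens` are hypothetical, and it assumes that records are separated by whitespace within a single line, as in the sample input above. Unlike the mapper, it does not leave a trailing space.

```java
import java.util.ArrayList;
import java.util.List;

public class OddTokens {

    // Returns the 1st, 3rd, 5th, ... whitespace-separated tokens of a record,
    // joined by single spaces. (Hypothetical helper, for illustration only.)
    public static String oddTokens(String record) {
        String trimmed = record.trim();
        if (trimmed.isEmpty()) {
            return "";
        }
        String[] tokens = trimmed.split("\\s+");
        List<String> kept = new ArrayList<>();
        // Array indices 0, 2, 4, ... correspond to odd positions 1, 3, 5, ...
        for (int i = 0; i < tokens.length; i += 2) {
            kept.add(tokens[i]);
        }
        return String.join(" ", kept);
    }

    public static void main(String[] args) {
        // Prints: sampat1 hadoop3
        System.out.println(oddTokens("sampat1 kumar2 hadoop3 mapredue4"));
    }
}
```

Note that this selects odd words within one record. If the input really has one record per line, each `map` call sees only a single line and its byte offset as the key, so the line's position in the file is not directly available to the mapper.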

【Comments】:
