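【Posted】: 2016-02-09 16:14:10
【Question】:
I am working through a simple WordCount example in Apache Spark. I now have the count for each word, and I want to filter only the unique words out of that result.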
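import java.util.Arrays;

import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;

import scala.Tuple2;

public class SparkClass {
    public static void main(String[] args) {
        // Declared in the original post but not used below.
        String file = "/home/bhaumik/Documents/my";
        JavaSparkContext sc = new JavaSparkContext("local", "SimpleApp");

        // Read the file in 5 partitions and split every line into words.
        // Despite its name, `lines` holds one element per word after flatMap.
        // (Spark 1.x API: FlatMapFunction.call returns an Iterable.)
        JavaRDD<String> lines = sc.textFile("/home/bhaumik/Documents/myText", 5)
                .flatMap(new FlatMapFunction<String, String>() {
                    @Override
                    public Iterable<String> call(String t) throws Exception {
                        return Arrays.asList(t.split(" "));
                    }
                });

        // Pair every word with an initial count of 1.
        JavaPairRDD<String, Integer> pairs = lines.mapToPair(new PairFunction<String, String, Integer>() {
            @Override
            public Tuple2<String, Integer> call(String t) throws Exception {
                return new Tuple2<String, Integer>(t, 1);
            }
        });

        // Sum the counts per word.
        JavaPairRDD<String, Integer> counts = pairs.reduceByKey(new Function2<Integer, Integer, Integer>() {
            @Override
            public Integer call(Integer v1, Integer v2) throws Exception {
                return v1 + v2;
            }
        });
    }
}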
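To make the goal concrete, here is a minimal sketch of the "count, then keep the words seen once" logic using plain Java collections instead of Spark, so that it runs on its own. It assumes that "unique" means a word that occurs exactly once in the input; the class name `UniqueWordsSketch`, the helper names `countWords` and `wordsSeenOnce`, and the sample input are all hypothetical and not part of the original post.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class UniqueWordsSketch {

    // Same idea as mapToPair + reduceByKey: one running count per distinct word.
    static Map<String, Integer> countWords(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted keys give a stable order
        for (String line : lines) {
            for (String word : line.split(" ")) {
                if (!word.isEmpty()) { // skip empty tokens produced by repeated spaces
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    // Assumption: "unique" means the word occurs exactly once.
    static List<String> wordsSeenOnce(Map<String, Integer> counts) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() == 1) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("a b a", "c b d");
        System.out.println(wordsSeenOnce(countWords(lines))); // prints [c, d]
    }
}
```

On the `counts` RDD from the question, the same predicate (count equal to 1) could be passed to `JavaPairRDD.filter`, followed by `keys()` to drop the counts. If "unique" instead means the set of distinct words, `counts.keys()` alone already contains each word once.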
【Comments】:
Tags: java apache-spark