【Posted】: 2016-08-12 07:15:12
【Question】:
1) For Categories
Twitter handle , categories , sub_categories
handle , Products , MakeUp
handle , Health, MakeUp
handle2 , Services , Face
handle3 , Marketing , Soap
JavaPairRDD<String ,Category> categoryPairRDD
2) For Twitter
Twitter handle , twitter_post , twitter_likes
handle , "Iphone" , 10
handle2 , "Samsung" , 20
JavaPairRDD<String ,Twitter> twitterPairRDD
JavaPairRDD<String, Tuple2<Iterable<Category>, Iterable<Twitter>>> grouped = categoryPairRDD
        .cogroup(twitterPairRDD);
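How should I iterate over the cogroup values so that, for each key, the values are printed when a matching object is found and null is printed otherwise?
For example, handle3 exists in my categoryPairRDD but not in twitterPairRDD, so the output for key handle3 should be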
handle3 , Marketing , Soap , null , null
The final output should be
handle , Products , MakeUp , Iphone , 10
handle , Health , MakeUp , Iphone , 10
handle2 , Services , Face , Samsung , 20
handle3 , Marketing , Soap , null , null
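
One way to think about the iteration, sketched in plain Java so it runs without a Spark cluster: for each key, cogroup hands you two Iterables, and the desired rows are every category paired with every tweet, falling back to null fields when the tweet side is empty. The `Category` and `Twitter` classes below are hypothetical stand-ins (the question does not show their fields), and `toRows` is an illustrative helper name, not a Spark API. In a Spark job, logic like this would presumably go inside a function passed to `flatMap` on `grouped`, with the key taken from `_1()` and the two Iterables from `_2()._1()` and `_2()._2()`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class CogroupRows {
    // Hypothetical stand-in for the question's Category class.
    static final class Category {
        final String category;
        final String subCategory;
        Category(String category, String subCategory) {
            this.category = category;
            this.subCategory = subCategory;
        }
    }

    // Hypothetical stand-in for the question's Twitter class.
    static final class Twitter {
        final String post;
        final int likes;
        Twitter(String post, int likes) {
            this.post = post;
            this.likes = likes;
        }
    }

    // Builds the output rows for one key of a cogroup result.
    // Every category is paired with every tweet; if there are no
    // tweets for the key, the tweet columns are filled with "null".
    static List<String> toRows(String handle,
                               Iterable<Category> categories,
                               Iterable<Twitter> tweets) {
        List<String> rows = new ArrayList<>();
        boolean hasTweets = tweets.iterator().hasNext();
        for (Category c : categories) {
            String prefix = handle + " , " + c.category + " , " + c.subCategory;
            if (!hasTweets) {
                rows.add(prefix + " , null , null");
                continue;
            }
            for (Twitter t : tweets) {
                rows.add(prefix + " , " + t.post + " , " + t.likes);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Key "handle": two categories, one tweet.
        List<Category> handleCats = Arrays.asList(
                new Category("Products", "MakeUp"),
                new Category("Health", "MakeUp"));
        List<Twitter> handleTweets = Arrays.asList(new Twitter("Iphone", 10));
        toRows("handle", handleCats, handleTweets).forEach(System.out::println);

        // Key "handle3": one category, no tweets, so null columns.
        List<Category> handle3Cats = Arrays.asList(new Category("Marketing", "Soap"));
        toRows("handle3", handle3Cats, Collections.<Twitter>emptyList())
                .forEach(System.out::println);
    }
}
```

Note that this sketch emits nothing for a key that appears only on the Twitter side, which matches the expected output in the question. Since that behavior is a left outer join, `categoryPairRDD.leftOuterJoin(twitterPairRDD)` may be a simpler alternative to cogroup here; its right-hand value is wrapped in an Optional whose exact type depends on the Spark version.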
【Discussion】:
Tags: java apache-spark rdd