【Question title】: How to iterate through Spark cogroup values
【Posted】: 2016-08-12 07:15:12
【Question】:
1) For categories

twitter handle , categories , sub_categories

handle  , Products  , MakeUp
handle  , Health    , MakeUp
handle2 , Services  , Face
handle3 , Marketing , Soap

JavaPairRDD<String ,Category> categoryPairRDD

2) For Twitter

twitter handle , twitter_post , twitter_likes

handle  , "Iphone"  , 10
handle2 , "Samsung" , 20


JavaPairRDD<String ,Twitter>  twitterPairRDD


JavaPairRDD<String, Tuple2<Iterable<Category>, Iterable<Twitter>>> grouped = categoryPairRDD
           .cogroup(twitterPairRDD);

How should I iterate over the cogroup values so that, for each key, the values are printed if the matching object is found and null is printed otherwise?

For example, handle3 exists in my categoryPairRDD but not in twitterPairRDD, so the output for key handle3 should be:

handle3 , Marketing , Soap , null , null

The final output should be:

handle  , Products  , MakeUp , Iphone  , 10
handle  , Health    , MakeUp , Iphone  , 10
handle2 , Services  , Face   , Samsung , 20
handle3 , Marketing , Soap   , null    , null

【Question comments】:

    Tags: java apache-spark rdd


    【Solution 1】:

    I managed to find a solution using a left outer join:

    // Optional is org.apache.spark.api.java.Optional in Spark 2.x
    // (com.google.common.base.Optional in Spark 1.x).
    JavaPairRDD<String, Tuple2<Ontologies, Optional<Twitter>>> left = ontologiesPair.leftOuterJoin(twitterPairRDD);

    left.foreach(new VoidFunction<Tuple2<String, Tuple2<Ontologies, Optional<Twitter>>>>() {

        @Override
        public void call(Tuple2<String, Tuple2<Ontologies, Optional<Twitter>>> arg0) throws Exception {
            Optional<Twitter> tweet = arg0._2()._2();
            if (tweet.isPresent()) {
                // print values from the tuple, i.e. arg0._2()._1() and tweet.get()
            } else {
                // no matching tweet for this key: fall back to an empty tweet object
                Twitter empty = new Twitter("", -1);
                // print values from arg0._2()._1() and the empty tweet object
            }
        }
    });


    However, I would still like to see an answer that uses cogroup.
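For the cogroup variant, each entry of the grouped RDD carries the key plus two Iterables, one per input RDD, so the per-key work is plain Java iteration. The sketch below shows only that iteration, without a Spark dependency so it can be run on its own. The `Category` and `Twitter` classes here are hypothetical stand-ins for the classes in the question (their fields are assumed), and `rowsFor` is an illustrative helper name, not a Spark API. In a Spark job it could presumably be called from inside `grouped.foreach(...)` with `entry._1()`, `entry._2()._1()` and `entry._2()._2()`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class CogroupRows {

    // Hypothetical stand-in for the question's Category class (fields assumed).
    static final class Category {
        final String category;
        final String subCategory;

        Category(String category, String subCategory) {
            this.category = category;
            this.subCategory = subCategory;
        }
    }

    // Hypothetical stand-in for the question's Twitter class (fields assumed).
    static final class Twitter {
        final String post;
        final int likes;

        Twitter(String post, int likes) {
            this.post = post;
            this.likes = likes;
        }
    }

    // Expands one cogroup entry into output rows: one row per (category, tweet)
    // pair, or one row with "null , null" per category when the key has no tweets.
    // Keys that have tweets but no categories yield no rows, like a left outer join.
    static List<String> rowsFor(String handle,
                                Iterable<Category> categories,
                                Iterable<Twitter> tweets) {
        List<String> rows = new ArrayList<>();
        boolean hasTweets = tweets.iterator().hasNext();
        for (Category c : categories) {
            String prefix = handle + " , " + c.category + " , " + c.subCategory;
            if (!hasTweets) {
                rows.add(prefix + " , null , null");
                continue;
            }
            for (Twitter t : tweets) {
                rows.add(prefix + " , " + t.post + " , " + t.likes);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Sample data taken from the question.
        List<Category> handleCats = Arrays.asList(
                new Category("Products", "MakeUp"), new Category("Health", "MakeUp"));
        List<Twitter> handleTweets = Collections.singletonList(new Twitter("Iphone", 10));
        rowsFor("handle", handleCats, handleTweets).forEach(System.out::println);

        List<Category> handle3Cats = Collections.singletonList(new Category("Marketing", "Soap"));
        rowsFor("handle3", handle3Cats, Collections.<Twitter>emptyList())
                .forEach(System.out::println);
    }
}
```

The empty-Iterable check is what distinguishes cogroup from the join above: cogroup never wraps a missing side in an Optional, it simply hands back an Iterable with no elements for that key.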

    【Comments】:
