【Question title】: How to transpose rows to columns using Spark SQL?
【Posted】: 2015-09-08 01:02:23
【Question description】:

My table t1 contains the following data:

col1    | col2   |
sess-1  | read   |
sess-1  | meet   |
sess-1  | walk   |
sess-2  | watch  |
sess-2  | sleep  |
sess-2  | run    |
sess-2  | drive  |

Expected output:

col1   | col2                  |
sess-1 | read,meet,walk        |
sess-2 | watch,sleep,run,drive |

I am using Spark 1.4.0.

【Question discussion】:

    Tags: scala apache-spark apache-spark-sql


    【Solution 1】:

    Take a look at Spark's aggregateByKey:

    scala> val babyNamesCSV = sc.parallelize(List(("David", 6), ("Abby", 4), ("David", 5), ("Abby", 5)))
    babyNamesCSV: org.apache.spark.rdd.RDD[(String, Int)] = ParallelCollectionRDD[0] at parallelize at <console>:12
    
    // Zero value 0; the first function adds a value to the running sum within a partition,
    // the second merges the sums from different partitions
    scala> babyNamesCSV.aggregateByKey(0)((acc, v) => acc + v, (a, b) => a + b).collect
    res1: Array[(String, Int)] = Array((Abby,9), (David,11))
    

    The example above should help illustrate the idea.

    Alternatively, see Aggregator: https://spark.apache.org/docs/0.6.0/api/core/spark/Aggregator.html
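
    The sum example above does not produce the comma-separated strings the question asks for. As a minimal sketch of how the same aggregateByKey call might be adapted, the block below collects the values of each key into a Vector and joins them with commas. The object name AggregateByKeyConcat, the helper concatByKey, and the local[1] master are assumptions made for illustration, not part of the original answer. The order of values within a key is not guaranteed once data spans several partitions.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical helper object, introduced only for this illustration.
object AggregateByKeyConcat {

  // Collect all values of a key into a Vector, then join them with commas.
  def concatByKey(data: RDD[(String, String)]): RDD[(String, String)] =
    data
      .aggregateByKey(Vector.empty[String])(
        (acc, value) => acc :+ value,   // add one value to the per-partition buffer
        (left, right) => left ++ right) // merge buffers coming from different partitions
      .mapValues(_.mkString(","))

  def main(args: Array[String]): Unit = {
    // Local master chosen as an assumption so the sketch runs without a cluster.
    val conf = new SparkConf().setAppName("aggregateByKey-concat").setMaster("local[1]")
    val sc = new SparkContext(conf)
    try {
      val data = sc.parallelize(Seq(
        ("sess-1", "read"), ("sess-1", "meet"), ("sess-1", "walk"),
        ("sess-2", "watch"), ("sess-2", "sleep"), ("sess-2", "run"), ("sess-2", "drive")))

      concatByKey(data).collect().foreach { case (col1, col2) => println(s"$col1 | $col2") }
    } finally {
      sc.stop()
    }
  }
}
```

    Building the list first and joining at the end avoids repeated string concatenation inside the aggregation functions.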

    【Discussion】:

    • Thanks for the reply. My requirement is slightly different; I found a solution after some R&D and have posted it below.
    【Solution 2】:
    // Create an RDD of (col1, col2) pairs
    scala> val data = sc.parallelize(List(("sess-1","read"), ("sess-1","meet"),
        ("sess-1","walk"), ("sess-2","watch"), ("sess-2","sleep"),
        ("sess-2","run"), ("sess-2","drive")))
    data: org.apache.spark.rdd.RDD[(String, String)] =
    ParallelCollectionRDD[211] at parallelize at <console>:26
    
    // groupByKey returns an Iterable[String] (a CompactBuffer) for each key
    scala> val dataCB = data.groupByKey()
    dataCB: org.apache.spark.rdd.RDD[(String, Iterable[String])] =
    ShuffledRDD[212] at groupByKey at <console>:30
    
    // Map each CompactBuffer to a List
    scala> val tx = dataCB.map{case (col1,col2) => (col1,col2.toList)}.collect
    tx: Array[(String, List[String])] = Array((sess-1,List(read, meet,
    walk)), (sess-2,List(watch, sleep, run, drive)))
    
    // groupByKey and the map to List can also be done in one statement
    scala> val tx = data.groupByKey().map{case (col1,col2) => (col1,col2.toList)}.collect
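
    The result above holds a List per key, while the expected output in the question is a single comma-separated string. One possible way to close that gap, sketched under the assumption that joining with mkString is acceptable, is shown below. The object name GroupByKeyConcat, the helper concatByKey, and the local[1] master are hypothetical and chosen only for this example.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical helper object, introduced only for this illustration.
object GroupByKeyConcat {

  // Group the values of each key, then join each group into one comma-separated string.
  def concatByKey(data: RDD[(String, String)]): RDD[(String, String)] =
    data.groupByKey().mapValues(_.mkString(","))

  def main(args: Array[String]): Unit = {
    // Local master chosen as an assumption so the sketch runs without a cluster.
    val conf = new SparkConf().setAppName("groupByKey-concat").setMaster("local[1]")
    val sc = new SparkContext(conf)
    try {
      val data = sc.parallelize(List(
        ("sess-1", "read"), ("sess-1", "meet"), ("sess-1", "walk"),
        ("sess-2", "watch"), ("sess-2", "sleep"), ("sess-2", "run"), ("sess-2", "drive")))

      // Print one "col1 | col2" line per session.
      concatByKey(data).collect().foreach { case (col1, col2) => println(s"$col1 | $col2") }
    } finally {
      sc.stop()
    }
  }
}
```

    Note that groupByKey does not promise any particular order of values within a key, so the joined string may not follow the input order on a multi-partition dataset.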
    
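
    Since the question title asks about Spark SQL, the same grouping can probably also be written as a query. The sketch below assumes Spark 2.0 or later (SparkSession, createOrReplaceTempView, and the built-in collect_list and concat_ws functions), which is newer than the Spark 1.4.0 mentioned in the question. The object name SqlConcat and the helper transpose are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper object, introduced only for this illustration.
object SqlConcat {

  // Register the input as view t1 and concatenate col2 per col1 with a SQL query.
  def transpose(spark: SparkSession, t1: DataFrame): DataFrame = {
    t1.createOrReplaceTempView("t1")
    spark.sql(
      "SELECT col1, concat_ws(',', collect_list(col2)) AS col2 FROM t1 GROUP BY col1")
  }

  def main(args: Array[String]): Unit = {
    // Local master chosen as an assumption so the sketch runs without a cluster.
    val spark = SparkSession.builder().appName("sql-concat").master("local[1]").getOrCreate()
    import spark.implicits._
    try {
      val t1 = Seq(
        ("sess-1", "read"), ("sess-1", "meet"), ("sess-1", "walk"),
        ("sess-2", "watch"), ("sess-2", "sleep"), ("sess-2", "run"), ("sess-2", "drive")
      ).toDF("col1", "col2")

      // Show full cell contents instead of truncating long strings.
      transpose(spark, t1).show(false)
    } finally {
      spark.stop()
    }
  }
}
```

    As with groupByKey, collect_list gives no ordering guarantee after a shuffle, so the order of items inside each string should not be relied on.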

    【Discussion】:
