【Question title】: How to pivot data in Hive with aggregation
【Posted】: 2019-02-10 23:33:03
【Question description】:

I have table data like the following, and I want to pivot it with aggregation.

ColumnA    ColumnB            ColumnC
1          complete            Yes
1          complete            Yes
2          In progress         No
2          In progress         No 
3          Not yet started     initiate 
3          Not yet started     initiate 

I want to pivot it into the following form:

ColumnA          Complete    In progress     Not yet started
1                 2               0                0
2                 0               2                0
3                 0               0                2

Can this be achieved in Hive or Impala?

【Question comments】:

  • What have you tried so far?

Tags: apache-spark hadoop hive impala


【Solution 1】:

Use case with sum aggregation:

select ColumnA,    
       sum(case when ColumnB='complete'        then 1 else 0 end) as Complete,
       sum(case when ColumnB='In progress'     then 1 else 0 end) as In_progress,
       sum(case when ColumnB='Not yet started' then 1 else 0 end) as Not_yet_started
  from my_table --placeholder name; "table" itself is a reserved word in Hive
 group by ColumnA
 order by ColumnA --remove if order is not necessary
;
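
To check the query against the sample rows without a Hive cluster, the sketch below runs the same case/sum statement in an in-memory SQLite database through Python's standard sqlite3 module. SQLite is a stand-in here, not Hive or Impala, and the table name my_table and the helper run_pivot are assumptions made for illustration only.

```python
import sqlite3

# Sample rows copied from the question.
ROWS = [
    (1, "complete", "Yes"),
    (1, "complete", "Yes"),
    (2, "In progress", "No"),
    (2, "In progress", "No"),
    (3, "Not yet started", "initiate"),
    (3, "Not yet started", "initiate"),
]

# Same conditional-aggregation query as in the answer;
# my_table is a placeholder table name.
PIVOT_SQL = """
select ColumnA,
       sum(case when ColumnB = 'complete'        then 1 else 0 end) as Complete,
       sum(case when ColumnB = 'In progress'     then 1 else 0 end) as In_progress,
       sum(case when ColumnB = 'Not yet started' then 1 else 0 end) as Not_yet_started
  from my_table
 group by ColumnA
 order by ColumnA
"""


def run_pivot(rows):
    """Load rows into an in-memory SQLite table and return the pivoted result."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "create table my_table (ColumnA integer, ColumnB text, ColumnC text)"
        )
        conn.executemany("insert into my_table values (?, ?, ?)", rows)
        return conn.execute(PIVOT_SQL).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in run_pivot(ROWS):
        print(row)
```

Each result tuple is (ColumnA, Complete, In_progress, Not_yet_started). The string comparisons are exact, so the literals in the case expressions must match the stored ColumnB values character for character, including letter case.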

【Discussion】:

    【Solution 2】:

    Here is how you can do this in Spark with Scala.

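        // Assumes an existing SparkSession named `spark` (as in spark-shell).
        import org.apache.spark.sql.functions.count
        import spark.implicits._ // enables .toDF on an RDD of tuples

        // Sample data matching the question.
        val test = spark.sparkContext.parallelize(List(
          ("1", "Complete", "yes"),
          ("1", "Complete", "yes"),
          ("2", "Inprogress", "no"),
          ("2", "Inprogress", "no"),
          ("3", "Not yet started", "initiate"),
          ("3", "Not yet started", "initiate")
        )).toDF("ColumnA", "ColumnB", "ColumnC")
        test.show()

        // One output column per distinct ColumnB value, counting rows per ColumnA.
        val test_pivot = test.groupBy("ColumnA")
                             .pivot("ColumnB")
                             .agg(count("ColumnC"))

        // Combinations with no rows come back as null; replace them with 0.
        test_pivot.na.fill(0).show(false)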
    

    And the output:

    +-------+--------+----------+---------------+
    |ColumnA|Complete|Inprogress|Not yet started|
    +-------+--------+----------+---------------+
    |3      |0       |0         |2              |
    |1      |2       |0         |0              |
    |2      |0       |2         |0              |
    +-------+--------+----------+---------------+
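
    To make explicit what the groupBy / pivot / count / na.fill(0) chain computes, here is a minimal plain-Python sketch of the same logic on the sample rows. It is not how Spark implements pivot, and the helper name pivot_counts is made up for this illustration; it only mirrors the result: one column per distinct ColumnB value, a count of non-null ColumnC values per cell, and 0 where a combination has no rows.

```python
from collections import Counter

# Sample rows as used in the Spark answer.
ROWS = [
    ("1", "Complete", "yes"),
    ("1", "Complete", "yes"),
    ("2", "Inprogress", "no"),
    ("2", "Inprogress", "no"),
    ("3", "Not yet started", "initiate"),
    ("3", "Not yet started", "initiate"),
]


def pivot_counts(rows):
    """Return (header, table) with one count column per distinct ColumnB value."""
    # Count non-null ColumnC values for each (ColumnA, ColumnB) pair.
    counts = Counter((a, b) for a, b, c in rows if c is not None)
    # Distinct pivot values become columns; sorted for a stable column order.
    statuses = sorted({b for _a, b, _c in rows})
    keys = sorted({a for a, _b, _c in rows})
    header = ["ColumnA"] + statuses
    # A Counter yields 0 for missing pairs, which plays the role of na.fill(0).
    table = [[a] + [counts[(a, s)] for s in statuses] for a in keys]
    return header, table


if __name__ == "__main__":
    header, table = pivot_counts(ROWS)
    print(header)
    for row in table:
        print(row)
```

    Unlike the case/sum query, the pivot columns here are discovered from the data, so a new ColumnB value shows up as a new column without changing the code.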
    

    【Discussion】:
