Transpose DataFrame columns to rows in PySpark: answers

【Question title】: Transpose dataframe cols to rows in PYSPARK
【Posted】: 2019-10-24 18:15:05
【Question description】:

I want to transpose a small DataFrame so that its columns become rows.

For example, suppose I have a DataFrame like this:

+---+---+------+
| id|obs|period|
+---+---+------+
|  1|230|  CURR|
|  2|456|  PREV|
+---+---+------+

I would like to have

+---------+-----+----+
|COL_NAME | CURR|PREV|
+---------+-----+----+
|id       |   1 | 2  |
|obs      |  230|456 |
+---------+-----+----+

Any help is greatly appreciated. The closest I have got is this, which I found online:

from pyspark.sql import functions as func

# Use `create_map` to build a map column with constant (literal) keys
df = df.withColumn(
    'mapCol',
    func.create_map(func.lit('period'), df.period,
                    func.lit('col_2'), df.id,
                    func.lit('col_3'), df.obs)
)
# Use `explode` to turn each map entry into its own row
res = df.select(func.explode(df.mapCol).alias('col_id', 'col_value'))
res.show()

+------+---------+
|col_id|col_value|
+------+---------+
|period|     CURR|
| col_2|        1|
| col_3|      230|
|period|     PREV|
| col_2|        2|
| col_3|      456|
+------+---------+

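【Question comments】:

Possible duplicate of How to pivot Spark DataFrame?
I don't want an aggregation, so it is not a duplicate.
Here is another link: Pyspark: reshape data without aggregation. There are several links at the bottom of the original duplicate target that should cover your use case, for example: Transpose column to row with Spark, and Spark: Transpose DataFrame Without Aggregating.
Also, this example is not a great use case for Spark (maybe you just made it up). Could this be an XY Problem? What is your end goal? There may be a different (better) approach. For example, if your data is small, you may be better off collecting it to pandas and transposing outside of Spark.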
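The pandas route suggested in the last comment could look roughly like the sketch below. It assumes the data is small enough to collect to the driver; the `pdf` frame is built by hand here as a stand-in for `df.toPandas()`, and the variable names are illustrative only.

```python
import pandas as pd

# Stand-in for `df.toPandas()` on the Spark DataFrame from the question.
pdf = pd.DataFrame(
    {"id": [1, 2], "obs": [230, 456], "period": ["CURR", "PREV"]}
)

# Make `period` the index so its values become the new column headers,
# then transpose: the remaining columns (id, obs) become rows.
transposed = pdf.set_index("period").T
transposed.index.name = "COL_NAME"   # label for the former column names
transposed.columns.name = None       # drop the leftover "period" axis label
transposed = transposed.reset_index()

print(transposed)
```

If the result is needed back in Spark, `spark.createDataFrame(transposed)` should accept it, provided each resulting column holds a single type.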

Tags: python dataframe pyspark

【Solution 1】:

This is the answer I came up with. Thanks to everyone who tried to help.

df.createOrReplaceTempView("df")  # the query below reads from a temp view named "df"
spark.sql("select 'ID' as COL_NAME, max(case when period = 'CURR' then id end) as CURR, \
max(case when period = 'PREV' then id end) as PREV from df union \
select 'OBS', max(case when period = 'CURR' then obs end), max(case when period = 'PREV' then obs end) from df")\
.show()

+--------+----+----+
|COL_NAME|CURR|PREV|
+--------+----+----+
|   ID   |   1|   2|
|  OBS   | 230| 456|
+--------+----+----+

【Comments】: