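【Posted】: 2021-01-18 14:35:00
【Question】:
My working environment is mainly PySpark, but from some Googling, transposing in PySpark looks very complicated. I would like to keep this in PySpark, but if it is much easier in Pandas, I will convert the Spark DataFrame to a Pandas DataFrame. The dataset is not big enough for performance to be a concern.
I would like to transform a DataFrame with multiple columns into rows:
Input: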
import pandas as pd
df = pd.DataFrame({'Record': {0: 1, 1: 2, 2: 3},
'Hospital': {0: 'Red Cross', 1: 'Alberta Hospital', 2: 'General Hospital'},
'Hospital Address': {0: '1234 Street 429',
1: '553 Alberta Road 441',
2: '994 Random Street 923'},
'Medicine_1': {0: 'Effective', 1: 'Effecive', 2: 'Normal'},
'Medicine_2': {0: 'Effective', 1: 'Normal', 2: 'Effective'},
'Medicine_3': {0: 'Normal', 1: 'Normal', 2: 'Normal'},
'Medicine_4': {0: 'Effective', 1: 'Effective', 2: 'Effective'}})
Record  Hospital          Hospital Address       Medicine_1  Medicine_2  Medicine_3  Medicine_4
1       Red Cross         1234 Street 429        Effective   Effective   Normal      Effective
2       Alberta Hospital  553 Alberta Road 441   Effecive    Normal      Normal      Effective
3       General Hospital  994 Random Street 923  Normal      Effective   Normal      Effective
Output:
    Record  Hospital          Hospital Address       Name        Value
0   1       Red Cross         1234 Street 429        Medicine_1  Effective
1   2       Red Cross         1234 Street 429        Medicine_2  Effective
2   3       Red Cross         1234 Street 429        Medicine_3  Normal
3   4       Red Cross         1234 Street 429        Medicine_4  Effective
4   5       Alberta Hospital  553 Alberta Road 441   Medicine_1  Effecive
5   6       Alberta Hospital  553 Alberta Road 441   Medicine_2  Normal
6   7       Alberta Hospital  553 Alberta Road 441   Medicine_3  Normal
7   8       Alberta Hospital  553 Alberta Road 441   Medicine_4  Effective
8   9       General Hospital  994 Random Street 923  Medicine_1  Normal
9   10      General Hospital  994 Random Street 923  Medicine_2  Effective
10  11      General Hospital  994 Random Street 923  Medicine_3  Normal
11  12      General Hospital  994 Random Street 923  Medicine_4  Effective
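Looking at PySpark examples, it is complicated: PySpark Dataframe melt columns into rows
Looking at Pandas examples, it looks much easier. But there are many different Stack Overflow answers, some saying to use pivot, melt, stack, or unstack, and it ends up being confusing.
So, if anyone has an easy way to do this in PySpark, I am all ears. If not, I will happily take Pandas answers.
Thank you very much for your help!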
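For readers comparing the Pandas options mentioned above, here is a minimal sketch of the melt route, which is probably the most direct fit for this wide-to-long reshape. The names `id_cols` and `long_df` are made up for illustration, and the sort step is only there to group the rows per hospital as in the desired output. Treat it as one possible approach rather than a definitive answer.

```python
import pandas as pd

# Same data as the question's input frame (the 'Effecive' spelling is in the source data)
df = pd.DataFrame({
    "Record": [1, 2, 3],
    "Hospital": ["Red Cross", "Alberta Hospital", "General Hospital"],
    "Hospital Address": ["1234 Street 429", "553 Alberta Road 441", "994 Random Street 923"],
    "Medicine_1": ["Effective", "Effecive", "Normal"],
    "Medicine_2": ["Effective", "Normal", "Effective"],
    "Medicine_3": ["Normal", "Normal", "Normal"],
    "Medicine_4": ["Effective", "Effective", "Effective"],
})

# Columns that identify a row and should be repeated for every medicine (illustrative name)
id_cols = ["Record", "Hospital", "Hospital Address"]

# melt turns every non-id column into a (Name, Value) pair;
# its default order is column by column, so sort to group rows per record
long_df = (
    df.melt(id_vars=id_cols, var_name="Name", value_name="Value")
      .sort_values(["Record", "Name"], kind="stable")
      .reset_index(drop=True)
)

print(long_df)
```

Note that this keeps the original Record value on each row (1, 1, 1, 1, 2, ...). The desired output in the question shows Record running from 1 to 12; if that renumbering is intended, the column could be overwritten afterwards with `range(1, len(long_df) + 1)`.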
【Comments】:
- IIUC, you can use explode in PySpark; see stackoverflow.com/questions/55378047/…
- Hi, I have removed the image and edited your question, but for future reference please see: stackoverflow.com/questions/20109391/…
Tags: pandas pyspark pivot transform melt