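【Posted】: 2021-06-02 15:00:49
【Question】:
I have a question about splitting a list stored in a DataFrame column into multiple columns. The catch is that each split value has to land in a specific column.
Suppose I have this DataFrame: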
date data
2020-01-01 00:00:00 [G07, G08, G10, G16]
2020-01-01 00:00:01 [G07, G08, G16]
2020-01-01 00:00:02 [G08, G10, G16, G20, G21]
2020-01-01 00:00:03 [G16, G20, G21, G26, G27, R02]
2020-01-01 00:00:04 [G07, G08, G26, G27]
I'm looking for this kind of result:
date G07 G08 G10 G16 G20 G21 G26 G27 R02
2020-01-01 00:00:00 G07 G08 G10 G16 NaN NaN NaN NaN NaN
2020-01-01 00:00:01 G07 G08 NaN G16 NaN NaN NaN NaN NaN
2020-01-01 00:00:02 NaN G08 G10 G16 G20 G21 NaN NaN NaN
2020-01-01 00:00:03 NaN NaN NaN G16 G20 G21 G26 G27 R02
2020-01-01 00:00:04 G07 G08 NaN NaN NaN NaN G26 G27 NaN
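One possible way to build this intermediate table, sketched under the assumption that the columns are named `date` and `data` as above and that no list contains the same label twice (otherwise `pivot` raises a `ValueError`): explode the lists into one row per (date, label) pair, then pivot the label back out so it becomes both the column name and the cell value.

```python
import pandas as pd

# Sample frame rebuilt from the question (timestamps one second apart)
df = pd.DataFrame({
    "date": pd.to_datetime([f"2020-01-01 00:00:0{i}" for i in range(5)]),
    "data": [
        ["G07", "G08", "G10", "G16"],
        ["G07", "G08", "G16"],
        ["G08", "G10", "G16", "G20", "G21"],
        ["G16", "G20", "G21", "G26", "G27", "R02"],
        ["G07", "G08", "G26", "G27"],
    ],
})

# One row per (date, label) pair
long = df.explode("data")

# Copy the label into a separate value column, then pivot: every distinct label
# becomes a column holding the label itself, and absent combinations become NaN.
labels = long.assign(value=long["data"]).pivot(index="date", columns="data", values="value")
print(labels)
```

The columns come out in sorted order (G07 ... R02), which matches the layout in the question.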
so that I can ultimately get this kind of matrix:
date G07 G08 G10 G16 G20 G21 G26 G27 R02
2020-01-01 00:00:00 1 1 1 1 0 0 0 0 0
2020-01-01 00:00:01 1 1 0 1 0 0 0 0 0
2020-01-01 00:00:02 0 1 1 1 1 1 0 0 0
2020-01-01 00:00:03 0 0 0 1 1 1 1 1 1
2020-01-01 00:00:04 1 1 0 0 0 0 1 1 0
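If only the 0/1 matrix is needed, the intermediate table can be skipped. A minimal sketch, assuming no label itself contains the `|` character: join each list into a single `|`-separated string and let `Series.str.get_dummies` create one indicator column per distinct label.

```python
import pandas as pd

# Sample frame rebuilt from the question
df = pd.DataFrame({
    "date": pd.to_datetime([f"2020-01-01 00:00:0{i}" for i in range(5)]),
    "data": [
        ["G07", "G08", "G10", "G16"],
        ["G07", "G08", "G16"],
        ["G08", "G10", "G16", "G20", "G21"],
        ["G16", "G20", "G21", "G26", "G27", "R02"],
        ["G07", "G08", "G26", "G27"],
    ],
})

# "G07|G08|G10|G16" -> one 0/1 column per label; the date stays as the index.
# str.get_dummies splits on "|" by default and returns the columns sorted.
matrix = df.set_index("date")["data"].str.join("|").str.get_dummies()
print(matrix)
```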
By running this command:
In [1] pd.DataFrame(self.df['data'].to_list())
Out [1] date 1 2 3 4 5 6
2020-01-01 00:00:00 G07 G08 G10 G16
2020-01-01 00:00:01 G07 G08 G16
2020-01-01 00:00:02 G08 G10 G16 G20 G21
2020-01-01 00:00:03 G16 G20 G21 G26 G27 R02
2020-01-01 00:00:04 G07 G08 G26 G27
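I can only split the lists into separate columns by position; I can't find a way to put each value into its own dedicated column.
I've thought about looping over every value for each date, but that is slow, and my dataset has more than 1,000,000 rows.
【Discussion】: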
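For a dataset of this size, one loop-free option is to turn every (row, label) pair into integer coordinates and set them all at once in a NumPy array. The sketch below is an illustration, not a benchmarked solution; `lists_to_indicator` is a hypothetical helper name, and it assumes the column holds Python lists.

```python
import numpy as np
import pandas as pd


def lists_to_indicator(s: pd.Series) -> pd.DataFrame:
    """Hypothetical helper: one 0/1 column per distinct label in a Series of lists."""
    # Reset to positions 0..n-1 so the exploded index is the row number;
    # empty lists explode to NaN and are dropped.
    exploded = s.reset_index(drop=True).explode().dropna()
    rows = exploded.index.to_numpy()
    # Integer code per label; sort=True gives alphabetically ordered columns
    cols, uniques = pd.factorize(exploded, sort=True)
    out = np.zeros((len(s), len(uniques)), dtype=np.uint8)
    out[rows, cols] = 1  # set every (row, label) pair in one vectorised assignment
    return pd.DataFrame(out, index=s.index, columns=uniques)


df = pd.DataFrame({
    "date": pd.to_datetime([f"2020-01-01 00:00:0{i}" for i in range(5)]),
    "data": [
        ["G07", "G08", "G10", "G16"],
        ["G07", "G08", "G16"],
        ["G08", "G10", "G16", "G20", "G21"],
        ["G16", "G20", "G21", "G26", "G27", "R02"],
        ["G07", "G08", "G26", "G27"],
    ],
})
result = lists_to_indicator(df.set_index("date")["data"])
print(result)
```

The work is proportional to the total number of list elements, and the output uses one byte per (row, label) cell. If the number of distinct labels is large, the same `rows`/`cols` arrays could instead feed a sparse matrix such as `scipy.sparse.csr_matrix` to avoid storing the zeros.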
-
I recommend this answer for larger DataFrames.