How to efficiently decode arrays to columns in a pandas DataFrame

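[Question title]: How to efficiently decode arrays to columns in a pandas DataFrame
[Posted]: 2019-11-28 08:44:59
[Question]:

I have a function that generates a result for each month of the year. In my dataframe I collect these results for several data columns, so I end up with a dataframe in which multiple columns hold arrays as values. Now I want to "pivot" these columns so that each value sits in its own column. For example, if a row contains the value [1,2,3,4,5,6,7,8,9,10,11,12] in column 'A', I want 12 columns 'A_01', 'A_02', ..., 'A_12', each holding one value from the array.

My current code looks like this: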

    # create new columns
    columns_to_add = []

    for _, row in df[columns_to_process].iterrows():
        # list cells are expanded element by element; non-list cells (e.g. NaN) are repeated as-is
        columns_to_add += [[row[name][offset] if type(row[name]) == list else row[name]
                            for offset in range(array_len) for name in columns_to_process]]

    new_df = pd.DataFrame(columns_to_add,
                          columns=[name+'_'+str(offset+1) for offset in range(array_len)
                                   for name in columns_to_process],
                          index=df.index)  # make dataframe addendum

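(Note: some rows have no values at all, which is why I had to put the condition if type() == list into the iteration.)

But this code is very slow. I believe there must be a more elegant solution. Can you point me to one?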

[Comments]:

As a small tip: instead of if type(row[name]) == list, it is better to use if isinstance(row[name], list). See the Python documentation for isinstance.
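The difference between the two checks shows up with subclasses. A minimal sketch follows; MonthlyValues is a hypothetical list subclass made up purely for illustration and does not appear in the question.

```python
class MonthlyValues(list):
    """Hypothetical list subclass, used only for illustration."""


values = MonthlyValues([1, 2, 3])

# An exact type comparison rejects subclasses of list ...
exact_match = type(values) == list
# ... while isinstance accepts them.
instance_match = isinstance(values, list)

print(exact_match, instance_match)  # False True
```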

Tags: python pandas dataframe

[Solution 1]:

IIUC, use Series.tolist together with the pandas.DataFrame constructor.

We will also use DataFrame.rename to get the column names into the format you want.

    import pandas as pd

    # Setup
    df = pd.DataFrame({'A': [ [1,2,3,4,5,6,7,8,9,10,11,12] ]})

    pd.DataFrame(df['A'].tolist()).rename(columns=lambda x: f'A_{x+1:0>2d}')

[out]

       A_01  A_02  A_03  A_04  A_05  A_06  A_07  A_08  A_09  A_10  A_11  A_12
    0     1     2     3     4     5     6     7     8     9    10    11    12
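The question involves several array columns, so the same idea can be applied per column and the pieces joined with pandas.concat. This is a minimal sketch, assuming every row holds a list of the same length; the helper name expand_array_columns and the sample data are made up for illustration.

```python
import pandas as pd


def expand_array_columns(df, columns_to_process):
    """Expand each list-valued column into numbered columns (A_01, A_02, ...)."""
    parts = []
    for name in columns_to_process:
        # One new column per array position, keeping the original row index
        expanded = pd.DataFrame(df[name].tolist(), index=df.index)
        expanded = expanded.rename(columns=lambda x, name=name: f'{name}_{x+1:0>2d}')
        parts.append(expanded)
    # Join the per-column blocks side by side
    return pd.concat(parts, axis=1)


# Hypothetical sample data with two array columns
df = pd.DataFrame({'A': [[1, 2, 3], [4, 5, 6]],
                   'B': [[10, 20, 30], [40, 50, 60]]},
                  index=['x', 'y'])

new_df = expand_array_columns(df, ['A', 'B'])
print(new_df)
```

Note that this groups the new columns by source column (all A_* first, then all B_*), whereas the code in the question interleaves them by offset.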

[Comments]:

Nice! I had to replace the NaN values in my data with empty arrays, but it is much faster than my manual implementation. Thanks!
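The NaN replacement mentioned in this comment could look roughly like the following. It is only a sketch of one possible approach, not necessarily what the commenter did; the sample data is made up, and it assumes that every cell that is not a list should be treated as missing.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: the middle row has no values
df = pd.DataFrame({'A': [[1, 2, 3], np.nan, [4, 5, 6]]})

# Replace every non-list cell (here: NaN) with an empty list
cleaned = df['A'].apply(lambda v: v if isinstance(v, list) else [])

# Shorter rows are padded, so the missing row ends up empty in every new column
new_df = pd.DataFrame(cleaned.tolist(), index=df.index)
new_df = new_df.rename(columns=lambda x: f'A_{x+1:0>2d}')
print(new_df)
```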