Assign list of arrays to several columns in Pandas DataFrame (Performance-optimized): Answers

【Question title】: Assign list of arrays to several columns in Pandas DataFrame (Performance-optimized)
【Posted】: 2022-01-18 12:26:02
【Question description】:

Given the following DataFrame: df = pd.DataFrame(data=np.random.randint(1,10,size=(10,4)),columns=list("abcd"),dtype=np.int64)

Suppose I want to update the first two columns with a list of two numpy arrays, each with a specific dtype (e.g. np.int8 and np.float32) --> update_vals = [np.arange(1,11,dtype=np.int8),np.ones(10,dtype=np.float32)]

I can do the following: df[["a","b"]] = pd.DataFrame(dict(zip(list("ab"),update_vals)))

Expected column dtypes:

a = np.int8
b = np.float32
[c,d] = np.int64

Is there a faster way to do this?

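【Question comments】:

Hi, you're on StackOverflow. If your code works but you want better performance, you should use codereview.stackexchange.com
If all columns had the same dtype, e.g. float, the DataFrame could store them all in a single (n,4) array. When the dtypes differ, the underlying storage holds a separate array for each dtype, if not for each column (Series). Any loop-free operation on the 2 arrays requires converting them into one array with a uniform dtype. Since changing the dtype seems to be your priority, you will have to handle each array/column separately in some way.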
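The dtype point in the comment above can be checked directly in NumPy. This is a minimal sketch that reuses the `update_vals` list from the question; the name `stacked` is introduced here only for illustration.

```python
import numpy as np

# Same arrays as in the question: one int8 array and one float32 array.
update_vals = [np.arange(1, 11, dtype=np.int8), np.ones(10, dtype=np.float32)]

# Stacking them into one 2-D array forces a single common dtype.
stacked = np.array(update_vals)

print(stacked.shape)  # (2, 10)
print(stacked.dtype)  # float32: the int8 values were upcast
print(np.result_type(np.int8, np.float32))  # float32
```

Any approach that first combines the arrays into one block therefore loses the per-column dtypes, which is why the columns end up being handled one at a time.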

Tags: python pandas dataframe numpy

【Solution 1】:

Update

Why not simply:

df['a'] = update_vals[0]
df['b'] = update_vals[1]
print(df.dtypes)

# Output:
a       int8
b    float32
c      int64
d      int64
dtype: object

Or:

for col, arr in zip(df.columns, update_vals):
    df[col] = arr
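The loop above pairs `df.columns` with `update_vals` and relies on `zip` stopping at the shorter sequence, so it only works because the target columns happen to come first. A hedged sketch that names the columns explicitly follows; `target_cols` and `df2` are hypothetical names, and `DataFrame.assign` is shown as an alternative that returns a new DataFrame rather than modifying `df` in place.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    data=np.random.randint(1, 10, size=(10, 4)),
    columns=list("abcd"),
    dtype=np.int64,
)
update_vals = [np.arange(1, 11, dtype=np.int8), np.ones(10, dtype=np.float32)]

# Hypothetical name: the columns to overwrite, in the same order as update_vals.
target_cols = ["a", "b"]

# Option 1: assign one column at a time, in place.
for col, arr in zip(target_cols, update_vals):
    df[col] = arr

# Option 2: build a new DataFrame in one call with DataFrame.assign.
df2 = df.assign(**dict(zip(target_cols, update_vals)))

print(df.dtypes)
print(df2.dtypes)
```

Both options still set each column separately under the hood; neither is claimed here to be faster than the other.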

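Use (note that np.array stacks both arrays into one array with a single common dtype, so separate dtypes such as int8 and float32 are not kept):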

df[['a', 'b']] = np.array(update_vals).T
print(df)

# Output:
    a  b  c  d
0   1  1  1  2
1   2  1  5  1
2   3  1  4  8
3   4  1  6  3
4   5  1  3  4
5   6  1  8  2
6   7  1  3  1
7   8  1  8  7
8   9  1  4  1
9  10  1  3  6

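【Comments】:

Hey, no, but I think that's because my question was poorly worded. Maybe you can take another look.
I updated my answer, can you check it?
It does work (it's the same as the approach I suggested). I'm looking for something faster than looping over these columns, because a loop isn't performance-optimized if you want to update a large number of columns.
There's nothing faster than an atomic operation... a = 1 is the fastest way to set a to 1.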
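Whether the per-column loop is actually slower than the single DataFrame assignment from the question can be measured rather than assumed. Below is a rough timing sketch using `timeit`; the function names `via_dataframe` and `via_loop` and the variable `base` are hypothetical, and the numbers will vary with machine and pandas version, so no expected output is shown.

```python
import timeit

import numpy as np
import pandas as pd

n_rows = 10  # same size as in the question
base = pd.DataFrame(
    np.random.randint(1, 10, size=(n_rows, 4)), columns=list("abcd"), dtype=np.int64
)
update_vals = [
    np.arange(1, n_rows + 1, dtype=np.int8),
    np.ones(n_rows, dtype=np.float32),
]


def via_dataframe(df):
    # Approach from the question: wrap the arrays in a temporary DataFrame.
    df[["a", "b"]] = pd.DataFrame(dict(zip("ab", update_vals)))
    return df


def via_loop(df):
    # Approach from the answer: assign each column separately.
    for col, arr in zip("ab", update_vals):
        df[col] = arr
    return df


runs = 200
for func in (via_dataframe, via_loop):
    # Each call works on a fresh copy so that base is never modified.
    seconds = timeit.timeit(lambda: func(base.copy()), number=runs)
    print(f"{func.__name__}: {seconds / runs * 1e6:.1f} us per call (includes the copy)")
```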