[Posted]: 2023-04-05 10:57:02
[Question]:
I am trying to solve a pandas DataFrame problem.
I have a DataFrame with three columns:
import numpy as np
import pandas as pd
np.random.seed(0)
dataframe = pd.DataFrame({'operation': ['data_a', 'data_b', 'avg', 'concat', 'sum', 'data_a', 'concat'],
                          'data_a': list(np.random.uniform(-1,1,[7,2])), 'data_b': list(np.random.uniform(-1,1,[7,2]))})
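The 'operation' column specifies how the other two columns are merged for each row. If 'operation' is 'data_a', take that row's data_a value; if it is 'avg', take the average of that row's data_a and data_b; and so on.
The output I expect is a new column that holds, for each row, the result of the merge function named in the 'operation' column.
What I have tried: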
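The per-row semantics described above can be sketched as a lookup from operation name to a function of the two row arrays. This is only an illustration: OPERATIONS and merge_row are hypothetical names, and 'avg' and 'sum' are assumed to be element-wise over the two arrays, which the question does not state explicitly.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
dataframe = pd.DataFrame({
    'operation': ['data_a', 'data_b', 'avg', 'concat', 'sum', 'data_a', 'concat'],
    'data_a': list(np.random.uniform(-1, 1, [7, 2])),
    'data_b': list(np.random.uniform(-1, 1, [7, 2])),
})

# Hypothetical dispatch table: operation name -> function of the row's two arrays.
# Assumption: 'avg' and 'sum' combine the arrays element-wise.
OPERATIONS = {
    'data_a': lambda a, b: a,
    'data_b': lambda a, b: b,
    'avg': lambda a, b: (a + b) / 2,
    'sum': lambda a, b: a + b,
    'concat': lambda a, b: np.concatenate([a, b], axis=0),
}

def merge_row(operation, a, b):
    """Apply the merge function named by `operation` to one row's arrays."""
    return OPERATIONS[operation](a, b)

# Iterate over the three columns in lockstep; the result is an object column
# because 'concat' rows hold 4 values while the other rows hold 2.
dataframe['new_column'] = [
    merge_row(op, a, b)
    for op, a, b in zip(dataframe['operation'], dataframe['data_a'], dataframe['data_b'])
]
```

Because 'concat' produces a longer array than the other operations, the new column cannot be a regular 2-D numeric block, which is one reason a row-wise dispatch is used in this sketch.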
dataframe['new_column'] = 'dummy_values'
for i in range(len(dataframe)):
    if dataframe['operation'].iloc[i] == 'data_a':
        dataframe['new_column'].iloc[i] = dataframe['data_a'].iloc[i]
    elif dataframe['operation'].iloc[i] == 'data_b':
        dataframe['new_column'].iloc[i] = dataframe['data_b'].iloc[i]
    elif dataframe['operation'].iloc[i] == 'avg':
        dataframe['new_column'].iloc[i] = dataframe[['data_a','data_b']].iloc[i].mean()
    elif dataframe['operation'].iloc[i] == 'sum':
        dataframe['new_column'].iloc[i] = dataframe[['data_a','data_b']].iloc[i].sum()
    elif dataframe['operation'].iloc[i] == 'concat':
        dataframe['new_column'].iloc[i] = np.concatenate([dataframe['data_a'].iloc[i], dataframe['data_b'].iloc[i]], axis=0)
The solution above is slow, so I tried np.select as follows:
import numpy as np
con1 = dataframe['operation'] == 'data_a'
con2 = dataframe['operation'] == 'data_b'
val1 = dataframe['data_a']
val2 = dataframe['data_b']
dataframe['new_column'] = np.select([con1,con2], [val1,val2])
But if one of the choices is computed from two columns, np.select raises an error:
import numpy as np
con1 = dataframe['operation'] == 'data_a'
con2 = dataframe['operation'] == 'data_b'
con3 = dataframe['operation'] == 'avg'
val1 = dataframe['data_a']
val2 = dataframe['data_b']
val3 = dataframe[['data_b', 'data_a']].mean()
dataframe['new_column'] = np.select([con1,con2,con3], [val1,val2,val3])
Error message:
ValueError: shape mismatch: objects cannot be broadcast to a single shape
How can I use np.select with these different conditions?
[Comments]:
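One likely cause of the shape mismatch is that DataFrame.mean() reduces down the rows by default (axis=0) and returns one value per column, while the conditions have one entry per row. The sketch below illustrates this with scalar-valued columns, which is a simplification of the array-valued cells in the question; the small frame df and its values are made up for the example, and whether mean(axis=1) behaves the same on object columns holding arrays is not verified here.

```python
import numpy as np
import pandas as pd

# Hypothetical simplified frame: scalar cells instead of arrays.
df = pd.DataFrame({
    'operation': ['data_a', 'data_b', 'avg'],
    'data_a': [1.0, 2.0, 3.0],
    'data_b': [10.0, 20.0, 30.0],
})

# One boolean mask per operation, each of length len(df).
conditions = [df['operation'].eq(name).to_numpy() for name in ('data_a', 'data_b', 'avg')]

col_mean = df[['data_b', 'data_a']].mean()        # axis=0: one value per column
row_mean = df[['data_b', 'data_a']].mean(axis=1)  # axis=1: one value per row

# Choices whose length differs from the conditions cannot be broadcast together.
try:
    np.select(conditions, [df['data_a'].to_numpy(), df['data_b'].to_numpy(), col_mean.to_numpy()])
    broadcast_failed = False
except ValueError:
    broadcast_failed = True

# With the row-wise mean every choice has one entry per row, so np.select works.
result = np.select(
    conditions,
    [df['data_a'].to_numpy(), df['data_b'].to_numpy(), row_mean.to_numpy()],
    default=np.nan,
)
```

Here col_mean has one entry per column and row_mean one entry per row, so only row_mean lines up with the condition masks.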
-
dataframe[['data_b', 'data_a']].mean(axis=1) ?
-
@BEN_YO Post that as an answer and I will accept it.
-
What is take_a?
-
@MadPhysicist Corrected.
-
What is concat?
Tags: python python-3.x pandas numpy