Posted: 2019-03-07 10:51:27
Question:
I want to apply a function to each row of a DataFrame I have. A snippet of the DataFrame looks like this:
```python
import pandas as pd
import numpy as np
import math

data = {'EVENT_ID': [112335580,112335580,112335580,112335580,112335580,112335580,112335580,112335580, 112335582,
                     112335582,112335582,112335582,112335582,112335582,112335582,112335582,112335582,112335582,
                     112335582,112335582,112335582],
        'SELECTION_ID': [6356576,2554439,2503211,6297034,4233251,2522967,5284417,7660920,8112876,7546023,8175276,8145908,
                         8175274,7300754,8065540,8175275,8106158,8086265,2291406,8065533,8125015],
        'BSP': [5.080818565,6.651493872,6.374683435,24.69510797,7.776082305,11.73219964,270.0383021,4,8.294425408,335.3223613,
                14.06040142,2.423340019,126.7205863,70.53780982,21.3328554,225.2711962,92.25113066,193.0151362,3.775394142,
                95.3786641,17.86333041],
        'WIN_LOSE': [1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0]}
df = pd.DataFrame(data, columns=['EVENT_ID', 'SELECTION_ID', 'BSP', 'WIN_LOSE'])
df = df.sort_values(["EVENT_ID", "BSP"])
df.set_index(['EVENT_ID', 'SELECTION_ID'], inplace=True)
df['Win_Percentage'] = 1/df['BSP']
df['Lose_Percentage'] = 1 - df['Win_Percentage']
```
I want to apply the following function to the Lose_Percentage column:
```python
def test(df):
    x_list = df.values
    y_list = []
    for x in x_list:
        y = math.sin(x/1000)*2000
    return y
```
To do this, I use `transform` like this:
```python
df['Fit'] = df.groupby(level=0)['Lose_Percentage'].transform(test)
```
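The problem is that it returns the same value for every row of the df['Fit'] column. I want it to take the value from each row of the df['Lose_Percentage'] column, apply the function to it, and store the result in that row of the new df['Fit'] column.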
If this works correctly, for index 112335580 the df['Fit'] column should contain:
```
[1.499999859375004, 1.6063624685814168, 1.6862587304992693, 1.6993154622916136, 1.742800855666326, 1.8295287282081318, 1.9190120053704878, 1.992593313611782]
```
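As a quick arithmetic check (not part of the original post), the expected values follow directly from the formula applied row by row. Sorting event 112335580 by BSP puts the BSP of 4 first, so its Lose_Percentage is 1 - 1/4 = 0.75 and its Fit is 2000 · sin(0.75 / 1000), about 1.4999998594. The sketch below repeats that for all eight selections using only the BSP values from the question's data.

```python
import math

# BSP values for event 112335580, copied from the question's data
bsps = [5.080818565, 6.651493872, 6.374683435, 24.69510797,
        7.776082305, 11.73219964, 270.0383021, 4]

fits = []
for bsp in sorted(bsps):            # the question sorts by BSP within each event
    lose_pct = 1 - 1 / bsp          # Lose_Percentage = 1 - Win_Percentage
    fits.append(math.sin(lose_pct / 1000) * 2000)

print(fits)
```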
I tried modifying the function like this:
```python
def test(df):
    x_list = df.values
    y_list = []
    for x in x_list:
        y = math.sin(x/1000)*2000
        y_list.append(y)
    for fit in y_list:
        return fit
```
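But this gives the same result as the previous attempt. I also tried changing the indentation of the `return` statement, but that didn't work either.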
Tags: python pandas dataframe transform