How to properly write an if-then lambda statement for a pandas df? Answers

[Question title]: How to properly write if-then lambda statement for pandas df?
[Posted]: 2021-11-05 07:13:45
[Question]:

I have the following code:

import pandas as pd

data = [[11001218, 'Value', 93483.37, 'G', '', 93483.37, '', '56117J100', 'FRA', 'Equity'],
        [11001218, 'Value', 3572.73, 'G', 3572.73, '', '56117J100', '', 'LUM', 'Equity'],
        [11001218, 'Value', 89910.64, 'G', 89910.64, '', '56117J100', '', 'WAR', 'Equity'],
        [11005597, 'Value', 72640313.34, 'L', '', 72640313.34, 'REVR21964', '', 'IN2', 'Repo']
       ]

df = pd.DataFrame(data, columns=['ID', 'Type', 'Diff', 'Group', 'Amount', 'Amount2', 'Id2', 'Id3', 'Executor', 'Name'])

def logic_builder(row, row2, row3):
    if row['Name'] == 'Repo' and row['Group'] == 'L':
        return 'Fine resultant'
    elif ((row['ID'] == row2['ID']) and (row['ID'] == row3['ID'])
          and (row['Group'] == row2['Group']) and (row['Group'] == row3['Group'])
          and (row['Executor'] != row2['Executor']) and (row['Executor'] != row3['Executor'])):
        return 'Difference in Executor'

# Fails with a NameError: row2 and row3 are never defined at this point
df['Results'] = df.apply(lambda row: logic_builder(row, row2, row3), axis=1)

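If you look at the first 3 rows, they are technically all the same: they have the same ID, Type, Group, and Name. The only difference is the Executor, so I want my if-then statement to return "Difference in Executor". I can't figure out how to fix the if-then so that it looks at all rows that share the same values in the fields mentioned above.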
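The criterion described above (rows sharing ID, Type, Group and Name but with different Executors) can also be checked without comparing rows pairwise. Below is a minimal sketch using groupby(...).transform('nunique'), which returns a per-row count aligned with df; this is an alternative technique, not code from the question or answers, and the names keys, n_exec and is_repo are illustrative.

```python
import pandas as pd

# Sample data from the question
data = [[11001218, 'Value', 93483.37, 'G', '', 93483.37, '', '56117J100', 'FRA', 'Equity'],
        [11001218, 'Value', 3572.73, 'G', 3572.73, '', '56117J100', '', 'LUM', 'Equity'],
        [11001218, 'Value', 89910.64, 'G', 89910.64, '', '56117J100', '', 'WAR', 'Equity'],
        [11005597, 'Value', 72640313.34, 'L', '', 72640313.34, 'REVR21964', '', 'IN2', 'Repo']]
df = pd.DataFrame(data, columns=['ID', 'Type', 'Diff', 'Group', 'Amount', 'Amount2',
                                 'Id2', 'Id3', 'Executor', 'Name'])

# Fields that must match for rows to count as "the same"
keys = ['ID', 'Type', 'Group', 'Name']

# For every row: number of distinct Executors among rows with the same keys
n_exec = df.groupby(keys)['Executor'].transform('nunique')
is_repo = df['Name'].eq('Repo') & df['Group'].eq('L')

df['Results'] = None  # object column; rows matching neither rule stay None
df.loc[n_exec > 1, 'Results'] = 'Difference in Executor'
df.loc[is_repo, 'Results'] = 'Fine resultant'  # assigned last, so it wins like the if/elif order
print(df[['ID', 'Executor', 'Name', 'Results']])
```

Because transform keeps the original row alignment, every row gets its own label. The approach does not depend on the rows being sorted or adjacent.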

Thanks.

[Question comments]:

Tags: python pandas if-statement lambda apply

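[Answer 1]:

You can pass in a single row, determine its index, and then use df.iloc[index] to look up the other rows. This assumes the default RangeIndex, so that row.name (the index label) is also the row's position.

Here is an example: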

def logic_builder(row):
    # df is only read here, so no `global` statement is needed
    if row['Name'] == 'Repo' and row['Group'] == 'L':
        return 'Fine resultant'  # this check needs no neighbouring rows

    i = row.name  # row index label; equals the iloc position with the default RangeIndex

    # get the next two rows
    try:
        row2 = df.iloc[i+1]
        row3 = df.iloc[i+2]
    except IndexError:
        return None  # fewer than two rows follow this one

    if (row['ID'] == row2['ID'] == row3['ID']
            and row['Group'] == row2['Group'] == row3['Group']
            and row['Executor'] != row2['Executor']
            and row['Executor'] != row3['Executor']):
        return 'Difference in Executor'

df['Results'] = df.apply(logic_builder, axis=1)

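Of course, since the Executor comparison depends on the next two rows, it cannot be evaluated for the last two rows of the DataFrame. It also assumes that rows belonging together are adjacent (e.g. sorted by ID).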
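The same "compare with the next two rows" logic can be vectorized with Series.shift instead of per-row iloc lookups. A minimal sketch, assuming (like the answer) that related rows are adjacent; the helper names eq_next and ne_next are hypothetical:

```python
import pandas as pd

# Sample data from the question
data = [[11001218, 'Value', 93483.37, 'G', '', 93483.37, '', '56117J100', 'FRA', 'Equity'],
        [11001218, 'Value', 3572.73, 'G', 3572.73, '', '56117J100', '', 'LUM', 'Equity'],
        [11001218, 'Value', 89910.64, 'G', 89910.64, '', '56117J100', '', 'WAR', 'Equity'],
        [11005597, 'Value', 72640313.34, 'L', '', 72640313.34, 'REVR21964', '', 'IN2', 'Repo']]
df = pd.DataFrame(data, columns=['ID', 'Type', 'Diff', 'Group', 'Amount', 'Amount2',
                                 'Id2', 'Id3', 'Executor', 'Name'])

def eq_next(s, k):
    # True where s equals the value k rows further down; the NaN shifted in at the end never compares equal
    return s.eq(s.shift(-k))

def ne_next(s, k):
    # True where s differs from the value k rows further down; False when no such row exists
    nxt = s.shift(-k)
    return s.ne(nxt) & nxt.notna()

diff_exec = (eq_next(df['ID'], 1) & eq_next(df['ID'], 2)
             & eq_next(df['Group'], 1) & eq_next(df['Group'], 2)
             & ne_next(df['Executor'], 1) & ne_next(df['Executor'], 2))
is_repo = df['Name'].eq('Repo') & df['Group'].eq('L')

df['Results'] = None
df.loc[diff_exec, 'Results'] = 'Difference in Executor'
df.loc[is_repo, 'Results'] = 'Fine resultant'  # Repo rule takes precedence, as in the if/elif
print(df[['ID', 'Executor', 'Results']])
```

As with the row-by-row version, only the first row of a block of three gets "Difference in Executor"; rows without two successors simply evaluate to False instead of raising IndexError.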

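[Answer 2]:

Since you are operating group by group, you can slightly modify the function so that it runs on the chunks/slices of the DataFrame produced by groupby. A modified version of your function looks like this: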

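def logic_builder(group):
    # group is the sub-DataFrame holding all rows of one ID
    if group['Name'].eq('Repo').all() and group['Group'].eq('L').all():
        return 'Fine resultant'
    # a single Group value but more than one distinct Executor
    elif group['Group'].nunique() == 1 and group['Executor'].nunique() > 1:
        return 'Difference in Executor'
    return None  # neither condition matched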

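Passing row1, row2, row3, ..., rown won't really work, because each group can have one or more rows. A better strategy for the logic above is to use all and nunique (nunique returns the number of unique values in the selected column).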
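To see what those building blocks return, here they are applied to the three Equity rows from the question, reduced to the relevant columns. This is a small illustration only; the variable name group mirrors the function's parameter.

```python
import pandas as pd

# The first three rows of the question's data, relevant columns only
group = pd.DataFrame({'ID': [11001218] * 3,
                      'Group': ['G', 'G', 'G'],
                      'Executor': ['FRA', 'LUM', 'WAR'],
                      'Name': ['Equity'] * 3})

print(group['Name'].eq('Repo').all())  # False: not every row is a Repo
print(group['Group'].nunique())        # 1: a single Group value
print(group['Executor'].nunique())     # 3: three distinct Executors
```

So this group fails the first condition and satisfies the second (one Group, more than one Executor), which is why it is labelled "Difference in Executor".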

Then apply the function to the groupby object:

df.groupby('ID').apply(logic_builder)
ID
11001218    Difference in Executor
11005597            Fine resultant
dtype: object

If needed, you can then join these values back onto the original DataFrame.
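One way to do that join is Series.map, which looks up each row's ID in the per-ID result. A sketch; selecting the needed columns before apply is a choice made here because recent pandas versions warn when apply also receives the grouping column, and per_id is an illustrative name:

```python
import pandas as pd

# Sample data from the question
data = [[11001218, 'Value', 93483.37, 'G', '', 93483.37, '', '56117J100', 'FRA', 'Equity'],
        [11001218, 'Value', 3572.73, 'G', 3572.73, '', '56117J100', '', 'LUM', 'Equity'],
        [11001218, 'Value', 89910.64, 'G', 89910.64, '', '56117J100', '', 'WAR', 'Equity'],
        [11005597, 'Value', 72640313.34, 'L', '', 72640313.34, 'REVR21964', '', 'IN2', 'Repo']]
df = pd.DataFrame(data, columns=['ID', 'Type', 'Diff', 'Group', 'Amount', 'Amount2',
                                 'Id2', 'Id3', 'Executor', 'Name'])

def logic_builder(group):
    if group['Name'].eq('Repo').all() and group['Group'].eq('L').all():
        return 'Fine resultant'
    elif group['Group'].nunique() == 1 and group['Executor'].nunique() > 1:
        return 'Difference in Executor'
    return None

# One result per ID: a Series indexed by ID
per_id = df.groupby('ID')[['Name', 'Group', 'Executor']].apply(logic_builder)

# Broadcast each ID's result back to every row with that ID
df['Results'] = df['ID'].map(per_id)
print(df[['ID', 'Executor', 'Name', 'Results']])
```

df.merge(per_id.rename('Results'), left_on='ID', right_index=True) would give the same column; map is shorter when only one column is added.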
