Comparing one row with the next row in a Python dataframe

【Question title】: Comparing one row with next row in Python dataframe
【Posted】: 2016-02-11 21:16:14
【Question】:

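I want to compare one row of a Python dataframe with the next row and, if they are the same, do some additions. However, the code I wrote does not work.

The head of my play dataframe is shown below.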

          GameCode  PlayNumber  PeriodNumber  Clock  OffenseTeamCode  \
0  299004720130829           1             1    900               47   
1  299004720130829           2             1    NaN              299   
2  299004720130829           3             1    NaN              299   
3  299004720130829           4             1    NaN              299   
4  299004720130829           5             1    NaN              299   

   DefenseTeamCode  OffensePoints  DefensePoints  Down  Distance  Spot  \
0              299              0              0   NaN       NaN    65   
1               47              0              0     1        10    75   
2               47              0              0     2         7    72   
3               47              0              0     3         1    66   
4               47              0              0     1        10    64   

  PlayType  DriveNumber  DrivePlay  
0  KICKOFF          NaN        NaN  
1     RUSH            1          1  
2     PASS            1          1  
3     RUSH            1          1  
4     RUSH            1          1

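I want to compare the GameCode of the first row with that of the second row and, if they match, do some operations such as adding their values, and so on. But the code below gives me an error.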

print play.head()
df = pd.DataFrame()

rushingyards = 0
passingyards = 0

for row in play.itertuples():
    if df.empty:
        df = play
    else:
        if play['GameCode'] == df['GameCode']:
            if play['PlayType'] in ('RUSH','PASS'):
                if play['PlayType']=='RUSH':
                    rushingyards = rushingyards+play['Distance']
                else:
                    passingyards  = passingyards + play['Distance']

Please help.

【Comments】:

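One thing I see that could be a problem is that you aren't specifying a row. Try using df.ix or df.iloc to pick out the row in your comparison and see whether that fixes your problem. ix could be especially useful, since you only need a row number and a column name.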
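
To illustrate the comment above, here is a minimal sketch of comparing a row with the next one. It assumes a small stand-in frame named play with only three of the question's columns, and it follows the question in summing Distance. Since .ix was removed in pandas 1.0, the sketch uses .iloc for positional access, and it also shows shift(-1) as one possible vectorized alternative that neither the question nor the comment mentions.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's `play` frame (subset of columns).
play = pd.DataFrame({
    'GameCode': [299004720130829] * 5,
    'PlayType': ['KICKOFF', 'RUSH', 'PASS', 'RUSH', 'RUSH'],
    'Distance': [np.nan, 10.0, 7.0, 1.0, 10.0],
})

# Positional access: compare the GameCode of row 0 with that of row 1.
first_two_match = play['GameCode'].iloc[0] == play['GameCode'].iloc[1]

# Vectorized: shift(-1) lines each row up with the row below it.
# The last row has no successor, so its comparison is False.
same_as_next = play['GameCode'] == play['GameCode'].shift(-1)

# Loop version of the question's accumulation, reading the fields
# from each row tuple instead of from the whole DataFrame.
rushingyards = 0.0
passingyards = 0.0
for row in play.itertuples(index=False):
    if pd.isna(row.Distance):
        continue  # skip plays with no Distance, e.g. the kickoff
    if row.PlayType == 'RUSH':
        rushingyards += row.Distance
    elif row.PlayType == 'PASS':
        passingyards += row.Distance
```

The original loop fails because play['GameCode'] == df['GameCode'] compares whole columns and yields a Series, which cannot be used directly as an if condition. Reading scalar values per row, as above, avoids that.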

Tags: python parsing compare dataframe

【Solution 1】:

Perhaps you are looking for a groupby/sum operation:

yards = df.groupby(['GameCode', 'PlayType'])['Distance'].sum().unstack('PlayType')
# PlayType         KICKOFF  PASS  RUSH
# GameCode                            
# 299004720130829      NaN     7    21

For each GameCode and PlayType, this is the sum of the Distances. unstack returns a DataFrame whose index is the GameCodes and whose columns are the PlayTypes.

import numpy as np
import pandas as pd
nan = np.nan

df = pd.DataFrame(
    {'Clock': [900.0, nan, nan, nan, nan],
     'DefensePoints': [0, 0, 0, 0, 0],
     'DefenseTeamCode': [299, 47, 47, 47, 47],
     'Distance': [nan, 10.0, 7.0, 1.0, 10.0],
     'Down': [nan, 1.0, 2.0, 3.0, 1.0],
     'DriveNumber': [nan, 1.0, 1.0, 1.0, 1.0],
     'DrivePlay': [nan, 1.0, 1.0, 1.0, 1.0],
     'GameCode': [299004720130829, 299004720130829, 299004720130829,
                  299004720130829, 299004720130829],
     'OffensePoints': [0, 0, 0, 0, 0],
     'OffenseTeamCode': [47, 299, 299, 299, 299],
     'PeriodNumber': [1, 1, 1, 1, 1],
     'PlayNumber': [1, 2, 3, 4, 5],
     'PlayType': ['KICKOFF', 'RUSH', 'PASS', 'RUSH', 'RUSH'],
     'Spot': [65, 75, 72, 66, 64]})

yards = df.groupby(['GameCode', 'PlayType'])['Distance'].sum().unstack('PlayType')
passing_yards, rushing_yards = yards['PASS'], yards['RUSH']

Note that passing_yards and rushing_yards will be Series indexed by GameCode.

【Comments】:

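This is exactly what I wanted. Is there a way to combine passing and rushing yards under a single GameCode? That is, GameCode, passing yards and rushing yards all in one table, with no repeated GameCodes.
Could you help with the case above? @unutbu
Could you post the desired result? I'm not sure whether you want to add the passing yards to the rushing yards: yards['PASS'] + yards['RUSH'], or whether you want to drop the KICKOFF column: yards.drop('KICKOFF', axis=1), or something else...
Or perhaps you want to sum the passing and rushing yards for a specific GameCode, as in yards.loc[299004720130829, 'PASS'] + yards.loc[299004720130829, 'RUSH']. Note, though, that you generally shouldn't work with individual GameCodes: if you can use Pandas methods to compute what you need for all GameCodes at once, you will get better performance out of Pandas (and more elegant-looking code).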
I would like a result like the following:

Gamecode         rushing_yards  passyards
299004720130829            893        401
299004720130824            450        657
299004720130821            430        357

I want a table like this so that I can use it for analysis.
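
The thread ends without a reply to this last request. One possible way to build such a table, sketched here on made-up data, is to filter to rushing and passing plays, reuse the groupby/unstack step from the solution, and then call reset_index so that GameCode becomes an ordinary column. The Distance values and the names rushing_yards and passing_yards are assumptions for illustration, not the asker's real data.

```python
import numpy as np
import pandas as pd

# Hypothetical plays for two games; the Distance values are invented.
df = pd.DataFrame({
    'GameCode': [299004720130829] * 4 + [299004720130824] * 3,
    'PlayType': ['RUSH', 'PASS', 'RUSH', 'KICKOFF', 'PASS', 'RUSH', 'PASS'],
    'Distance': [10.0, 7.0, 1.0, np.nan, 5.0, 3.0, 8.0],
})

# Keep only rushing and passing plays, then sum Distance per game and play type.
plays = df[df['PlayType'].isin(['RUSH', 'PASS'])]
yards = plays.groupby(['GameCode', 'PlayType'])['Distance'].sum().unstack('PlayType')

# Rename the columns and turn GameCode back into an ordinary column,
# giving one row per game with no repeated GameCodes.
table = (yards.rename(columns={'RUSH': 'rushing_yards', 'PASS': 'passing_yards'})
              [['rushing_yards', 'passing_yards']]
              .reset_index())
table.columns.name = None  # drop the leftover 'PlayType' label on the columns

print(table)
```

The resulting table has one row per GameCode and can be used for further analysis, for example with table.sort_values('rushing_yards').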