Iterating through rows in a df and creating a new column based on those values

【Question title】: Iterating through rows in a df and creating a new column based on those values
【Posted】: 2021-09-20 00:35:47
【Question description】:

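I want to create a new relative score (a column) that compares each F1 driver with their teammate within a given year and a given team.

My data looks like this: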

stats_df.head()

  driver  year        team  points
0    AIT  2020    Williams     0.0
1    ALB  2019    Red Bull    76.0
2    ALB  2019  AlphaTauri    16.0
3    ALB  2020    Red Bull   105.0
4    ALO  2013     Ferrari   242.0

I tried:

teams = stats_df['team'].unique()
years = stats_df['year'].unique()
drivers = stats_df['driver'].unique()

for year in years:
    for team in teams:
        team_points = stats_df['points'].loc[stats_df['team']==team].loc[stats_df['year']==year].sum()
        for driver in drivers:
            driver_points = stats_df['points'].loc[stats_df['team']==team].loc[stats_df['year']==year].loc[stats_df['driver']==driver]
            power_score = driver_points/(team_points/2)
            stats_df['power_score'].loc[stats_df['team']==team].loc[stats_df['year']==year].loc[stats_df['driver']==driver] = power_score

This results in NaN values in the new column ('power_score').

Any help would be appreciated.

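【Question discussion】:

You should provide a better sample dataframe. With this one, there is only one driver per year/team, which makes every power score either 2 or NaN.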
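
A quick sketch of why the posted rows can only give 2 or NaN. It uses the points from the five rows shown in the question and assumes each row is the only driver for its team and year, so the team total equals the driver's own points. The variable names here are illustrative only.

```python
import pandas as pd

# Points from the five rows shown in the question.
points = pd.Series([0.0, 76.0, 16.0, 105.0, 242.0])

# Assumption: no teammate rows exist in the sample, so each
# (team, year) total is just that driver's own points.
team_points = points

# Same formula as in the question: driver points over half the team total.
power_score = points / (team_points / 2)
print(power_score.tolist())  # [nan, 2.0, 2.0, 2.0, 2.0]
```

A driver with 0 points yields 0/0, which pandas reports as NaN. Any nonzero value x yields x / (x / 2) = 2.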

Tags: python pandas numpy data-science

【Solution 1】:

Looking at your code, you can compute team_points with .groupby(["team", "year"]) and then simply divide points by those values:

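# df here stands for the question's stats_df
# Total points per (team, year), repeated on every row of that group
team_points = df.groupby(["team", "year"])["points"].transform("sum")
# Each driver's points relative to half of the team total
df["power_score"] = df["points"] / (team_points / 2)
print(df)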

Prints:

  driver  year        team  points  power_score
0    AIT  2020    Williams     0.0          NaN
1    ALB  2019    Red Bull    76.0          2.0
2    ALB  2019  AlphaTauri    16.0          2.0
3    ALB  2020    Red Bull   105.0          2.0
4    ALO  2013     Ferrari   242.0          2.0
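
To see the same approach produce something other than 2 or NaN, here is a minimal self-contained sketch on a made-up dataframe with two drivers per team and year. The driver codes, team names and point values are hypothetical and chosen only for illustration. They are not real results.

```python
import pandas as pd

# Hypothetical data: two drivers per (team, year); all values are made up.
demo_df = pd.DataFrame({
    "driver": ["AAA", "BBB", "CCC", "DDD"],
    "year": [2020, 2020, 2020, 2020],
    "team": ["Team X", "Team X", "Team Y", "Team Y"],
    "points": [30.0, 10.0, 0.0, 5.0],
})

# transform("sum") returns one total per row, aligned with demo_df's index,
# so it can be used directly in row-wise arithmetic.
team_points = demo_df.groupby(["team", "year"])["points"].transform("sum")

# Driver points relative to half of the team total.
demo_df["power_score"] = demo_df["points"] / (team_points / 2)
print(demo_df)
```

With two drivers per team, the two scores in each group add up to 2, so a value of 1.0 means an even split with the teammate. In this sketch Team X gets 1.5 and 0.5, and Team Y gets 0.0 and 2.0.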

【Discussion】: