
【Question title】: How do I add a column to my dataframe containing means between years from another dataframe?
【Posted】: 2021-03-27 01:37:22
【Question】:

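I have two dataframes in Python: one contains information about cars, the other contains fuel prices (gasoline and diesel). Examples of both dataframes are shown below.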

cars

   regNo  regYear inspectionYear fuelType
0  AB1234 2008    2012           Gasoline
1  CD2345 2009    2011           Diesel
2  LD9876 2010    2013           Diesel

fuel_prices

year fuelType price
2008 Gasoline 12.13
2009 Gasoline 19.52
2010 Gasoline 13.32
2011 Gasoline 13.54
2012 Gasoline 16.23
2013 Gasoline 11.34
2008 Diesel   9.43
2009 Diesel   9.37
2010 Diesel   9.89
2011 Diesel   10.04
2012 Diesel   8.42
2013 Diesel   9.21

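I am trying to add a column to cars that holds the average price of the car's fuelType between regYear and inspectionYear (both years inclusive). So I expect a result like this: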

cars_newCol

   regNo  regYear inspectionYear fuelType fuelPrice
0  AB1234 2008    2012           Gasoline 14.95
1  CD2345 2009    2011           Diesel   9.77
2  LD9876 2010    2013           Diesel   9.39

That is, the fuelPrice in the first row is the average Gasoline price from 2008 to 2012: (12.13 + 19.52 + 13.32 + 13.54 + 16.23) / 5 ≈ 14.95.
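For a single car, the definition can be checked with a boolean mask over fuel_prices using inclusive year bounds. A minimal sketch that rebuilds the question's sample fuel_prices data and computes the first car's value:

```python
import pandas as pd

# Sample data copied from the question
fuel_prices = pd.DataFrame({
    'year': [2008, 2009, 2010, 2011, 2012, 2013] * 2,
    'fuelType': ['Gasoline'] * 6 + ['Diesel'] * 6,
    'price': [12.13, 19.52, 13.32, 13.54, 16.23, 11.34,
              9.43, 9.37, 9.89, 10.04, 8.42, 9.21],
})

# First car: Gasoline, regYear 2008, inspectionYear 2012
mask = (
    (fuel_prices['fuelType'] == 'Gasoline')
    & fuel_prices['year'].between(2008, 2012)  # between() includes both bounds by default
)
first_car_price = fuel_prices.loc[mask, 'price'].mean()
print(round(first_car_price, 3))  # 14.948
```

Doing this once per car in a Python loop is exactly what the question wants to avoid for millions of rows, but it pins down what the vectorized versions must reproduce.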

I have tried various solutions, but the one I think comes closest is:

cars['fuelPrice'] = fuel_prices.loc[(fuel_prices['year']>=cars['regYear']) & 
                                    (fuel_prices['year']<=cars['inspectionYear']) &
                                    (fuel_prices['fuelType']==cars['fuelType']),
                                    'price'].mean()

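However, the output is not as expected. The dataframe is very large (about 7 million rows), so I would rather not do this in a for loop, unless someone thinks that could be efficient.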
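One reason the attempt above cannot work: it compares columns from two different dataframes element-wise. pandas aligns such comparisons on index labels rather than matching each car to its years, and cars and fuel_prices have different indexes, so the comparison raises a ValueError. Even if it ran, .mean() would return a single scalar for the whole column instead of one value per car. A small sketch reproducing the error with trimmed-down frames (the shortened data is illustrative only):

```python
import pandas as pd

# Trimmed versions of the question's frames: different lengths, different indexes
cars = pd.DataFrame({'regYear': [2008, 2009, 2010]})
fuel_prices = pd.DataFrame({'year': [2008, 2009, 2010, 2011]})

try:
    fuel_prices['year'] >= cars['regYear']
    error = None
except ValueError as exc:
    # pandas refuses to compare Series whose index labels differ
    error = str(exc)

print(error)
```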

Thanks in advance, much appreciated.

【Question comments】:

Tags: python pandas dataframe

【Solution 1】:

You want to merge on fuelType, then filter the rows to each car's year range and group by the car columns:

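(cars.merge(fuel_prices, on='fuelType')                # one row per car per year of its fuel type
     .query('regYear <= year <= inspectionYear')        # keep only years within each car's range
     .groupby(cars.columns.to_list(), as_index=False)['price'].mean()  # average price per car
)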

Output:

    regNo  regYear  inspectionYear  fuelType      price
0  AB1234     2008            2012  Gasoline  14.948000
1  CD2345     2009            2011    Diesel   9.766667
2  LD9876     2010            2013    Diesel   9.390000
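With about 7 million cars, keep in mind that the merge first creates one row per car per year of its fuel type before the query discards the out-of-range years, which can take a lot of memory. A different, loop-free technique (not the answer's method) is prefix sums: take the cumulative sum of price per fuel type, then each car's mean is the cumulative sum at inspectionYear minus the cumulative sum at regYear - 1, divided by the number of years. This sketch assumes fuel_prices has exactly one row per (fuelType, year), no gaps in the years, and no car with regYear earlier than the first price year:

```python
import pandas as pd

cars = pd.DataFrame({
    'regNo': ['AB1234', 'CD2345', 'LD9876'],
    'regYear': [2008, 2009, 2010],
    'inspectionYear': [2012, 2011, 2013],
    'fuelType': ['Gasoline', 'Diesel', 'Diesel'],
})
fuel_prices = pd.DataFrame({
    'year': [2008, 2009, 2010, 2011, 2012, 2013] * 2,
    'fuelType': ['Gasoline'] * 6 + ['Diesel'] * 6,
    'price': [12.13, 19.52, 13.32, 13.54, 16.23, 11.34,
              9.43, 9.37, 9.89, 10.04, 8.42, 9.21],
})

# Running total of price per fuel type, indexed by (fuelType, year)
fp = fuel_prices.sort_values(['fuelType', 'year'])
cum = (fp.assign(cum=fp.groupby('fuelType')['price'].cumsum())
         .set_index(['fuelType', 'year'])['cum'])

# Cumulative sum up to and including inspectionYear
upper = cum.reindex(pd.MultiIndex.from_arrays([cars['fuelType'], cars['inspectionYear']]))
# Cumulative sum up to the year before regYear; missing means regYear is the first year, so 0
lower = cum.reindex(pd.MultiIndex.from_arrays([cars['fuelType'], cars['regYear'] - 1])).fillna(0)

n_years = cars['inspectionYear'] - cars['regYear'] + 1
cars['fuelPrice'] = (upper.to_numpy() - lower.to_numpy()) / n_years.to_numpy()
print(cars)
```

Each car then needs two index lookups no matter how many years it spans. If some years can be missing from fuel_prices, the merge-and-filter approach above is the safer choice.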

【Comments】: