KeyError: ('1', 'occurred at index 0')

【Question title】: KeyError: ('1', 'occurred at index 0')
【Posted】: 2019-10-22 15:22:59
【Question description】:

I'm really new to Python, and I'm working with the following DataFrames:

    data1 = {'Store_ID':['1','1','1','1','2','2','2','3','3'],
             'YearMonth':[201801,201802,201805,201904,201812,201902,201906,201904,201907],
             'AVG_Rating':[5.0,4.5,4.0,3.5,3.0,4.5,4.0,2.5,4.0]}

    df1 = pd.DataFrame(data1)

                    AVG_Rating
Store_ID    AnoMes
1           201801         5.0
            201802         4.5
            201805         4.0
            201904         3.5
2           201812         3.0
            201902         4.5
            201906         4.0
3           201904         2.5
            201907         4.0

    data2 = {'Client_ID':['1212','1234','1122','1230'],
             'Store_ID':['1','1','2','3'],
             'YearMonth':[201804,201906,201904,201906]}

    df2 = pd.DataFrame(data2)

            Client_ID   YearMonth
Store_ID
1           1212        201804
1           1234        201906
2           1122        201904
3           1230        201906

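I set the Store_ID column as the index of both DataFrames.

I need to merge the two on the YearMonth column, taking from df1 the most recent AVG_Rating as of the year-month in which the client bought at the store. My final DataFrame should be: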

            Client_ID   YearMonth   AVG_Rating
Store_ID
1           1212        201804      4.5 (the 201802 rating)

To do this, I'm trying to use the function below together with apply, but I get an error:

    def get_previous_loja_rating(row):
        loja = df1[row['Loja_ID']]
        lst = loja[loja['AnoMes']] < df2[row['AnoMes']]
        return lst[-1]

    df2['PREVIOUS_RATING_MEAN'] = df1['AnoMes'].apply(get_previous_loja_rating,axis=1)

KeyError: ('Loja_ID', 'occurred at index 1')

Can anyone help me solve this?

【Question comments】:

Tags: python dataframe apply axis keyerror

【Solution 1】:

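I'll use YearMonth as the column name rather than AnoMes. Your function fails for several reasons: df1[row['Loja_ID']] looks up a column by label instead of filtering rows, the sample data has no Loja_ID or AnoMes columns, and the function is applied to a single column of df1 rather than to the rows of df2. As I understand it, you want to add to the client DataFrame the AVG_Rating from the corresponding store's most recent year-month on or before the purchase.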
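To see where the KeyError in the title comes from, here is a small reproduction. It is only a sketch: the three-row frame and the hand-built row are cut-down stand-ins for the question's data, and the variable names raised and store_rows are mine.

```python
import pandas as pd

# Cut-down stand-in for the question's df1 (assumption: Store_ID is a regular column)
df1 = pd.DataFrame({'Store_ID': ['1', '1', '2'],
                    'YearMonth': [201801, 201802, 201812],
                    'AVG_Rating': [5.0, 4.5, 3.0]})

# Stand-in for one row of df2 as apply(..., axis=1) would pass it
row = pd.Series({'Client_ID': '1212', 'Store_ID': '1', 'YearMonth': 201804})

# df1[label] selects a COLUMN; df1 has no column named '1', hence the KeyError
try:
    df1[row['Store_ID']]
    raised = False
except KeyError:
    raised = True

# A boolean mask selects the ROWS that belong to that store instead
store_rows = df1[df1['Store_ID'] == row['Store_ID']]
```

Here raised ends up True, and store_rows holds the two rows for store '1'. When the failing lookup happens inside DataFrame.apply, older pandas versions append the 'occurred at index N' part to the message.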

df1
  Store_ID  YearMonth  AVG_Rating
0        1     201801         5.0
1        1     201802         4.5
2        1     201805         4.0
3        1     201904         3.5
4        2     201812         3.0
5        2     201902         4.5
6        2     201906         4.0
7        3     201904         2.5
8        3     201907         4.0

df2
  Client_ID Store_ID  YearMonth
0      1212        1     201804
1      1234        1     201906
2      1122        2     201904
3      1230        3     201906


    def get_previous_loja_rating(row):
        # rows of df1 that belong to the client's store
        loja = df1[df1['Store_ID'] == row['Store_ID']]
        # that store's YearMonth values on or before the client's YearMonth
        lst = [i for i in loja['YearMonth'] if i <= row['YearMonth']]
        # AVG_Rating of the most recent of those months
        return loja[loja['YearMonth'] == max(lst)]['AVG_Rating'].iloc[0]

    df2['AVG_Rating'] = df2.apply(get_previous_loja_rating, axis=1)

df2
  Client_ID Store_ID  YearMonth  AVG_Rating
0      1212        1     201804         4.5
1      1234        1     201906         3.5
2      1122        2     201904         4.5
3      1230        3     201906         2.5

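This gives each row of your client DataFrame the store's average rating from the most recent year-month on or before the purchase.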
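As an alternative to a row-wise apply, the same "latest earlier value per store" lookup can be sketched with pandas.merge_asof. This is not part of the answer above, just one possible vectorized variant. It assumes Store_ID is a regular column (not the index) with the same dtype in both frames, and the name result is mine.

```python
import pandas as pd

df1 = pd.DataFrame({
    'Store_ID': ['1', '1', '1', '1', '2', '2', '2', '3', '3'],
    'YearMonth': [201801, 201802, 201805, 201904, 201812,
                  201902, 201906, 201904, 201907],
    'AVG_Rating': [5.0, 4.5, 4.0, 3.5, 3.0, 4.5, 4.0, 2.5, 4.0],
})
df2 = pd.DataFrame({
    'Client_ID': ['1212', '1234', '1122', '1230'],
    'Store_ID': ['1', '1', '2', '3'],
    'YearMonth': [201804, 201906, 201904, 201906],
})

# merge_asof requires both frames to be sorted by the "on" key
result = pd.merge_asof(
    df2.sort_values('YearMonth'),
    df1.sort_values('YearMonth'),
    on='YearMonth',        # match on the year-month ...
    by='Store_ID',         # ... but only within the same store
    direction='backward',  # take the latest df1 row at or before the purchase
)
```

For the sample data this yields the same four ratings as the apply version (4.5, 3.5, 4.5 and 2.5 for clients 1212, 1234, 1122 and 1230), with rows ordered by YearMonth rather than in df2's original order. A client with no earlier rating for its store gets NaN instead of raising an error.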

【Comments】:

@FernandoRodriguesNepomuceno If this is the desired result, please mark it as the answer.

【Solution 2】:

It looks like your code uses Portuguese key names (Loja_ID, AnoMes, etc.) while your data uses English ones. You need to change these to Store_ID and YearMonth.
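If the real frames do carry the Loja_ID and AnoMes names, another option is to rename the columns once instead of editing every lookup. A minimal sketch, assuming a frame shaped like the one in the traceback; the frame lojas and its values are made up for illustration:

```python
import pandas as pd

# Hypothetical frame that uses the column names seen in the error message
lojas = pd.DataFrame({'Loja_ID': ['1', '2'],
                      'AnoMes': [201801, 201812],
                      'AVG_Rating': [5.0, 3.0]})

# rename returns a new frame; the original keeps its column names
renamed = lojas.rename(columns={'Loja_ID': 'Store_ID', 'AnoMes': 'YearMonth'})
```

After this, renamed has the columns Store_ID, YearMonth and AVG_Rating, so code written against the English names works unchanged.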

【Comments】: