【Question title】: KeyError: ('1', 'occurred at index 0')
【Posted】: 2019-10-22 15:22:59
【Question】:

I'm really new to Python, and I'm working with the following dataframes:

    import pandas as pd

    data1 = {'Store_ID':['1','1','1','1','2','2','2','3','3'],
             'YearMonth':[201801,201802,201805,201904,201812,201902,201906,201904,201907],
             'AVG_Rating':[5.0,4.5,4.0,3.5,3.0,4.5,4.0,2.5,4.0]}

    df1 = pd.DataFrame(data1)

                        AVG_Rating
    Store_ID  AnoMes
    1         201801    5.0
              201802    4.5
              201805    4.0
              201904    3.5
    2         201812    3.0
              201902    4.5
              201906    4.0
    3         201904    2.5
              201907    4.0

    data2 = {'Client_ID':['1212','1234','1122','1230'],
             'Store_ID':['1','1','2','3'],
             'YearMonth':[201804,201906,201904,201906]}

    df2 = pd.DataFrame(data2)

              Client_ID  YearMonth
    Store_ID
    1         1212       201804
    1         1234       201906
    2         1122       201904
    3         1230       201906

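I set the Store_ID column as the index of both dataframes.

I need to merge the two on the YearMonth column, taking from df1 the most recent AVG_Rating as of the year-month in which the client bought at the store. My final dataframe must look like this:

              Client_ID  YearMonth  AVG_Rating
    Store_ID
    1         1212       201804     4.5 (the 201802 rating)

To do this, I am trying to use the function below together with apply, but I get an error: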

    def get_previous_loja_rating(row):
        loja = df1[row['Loja_ID']]
        lst = loja[loja['AnoMes']] < df2[row['AnoMes']]
        return lst[-1]

    df2['PREVIOUS_RATING_MEAN'] = df1['AnoMes'].apply(get_previous_loja_rating,axis=1)

KeyError: ('Loja_ID', 'occurred at index 1')

Can anyone help me solve this?

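【Comments】:

    Tags: python dataframe apply axis keyerror


    【Solution 1】:

    I will use YearMonth as the column name instead of AnoMes. Your function fails for several reasons. As I understand it, you want to add to each client row the AVG_Rating from the most recent YearMonth of the corresponding store.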

    df1
      Store_ID  YearMonth  AVG_Rating
    0        1     201801         5.0
    1        1     201802         4.5
    2        1     201805         4.0
    3        1     201904         3.5
    4        2     201812         3.0
    df2
      Client_ID Store_ID  YearMonth
    0      1212        1     201804
    1      1234        1     201906
    2      1122        2     201904
    3      1230        3     201906
    
    
    def get_previous_loja_rating(row):
        loja = df1[df1['Store_ID']==row['Store_ID']]
        lst = [i for i in loja['YearMonth'] if i <= row['YearMonth']] #list of all yearmonth values less than or equal to client's yearmonth
        return df1[(df1['YearMonth']==max(lst))&(df1['Store_ID']==row['Store_ID'])]['AVG_Rating'].iloc[0] # avg rating of the most recent yearmonth
    
    df2['AVG_Rating'] = df2.apply(get_previous_loja_rating,axis=1)
    
    df2
      Client_ID Store_ID  YearMonth  AVG_Rating
    0      1212        1     201804         4.5
    1      1234        1     201906         3.5
    2      1122        2     201904         4.5
    3      1230        3     201906         2.5
    

    This adds to your client dataframe the average rating from the most recent YearMonth at or before each purchase.
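
As an alternative to the row-wise apply above, the same "most recent rating at or before the purchase month" lookup can probably be expressed with `pd.merge_asof`. This is only a sketch, not the answerer's method: it assumes Store_ID is kept as a regular column (not the index) in both dataframes, and that a purchase with no earlier rating should end up as NaN.

```python
import pandas as pd

df1 = pd.DataFrame({
    'Store_ID': ['1', '1', '1', '1', '2', '2', '2', '3', '3'],
    'YearMonth': [201801, 201802, 201805, 201904, 201812, 201902, 201906, 201904, 201907],
    'AVG_Rating': [5.0, 4.5, 4.0, 3.5, 3.0, 4.5, 4.0, 2.5, 4.0],
})
df2 = pd.DataFrame({
    'Client_ID': ['1212', '1234', '1122', '1230'],
    'Store_ID': ['1', '1', '2', '3'],
    'YearMonth': [201804, 201906, 201904, 201906],
})

# merge_asof requires both frames to be sorted by the "on" key.
# direction='backward' picks the last df1 row whose YearMonth is <= the
# client's YearMonth, and by='Store_ID' restricts the search to the same store.
result = pd.merge_asof(
    df2.sort_values('YearMonth'),
    df1.sort_values('YearMonth'),
    on='YearMonth',
    by='Store_ID',
    direction='backward',
)

print(result)
```

The rows of `result` come back ordered by YearMonth rather than in the original df2 order, so compare them by Client_ID.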

    【Comments】:

    • @FernandoRodriguesNepomuceno If this is the desired result, please mark it as the answer.
    【Solution 2】:

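    Your code seems to use Portuguese key names (Loja_ID, AnoMes, etc.), while your data uses English ones. You need to change them to the matching English column names, Store_ID and YearMonth.

    【Comments】: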
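
The naming mismatch can be reproduced in isolation. The snippet below is a minimal sketch built from the question's df2 data: it looks up a key that is not among the columns, catches the resulting KeyError, and prints the column names that actually exist. The variable names `row` and `message` are only illustrative.

```python
import pandas as pd

df2 = pd.DataFrame({
    'Client_ID': ['1212', '1234', '1122', '1230'],
    'Store_ID': ['1', '1', '2', '3'],
    'YearMonth': [201804, 201906, 201904, 201906],
})

# A row taken from df2 is a Series indexed by the column names.
row = df2.iloc[0]

message = None
try:
    row['Loja_ID']  # no such label: the column is called Store_ID
except KeyError as exc:
    message = str(exc)

print(message)
print(list(df2.columns))  # the keys that are valid inside the applied function
print(row['Store_ID'])    # the lookup that works
```

Checking `df.columns` before writing the function passed to `apply` is usually the quickest way to catch this kind of error.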
