Pandas: select from column with index corresponding to values in another column

【Question title】: Pandas: select from column with index corresponding to values in another column
【Posted】: 2020-07-13 12:08:35
【Question】:

Apologies for the poor title...

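Suppose I have two pandas DataFrames describing field sampling locations. DF1 contains the sample ID, coordinates, year of recording, and so on. DF2 contains a meteorological variable, with the value for each year stored in its own column: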

import numpy as np
import pandas as pd
df1 = pd.DataFrame(data={'ID': [10, 20, 30], 'YEAR': [1980, 1981, 1991]}, index=[1, 2, 3])
# range(1980, 1991) yields 11 column names (year_1980 ... year_1990), so the data needs 11 columns
df2 = pd.DataFrame(data=np.random.randint(0, 100, size=(3, 11)), columns=['year_{0}'.format(x) for x in range(1980, 1991)], index=[10, 20, 30])

print(df1)
>   ID YEAR
  1 10 1980
  2 20 1981
  3 30 1991

print(df2)
>    year_1980 year_1981 ... year_1990
  10 48 61 ... 53
  20 68 69 ... 21
  30 76 37 ... 70

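Note how the plot ID in DF1 corresponds to DF2.index, and how a sampling year in DF1 can fall outside the range covered by DF2. I want to add the value from DF2 that matches the YEAR column of DF1 as a new column in DF1. What I have so far is: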

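def grab(df, plot_id, yr):
    # Look up one cell; fall back to -99 when the ID or the year column is missing
    try:
        out = df.loc[plot_id, 'year_{}'.format(yr)]
    except KeyError:
        out = -99
    return out

# Pass the ID and YEAR values of each row (row.index would be the column labels, not the plot ID)
df1['meteo_val'] = df1.apply(lambda row: grab(df2, row['ID'], row['YEAR']), axis=1)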
print(df1)
>   ID YEAR meteo_val
  1 10 1980 48
  2 20 1981 69 
  3 30 1991 -99

This works, but it seems to take a long time to compute. I would like to know a smarter, faster way to do this. Any suggestions?

【Comments】:

Tags: python pandas dataframe

【Solution 1】:

Setup

np.random.seed(0)
df1 = pd.DataFrame(data = {'ID': [10, 20, 30], 'YEAR': [1980, 1981, 1991]}, index=[1,2,3])
df2 = pd.DataFrame(data= np.random.randint(0,100,size=(3, 11)),
                   columns=['year_{0}'.format(x) for x in range(1980, 1991)],
                   index=[10, 20, 30])

Solution with DataFrame.lookup (note that DataFrame.lookup was deprecated in pandas 1.2.0 and removed in 2.0.0):

mapper = df1.assign(YEAR = ('year_' + df1['YEAR'].astype(str)))
c2 = mapper['ID'].isin(df2.index)
c1 = mapper['YEAR'].isin(df2.columns)
mapper = mapper.loc[c1 & c2]
df1.loc[c2&c1, 'meteo_val'] = df2.lookup(mapper['ID'], mapper['YEAR'])
df1['meteo_val'] = df1['meteo_val'].fillna(-99)



   ID  YEAR  meteo_val
1  10  1980       44.0
2  20  1981       88.0
3  30  1991      -99.0
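
On pandas versions where DataFrame.lookup is not available, the same label-based lookup can be approximated with Index.get_indexer and NumPy integer indexing. The following is only a sketch under that assumption, not part of the original answer; the helper names rows, cols, found and vals are introduced here for illustration.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df1 = pd.DataFrame({'ID': [10, 20, 30], 'YEAR': [1980, 1981, 1991]}, index=[1, 2, 3])
df2 = pd.DataFrame(np.random.randint(0, 100, size=(3, 11)),
                   columns=['year_{0}'.format(x) for x in range(1980, 1991)],
                   index=[10, 20, 30])

# Positions of each ID and year label inside df2;
# get_indexer returns -1 for labels that do not exist.
rows = df2.index.get_indexer(df1['ID'])
cols = df2.columns.get_indexer('year_' + df1['YEAR'].astype(str))
found = (rows >= 0) & (cols >= 0)

# Start from the -99 sentinel and overwrite only the rows that were found.
values = df2.to_numpy()
vals = np.full(len(df1), -99, dtype=values.dtype)
vals[found] = values[rows[found], cols[found]]
df1['meteo_val'] = vals
print(df1)
```

Because the sentinel is written into an integer array up front, meteo_val stays an integer column here instead of being upcast to float by fillna.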

Alternative with DataFrame.join and DataFrame.stack:

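# Strip the 'year_' prefix so df2's columns become integer years, stack df2 into a
# Series indexed by (ID, year), then left-join it on df1's ID and YEAR columns.
meteo = df2.set_axis(df2.columns.str.split('_').str[1].astype(int), axis=1).stack().rename('meteo_val')
df1 = df1.join(meteo, on=['ID', 'YEAR'], how='left').fillna(-99)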
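
The reshape-then-join idea can also be written with DataFrame.melt and DataFrame.merge, which some readers find easier to follow than stack. This is a sketch of that variant rather than the answerer's code; the names long_df and out are hypothetical.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df1 = pd.DataFrame({'ID': [10, 20, 30], 'YEAR': [1980, 1981, 1991]}, index=[1, 2, 3])
df2 = pd.DataFrame(np.random.randint(0, 100, size=(3, 11)),
                   columns=['year_{0}'.format(x) for x in range(1980, 1991)],
                   index=[10, 20, 30])

# Reshape df2 to long format: one row per (ID, YEAR) pair.
long_df = (df2.rename_axis('ID')
              .reset_index()
              .melt(id_vars='ID', var_name='YEAR', value_name='meteo_val'))
long_df['YEAR'] = long_df['YEAR'].str.replace('year_', '', regex=False).astype(int)

# A left merge keeps every row of df1; pairs missing from df2 become NaN.
# merge discards df1's index, so restore it afterwards.
out = df1.merge(long_df, on=['ID', 'YEAR'], how='left')
out.index = df1.index
out['meteo_val'] = out['meteo_val'].fillna(-99)
print(out)
```

As with the join/stack version, the NaN introduced for the unmatched 1991 row turns meteo_val into a float column before the fill.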

【Comments】: