【Question Title】: Lag pandas variable by a year within groups
【Posted】: 2021-04-25 17:16:40
【Question】:

I want to create a lagged yearly-return variable in pandas.

So far I have tried:

df_ret_lagged = df_ret.set_index(['year', 'cusip'])
df_ret_lagged['yearly_ret_lag'] = df_ret_lagged['year_ret'].shift(12)
df_ret_lagged.reset_index(inplace = True) 

However, this just shifts the yearly return down by 12 rows instead of lagging it by year. The DataFrame below shows what the code does.

    year    cusip        date       year_ret    yearly_ret_lag
0   1983    000165100   1983-09-01  0.183673    NaN
1   1983    000165100   1983-10-01  0.183673    NaN
2   1983    000165100   1983-11-01  0.183673    NaN
3   1983    000165100   1983-12-01  0.183673    NaN
4   1984    000165100   1984-01-01  -0.482758   NaN
5   1984    000165100   1984-02-01  -0.482758   NaN
6   1984    000165100   1984-03-01  -0.482758   NaN
7   1984    000165100   1984-04-01  -0.482758   NaN
8   1984    000165100   1984-05-01  -0.482758   NaN
9   1984    000165100   1984-06-01  -0.482758   NaN
10  1984    000165100   1984-07-01  -0.482758   NaN
11  1984    000165100   1984-08-01  -0.482758   NaN
12  1984    000165100   1984-09-01  -0.482758   0.183673
13  1984    000165100   1984-10-01  -0.482758   0.183673
14  1984    000165100   1984-11-01  -0.482758   0.183673
15  1984    000165100   1984-12-01  -0.482758   0.183673
16  1985    000165100   1985-01-01  1.700000    -0.482758
17  1985    000165100   1985-02-01  1.700000    -0.482758
18  1985    000165100   1985-03-01  1.700000    -0.482758
19  1985    000165100   1985-04-01  1.700000    -0.482758
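
To illustrate why the attempt behaves this way: `shift(n)` moves values by row position and does not look at the index labels, so setting `year` and `cusip` as the index changes nothing. A minimal sketch with made-up data (three rows per full year is an assumption chosen only to keep the example short):

```python
import pandas as pd

# Toy series: 1983 has one row (partial year), 1984 has three, 1985 has one.
index = pd.MultiIndex.from_tuples(
    [(1983, "A"), (1984, "A"), (1984, "A"), (1984, "A"), (1985, "A")],
    names=["year", "cusip"],
)
s = pd.Series([0.18, -0.48, -0.48, -0.48, 1.70], index=index)

# shift(3) moves values down three rows, regardless of the year label.
shifted = s.shift(3)
print(shifted)
```

The first two 1984 rows stay NaN even though a 1983 value exists, which is the same pattern as in the table above.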

Ideally, I want the 1983 year_ret to populate all of the 1984 dates, and so on. In addition, this must be grouped by cusip (the company identifier).

Thanks!

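【Comments】:

  • What exactly do you want to do with the cusips?
  • Hi... did any of the answers help you? If you think an answer solved the problem, please click the green check mark to mark it as "accepted". This helps keep the focus on older SO questions that still have no answers. Thanks!

Tags: python pandas dataframe data-science finance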


【Solution 1】:

Another solution without a loop, which tolerates missing months and can be combined with a groupby on cusip:

Building the df:

import numpy as np
import pandas as pd

# Month-start dates, as in the question (the "1M" alias is deprecated in recent pandas)
dates = pd.date_range("1983-09-01", "1985-12-31", freq="MS")
df = pd.DataFrame(index=dates, columns=["Year", "cusip", "year_ret"])
df['Year'] = df.index.strftime('%Y')
df['cusip'] = '01234'
df['year_ret'] = [0.183673] * 4 + [-0.482758] * 12 + [1.700000] * 12

And the code:

# First, flag the rows where the year changes
_condition_1 = df.Year != df.Year.shift(1)

# Where the year changes, take the previous row's return (last year's value)
df['lag'] = np.where(_condition_1, df['year_ret'].shift(1), np.nan)

# Forward-fill the lag through the rest of each year
df['lag'] = df['lag'].ffill()
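
The code above works on a single cusip. A possible way to extend the same year-change idea to several cusips is sketched below. The helper name `lag_by_year_change`, the sample data, and the column names `year` and `date` (taken from the question's frame) are assumptions for illustration, not part of the original answer.

```python
import pandas as pd

def lag_by_year_change(df):
    """Hypothetical helper: lag year_ret by one year within each cusip."""
    out = df.sort_values(["cusip", "date"]).copy()
    grouped = out.groupby("cusip")
    # Previous row's year and return, looked up within the same cusip only.
    prev_year = grouped["year"].shift(1)
    prev_ret = grouped["year_ret"].shift(1)
    # True on the first row of each new year (never on a cusip's first row).
    year_changed = prev_year.notna() & (out["year"] != prev_year)
    # Keep the previous return only where the year changes, NaN elsewhere.
    out["lag"] = prev_ret.where(year_changed)
    # Spread the value through the rest of the year, without crossing cusips.
    out["lag"] = out.groupby("cusip")["lag"].ffill()
    return out

# Made-up sample: cusip "A" has a missing month, cusip "B" starts in 1984.
sample = pd.DataFrame({
    "cusip": ["A"] * 5 + ["B"] * 3,
    "year": [1983, 1983, 1984, 1984, 1985, 1984, 1985, 1985],
    "date": pd.to_datetime([
        "1983-11-01", "1983-12-01", "1984-01-01", "1984-03-01", "1985-01-01",
        "1984-12-01", "1985-01-01", "1985-02-01",
    ]),
    "year_ret": [0.1, 0.1, -0.4, -0.4, 1.7, 0.5, 0.9, 0.9],
})
result = lag_by_year_change(sample)
print(result)
```

One caveat: if a cusip skips an entire year, this approach carries over the return of the last year present rather than leaving the lag empty.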

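【Discussion】:

【Solution 2】:

I think this may be what you are asking for. Note that it relies on the DataFrame being sorted and structured properly beforehand (e.g., an entry for every month).

This sorts everything by cusip and date before shifting, then erases the values whose cusip does not match by overwriting them with NaN. You can then use .bfill() (equivalent to the older .fillna(method='bfill')) to fill the remaining gaps from the next valid value.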

import numpy as np

df_new = df_ret.sort_values(['cusip', 'date'])
df_new['yearly_ret_lag'] = df_new['year_ret'].shift(12)
# Blank out lags that were shifted in from a different cusip
df_new.loc[df_new['cusip'] != df_new['cusip'].shift(12), 'yearly_ret_lag'] = np.nan
# .bfill() replaces the deprecated .fillna(method='bfill')
df_new['yearly_ret_lag'] = df_new['yearly_ret_lag'].bfill()
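
As a possible variation on the same idea, shifting inside a groupby keeps values from crossing cusip boundaries, so the mismatch mask is not needed. This is only a sketch on made-up data (the values and the cusips "A" and "B" are assumptions). It still requires one row per month, and it lags by 12 rows rather than by calendar year.

```python
import pandas as pd

# Made-up monthly data: 14 consecutive months for each of two cusips.
dates = pd.date_range("1983-09-01", periods=14, freq="MS")
df_ret = pd.DataFrame({
    "cusip": ["A"] * 14 + ["B"] * 14,
    "date": list(dates) * 2,
    "year_ret": [float(i) for i in range(28)],
})

df_new = df_ret.sort_values(["cusip", "date"]).copy()
# Shift by 12 rows within each cusip; the first 12 rows of every cusip stay NaN.
df_new["yearly_ret_lag"] = df_new.groupby("cusip")["year_ret"].shift(12)
print(df_new)
```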
    

【Discussion】:

【Solution 3】:

I used a for loop:

# Loop over every year except the first (assumes integer years and a single cusip)
for year in df['year'].unique()[1:]:
    df.loc[df['year'] == year, 'year_ret_lag'] = df.loc[df['year'] == year - 1, 'year_ret'].iloc[0]
      

df

    year    cusip   date        year_ret    year_ret_lag
0   1983    165100  01/09/1983  0.183673    NaN
1   1983    165100  01/10/1983  0.183673    NaN
2   1983    165100  01/11/1983  0.183673    NaN
3   1983    165100  01/12/1983  0.183673    NaN
4   1984    165100  01/01/1984  -0.482758   0.183673
5   1984    165100  01/02/1984  -0.482758   0.183673
6   1984    165100  01/03/1984  -0.482758   0.183673
7   1984    165100  01/04/1984  -0.482758   0.183673
8   1984    165100  01/05/1984  -0.482758   0.183673
9   1984    165100  01/06/1984  -0.482758   0.183673
10  1984    165100  01/07/1984  -0.482758   0.183673
11  1984    165100  01/08/1984  -0.482758   0.183673
12  1984    165100  01/09/1984  -0.482758   0.183673
13  1984    165100  01/10/1984  -0.482758   0.183673
14  1984    165100  01/11/1984  -0.482758   0.183673
15  1984    165100  01/12/1984  -0.482758   0.183673
16  1985    165100  01/01/1985  1.700000    -0.482758
17  1985    165100  01/02/1985  1.700000    -0.482758
18  1985    165100  01/03/1985  1.700000    -0.482758
19  1985    165100  01/04/1985  1.700000    -0.482758
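
The loop handles one cusip at a time. A possible vectorized equivalent for many cusips is to build a table with one return per (cusip, year), relabel each year as the following year, and merge it back. The helper name `add_prev_year_ret` and the sample data are assumptions for illustration. The sketch also assumes `year` is an integer column and that `year_ret` is constant within each (cusip, year), as in the question.

```python
import pandas as pd

def add_prev_year_ret(df):
    """Hypothetical helper: attach the previous calendar year's return per cusip."""
    # One row per (cusip, year); year_ret is assumed constant within the year.
    yearly = df.groupby(["cusip", "year"], as_index=False)["year_ret"].first()
    # Relabel each return with the year it should be attached to.
    yearly["year"] = yearly["year"] + 1
    yearly = yearly.rename(columns={"year_ret": "year_ret_lag"})
    # A left merge keeps every original row; years with no predecessor get NaN.
    return df.merge(yearly, on=["cusip", "year"], how="left")

# Made-up sample: cusip "B" has no 1986 rows, so its 1987 lag stays NaN.
sample = pd.DataFrame({
    "cusip": ["A", "A", "A", "A", "B", "B", "B"],
    "year": [1983, 1983, 1984, 1985, 1984, 1985, 1987],
    "year_ret": [0.1, 0.1, -0.4, 1.7, 0.5, 0.9, 0.3],
})
lagged = add_prev_year_ret(sample)
print(lagged)
```

Because the match is on the year label rather than on row position, missing months or a partial first year do not shift the result.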
      

【Discussion】:
