【问题标题】:how to shift a time series data by a month in python如何在python中将时间序列数据移动一个月
【发布时间】:2020-12-26 14:56:59
【问题描述】:

我尝试将 DataFrame.shift() 函数与 freq = 'M' 一起使用,但是当我偏移 1 个月时,日期会偏移到月底,而不是下个月的同一日期。

有什么办法可以抵消正好 1 个月。即,如果我有一个时间序列数据框并且第一个索引值为23rd August,则在移动一个月后,我希望23rd Sept 处的索引值位于23rd August 处的索引值之前。

请提出一种方法来做到这一点。这会节省很多时间,否则我将不得不使用循环。

我想在此数据框中创建一个新列,以便与索引 20-10-01 10:00:00 和股票代码 AAPL 对应的新列中的值应该是当时列“c”的值20-11-01 10:00:00 和股票代码 AAPL。以此类推其他行。示例数据:

Timestamp('2019-10-01 10:00:00+0000', tz='UTC'): 56.5675,
Timestamp('2019-10-01 16:00:00+0000', tz='UTC'): 56.2725,
Timestamp('2019-10-01 22:00:00+0000', tz='UTC'): 56.2925,
Timestamp('2019-10-02 04:00:00+0000', tz='UTC'): 55.6525,
Timestamp('2019-10-02 10:00:00+0000', tz='UTC'): 54.8025,
Timestamp('2019-10-02 16:00:00+0000', tz='UTC'): 54.625,
Timestamp('2019-10-02 22:00:00+0000', tz='UTC'): 54.625,
Timestamp('2019-10-03 04:00:00+0000', tz='UTC'): 54.825,
Timestamp('2019-10-03 10:00:00+0000', tz='UTC'): 54.7075,
Timestamp('2019-10-03 16:00:00+0000', tz='UTC'): 55.1575,
Timestamp('2019-10-03 22:00:00+0000', tz='UTC'): 55.125,
Timestamp('2019-10-04 04:00:00+0000', tz='UTC'): 55.88,
Timestamp('2019-10-04 10:00:00+0000', tz='UTC'): 56.51,
Timestamp('2019-10-04 16:00:00+0000', tz='UTC'): 56.77,
Timestamp('2019-10-04 22:00:00+0000', tz='UTC'): 56.7375,
Timestamp('2019-10-07 04:00:00+0000', tz='UTC'): 56.5,
Timestamp('2019-10-07 10:00:00+0000', tz='UTC'): 57.3525,
Timestamp('2019-10-07 16:00:00+0000', tz='UTC'): 56.7875,
Timestamp('2019-10-07 22:00:00+0000', tz='UTC'): 56.86,
Timestamp('2019-10-08 04:00:00+0000', tz='UTC'): 56.75,
Timestamp('2019-10-08 10:00:00+0000', tz='UTC'): 56.525,
Timestamp('2019-10-08 16:00:00+0000', tz='UTC'): 55.9775,
Timestamp('2019-10-08 22:00:00+0000', tz='UTC'): 55.925,
Timestamp('2019-10-09 04:00:00+0000', tz='UTC'): 56.75,
Timestamp('2019-10-09 10:00:00+0000', tz='UTC'): 56.6783,
Timestamp('2019-10-09 16:00:00+0000', tz='UTC'): 56.77,
Timestamp('2019-10-09 22:00:00+0000', tz='UTC'): 56.075,
Timestamp('2019-10-10 04:00:00+0000', tz='UTC'): 56.875,
Timestamp('2019-10-10 10:00:00+0000', tz='UTC'): 57.5175,
Timestamp('2019-10-10 16:00:00+0000', tz='UTC'): 57.71,
Timestamp('2019-10-10 22:00:00+0000', tz='UTC'): 57.8125,
Timestamp('2019-10-11 04:00:00+0000', tz='UTC'): 58.235,
Timestamp('2019-10-11 10:00:00+0000', tz='UTC'): 58.62,
Timestamp('2019-10-11 16:00:00+0000', tz='UTC'): 59.1825,
Timestamp('2019-10-11 22:00:00+0000', tz='UTC'): 59.3125,
Timestamp('2019-10-14 04:00:00+0000', tz='UTC'): 58.5925,
Timestamp('2019-10-14 10:00:00+0000', tz='UTC'): 59.25,
Timestamp('2019-10-14 16:00:00+0000', tz='UTC'): 58.975,
Timestamp('2019-10-14 22:00:00+0000', tz='UTC'): 59.1125,
Timestamp('2019-10-15 04:00:00+0000', tz='UTC'): 59.2525,
Timestamp('2019-10-15 10:00:00+0000', tz='UTC'): 58.9238,
Timestamp('2019-10-15 16:00:00+0000', tz='UTC'): 58.9,
Timestamp('2019-10-15 22:00:00+0000', tz='UTC'): 58.75,
Timestamp('2019-10-16 04:00:00+0000', tz='UTC'): 58.565,
Timestamp('2019-10-16 10:00:00+0000', tz='UTC'): 58.59,
Timestamp('2019-10-16 16:00:00+0000', tz='UTC'): 58.6825,
Timestamp('2019-10-16 22:00:00+0000', tz='UTC'): 58.5875,
Timestamp('2019-10-17 04:00:00+0000', tz='UTC'): 58.9375,
Timestamp('2019-10-17 10:00:00+0000', tz='UTC'): 58.48,
Timestamp('2019-10-17 16:00:00+0000', tz='UTC'): 58.8375,
Timestamp('2019-10-17 22:00:00+0000', tz='UTC'): 58.8025,
Timestamp('2019-10-18 04:00:00+0000', tz='UTC'): 58.7275,
Timestamp('2019-10-18 10:00:00+0000', tz='UTC'): 58.7838,
Timestamp('2019-10-18 16:00:00+0000', tz='UTC'): 59.0675,
Timestamp('2019-10-18 22:00:00+0000', tz='UTC'): 59.0525,
Timestamp('2019-10-21 04:00:00+0000', tz='UTC'): 59.3775,
Timestamp('2019-10-21 10:00:00+0000', tz='UTC'): 60.1825,
Timestamp('2019-10-21 16:00:00+0000', tz='UTC'): 60.165,
Timestamp('2019-10-21 22:00:00+0000', tz='UTC'): 60.1725,
Timestamp('2019-10-22 04:00:00+0000', tz='UTC'): 60.1975,
Timestamp('2019-10-22 10:00:00+0000', tz='UTC'): 60.2975,
Timestamp('2019-10-22 16:00:00+0000', tz='UTC'): 59.8025,
Timestamp('2019-10-22 22:00:00+0000', tz='UTC'): 59.755,
Timestamp('2019-10-23 04:00:00+0000', tz='UTC'): 60.3975,
Timestamp('2019-10-23 10:00:00+0000', tz='UTC'): 60.6265,
Timestamp('2019-10-23 16:00:00+0000', tz='UTC'): 60.8875,
Timestamp('2019-10-23 22:00:00+0000', tz='UTC'): 61.0275,
Timestamp('2019-10-24 04:00:00+0000', tz='UTC'): 61.0525,
Timestamp('2019-10-24 10:00:00+0000', tz='UTC'): 60.82,
Timestamp('2019-10-24 16:00:00+0000', tz='UTC'): 60.8125,
Timestamp('2019-10-24 22:00:00+0000', tz='UTC'): 60.8225,
Timestamp('2019-10-25 04:00:00+0000', tz='UTC'): 60.75,
Timestamp('2019-10-25 10:00:00+0000', tz='UTC'): 61.3425,
Timestamp('2019-10-25 16:00:00+0000', tz='UTC'): 61.7,
Timestamp('2019-10-25 22:00:00+0000', tz='UTC'): 61.6875,
Timestamp('2019-10-28 04:00:00+0000', tz='UTC'): 61.8575,
Timestamp('2019-10-28 10:00:00+0000', tz='UTC'): 62.1388,
Timestamp('2019-10-28 16:00:00+0000', tz='UTC'): 62.285,
Timestamp('2019-10-28 22:00:00+0000', tz='UTC'): 62.2875,
Timestamp('2019-10-29 04:00:00+0000', tz='UTC'): 62.15,
Timestamp('2019-10-29 10:00:00+0000', tz='UTC'): 60.7952,
Timestamp('2019-10-29 16:00:00+0000', tz='UTC'): 60.9525,
Timestamp('2019-10-29 22:00:00+0000', tz='UTC'): 60.9575,
Timestamp('2019-10-30 04:00:00+0000', tz='UTC'): 60.9575,
Timestamp('2019-10-30 10:00:00+0000', tz='UTC'): 60.5125,
Timestamp('2019-10-30 16:00:00+0000', tz='UTC'): 62.05,
Timestamp('2019-10-30 22:00:00+0000', tz='UTC'): 62.0475,
Timestamp('2019-10-31 04:00:00+0000', tz='UTC'): 61.76,
Timestamp('2019-10-31 10:00:00+0000', tz='UTC'): 62.0523,
Timestamp('2019-10-31 16:00:00+0000', tz='UTC'): 62.105,
Timestamp('2019-10-31 22:00:00+0000', tz='UTC'): 62.14,
Timestamp('2019-11-01 04:00:00+0000', tz='UTC'): 62.35,
Timestamp('2019-11-01 10:00:00+0000', tz='UTC'): 63.3099,
Timestamp('2019-11-01 16:00:00+0000', tz='UTC'): 63.9725,
Timestamp('2019-11-01 22:00:00+0000', tz='UTC'): 64.025,
Timestamp('2019-11-04 10:00:00+0000', tz='UTC'): 64.2388,
Timestamp('2019-11-04 16:00:00+0000', tz='UTC'): 64.375,
Timestamp('2019-11-04 22:00:00+0000', tz='UTC'): 64.4975,
Timestamp('2019-11-05 04:00:00+0000', tz='UTC'): 64.575}}

这是数据集 并且预期的新列是:62.35 63.3099、63.9725、64.025 等 我想要提前 1 个月的值 但是使用df['new_column'] = df.shift(1, freq = 'M')['c'] 并不能完成这项工作

【问题讨论】:

  • 您能否提供数据框的代码/文本和预期输出?
  • 请不要图片。提供例如的输出改为df.head(20).to_dict()
  • @anon01,现在知道了吗? c 这是列的名称,它是一个多索引数据框
  • 只有三行数据。添加足够的内容(添加到问题中,而不是作为评论),以便我们实际加载您的数据并使用它来回答问题
  • @anon01 现在知道了吗?

标签: python pandas time-series


【解决方案1】:

问题很简单,但是您需要在日期中添加一些特定的内容才能获得n

  1. 找到你需要移动的行数,我称之为n并使用pd.DateOffset(months=1)
  2. 您需要使用- 向上移动n

df['Date'] = pd.to_datetime(pd.to_datetime(df['Date']).dt.date)
n = df[df['Date'].isin(pd.to_datetime(df['Date'] + pd.DateOffset(months=1)))].index[0]
df['new_column'] = df['c'].shift(-n)
df
Out[1]: 
         Date       c  new_column
0  2019-10-01 56.5675       62.35
1  2019-10-01 56.2725     63.3099
2  2019-10-01 56.2925     63.9725
3  2019-10-02 55.6525      64.025
4  2019-10-02 54.8025     64.2388
..        ...     ...         ...
94 2019-11-01  64.025         nan
95 2019-11-04 64.2388         nan
96 2019-11-04  64.375         nan
97 2019-11-04 64.4975         nan
98 2019-11-05  64.575         nan

请注意,为了获得上述输出,我使用了:

df = pd.DataFrame(
    {pd.Timestamp('2019-10-01 10:00:00+0000', tz='UTC'): 56.5675,
    pd.Timestamp('2019-10-01 16:00:00+0000', tz='UTC'): 56.2725,
    pd.Timestamp('2019-10-01 22:00:00+0000', tz='UTC'): 56.2925,
    pd.Timestamp('2019-10-02 04:00:00+0000', tz='UTC'): 55.6525,
    pd.Timestamp('2019-10-02 10:00:00+0000', tz='UTC'): 54.8025,
    pd.Timestamp('2019-10-02 16:00:00+0000', tz='UTC'): 54.625,
    pd.Timestamp('2019-10-02 22:00:00+0000', tz='UTC'): 54.625,
    pd.Timestamp('2019-10-03 04:00:00+0000', tz='UTC'): 54.825,
    pd.Timestamp('2019-10-03 10:00:00+0000', tz='UTC'): 54.7075,
    pd.Timestamp('2019-10-03 16:00:00+0000', tz='UTC'): 55.1575,
    pd.Timestamp('2019-10-03 22:00:00+0000', tz='UTC'): 55.125,
    pd.Timestamp('2019-10-04 04:00:00+0000', tz='UTC'): 55.88,
    pd.Timestamp('2019-10-04 10:00:00+0000', tz='UTC'): 56.51,
    pd.Timestamp('2019-10-04 16:00:00+0000', tz='UTC'): 56.77,
    pd.Timestamp('2019-10-04 22:00:00+0000', tz='UTC'): 56.7375,
    pd.Timestamp('2019-10-07 04:00:00+0000', tz='UTC'): 56.5,
    pd.Timestamp('2019-10-07 10:00:00+0000', tz='UTC'): 57.3525,
    pd.Timestamp('2019-10-07 16:00:00+0000', tz='UTC'): 56.7875,
    pd.Timestamp('2019-10-07 22:00:00+0000', tz='UTC'): 56.86,
    pd.Timestamp('2019-10-08 04:00:00+0000', tz='UTC'): 56.75,
    pd.Timestamp('2019-10-08 10:00:00+0000', tz='UTC'): 56.525,
    pd.Timestamp('2019-10-08 16:00:00+0000', tz='UTC'): 55.9775,
    pd.Timestamp('2019-10-08 22:00:00+0000', tz='UTC'): 55.925,
    pd.Timestamp('2019-10-09 04:00:00+0000', tz='UTC'): 56.75,
    pd.Timestamp('2019-10-09 10:00:00+0000', tz='UTC'): 56.6783,
    pd.Timestamp('2019-10-09 16:00:00+0000', tz='UTC'): 56.77,
    pd.Timestamp('2019-10-09 22:00:00+0000', tz='UTC'): 56.075,
    pd.Timestamp('2019-10-10 04:00:00+0000', tz='UTC'): 56.875,
    pd.Timestamp('2019-10-10 10:00:00+0000', tz='UTC'): 57.5175,
    pd.Timestamp('2019-10-10 16:00:00+0000', tz='UTC'): 57.71,
    pd.Timestamp('2019-10-10 22:00:00+0000', tz='UTC'): 57.8125,
    pd.Timestamp('2019-10-11 04:00:00+0000', tz='UTC'): 58.235,
    pd.Timestamp('2019-10-11 10:00:00+0000', tz='UTC'): 58.62,
    pd.Timestamp('2019-10-11 16:00:00+0000', tz='UTC'): 59.1825,
    pd.Timestamp('2019-10-11 22:00:00+0000', tz='UTC'): 59.3125,
    pd.Timestamp('2019-10-14 04:00:00+0000', tz='UTC'): 58.5925,
    pd.Timestamp('2019-10-14 10:00:00+0000', tz='UTC'): 59.25,
    pd.Timestamp('2019-10-14 16:00:00+0000', tz='UTC'): 58.975,
    pd.Timestamp('2019-10-14 22:00:00+0000', tz='UTC'): 59.1125,
    pd.Timestamp('2019-10-15 04:00:00+0000', tz='UTC'): 59.2525,
    pd.Timestamp('2019-10-15 10:00:00+0000', tz='UTC'): 58.9238,
    pd.Timestamp('2019-10-15 16:00:00+0000', tz='UTC'): 58.9,
    pd.Timestamp('2019-10-15 22:00:00+0000', tz='UTC'): 58.75,
    pd.Timestamp('2019-10-16 04:00:00+0000', tz='UTC'): 58.565,
    pd.Timestamp('2019-10-16 10:00:00+0000', tz='UTC'): 58.59,
    pd.Timestamp('2019-10-16 16:00:00+0000', tz='UTC'): 58.6825,
    pd.Timestamp('2019-10-16 22:00:00+0000', tz='UTC'): 58.5875,
    pd.Timestamp('2019-10-17 04:00:00+0000', tz='UTC'): 58.9375,
    pd.Timestamp('2019-10-17 10:00:00+0000', tz='UTC'): 58.48,
    pd.Timestamp('2019-10-17 16:00:00+0000', tz='UTC'): 58.8375,
    pd.Timestamp('2019-10-17 22:00:00+0000', tz='UTC'): 58.8025,
    pd.Timestamp('2019-10-18 04:00:00+0000', tz='UTC'): 58.7275,
    pd.Timestamp('2019-10-18 10:00:00+0000', tz='UTC'): 58.7838,
    pd.Timestamp('2019-10-18 16:00:00+0000', tz='UTC'): 59.0675,
    pd.Timestamp('2019-10-18 22:00:00+0000', tz='UTC'): 59.0525,
    pd.Timestamp('2019-10-21 04:00:00+0000', tz='UTC'): 59.3775,
    pd.Timestamp('2019-10-21 10:00:00+0000', tz='UTC'): 60.1825,
    pd.Timestamp('2019-10-21 16:00:00+0000', tz='UTC'): 60.165,
    pd.Timestamp('2019-10-21 22:00:00+0000', tz='UTC'): 60.1725,
    pd.Timestamp('2019-10-22 04:00:00+0000', tz='UTC'): 60.1975,
    pd.Timestamp('2019-10-22 10:00:00+0000', tz='UTC'): 60.2975,
    pd.Timestamp('2019-10-22 16:00:00+0000', tz='UTC'): 59.8025,
    pd.Timestamp('2019-10-22 22:00:00+0000', tz='UTC'): 59.755,
    pd.Timestamp('2019-10-23 04:00:00+0000', tz='UTC'): 60.3975,
    pd.Timestamp('2019-10-23 10:00:00+0000', tz='UTC'): 60.6265,
    pd.Timestamp('2019-10-23 16:00:00+0000', tz='UTC'): 60.8875,
    pd.Timestamp('2019-10-23 22:00:00+0000', tz='UTC'): 61.0275,
    pd.Timestamp('2019-10-24 04:00:00+0000', tz='UTC'): 61.0525,
    pd.Timestamp('2019-10-24 10:00:00+0000', tz='UTC'): 60.82,
    pd.Timestamp('2019-10-24 16:00:00+0000', tz='UTC'): 60.8125,
    pd.Timestamp('2019-10-24 22:00:00+0000', tz='UTC'): 60.8225,
    pd.Timestamp('2019-10-25 04:00:00+0000', tz='UTC'): 60.75,
    pd.Timestamp('2019-10-25 10:00:00+0000', tz='UTC'): 61.3425,
    pd.Timestamp('2019-10-25 16:00:00+0000', tz='UTC'): 61.7,
    pd.Timestamp('2019-10-25 22:00:00+0000', tz='UTC'): 61.6875,
    pd.Timestamp('2019-10-28 04:00:00+0000', tz='UTC'): 61.8575,
    pd.Timestamp('2019-10-28 10:00:00+0000', tz='UTC'): 62.1388,
    pd.Timestamp('2019-10-28 16:00:00+0000', tz='UTC'): 62.285,
    pd.Timestamp('2019-10-28 22:00:00+0000', tz='UTC'): 62.2875,
    pd.Timestamp('2019-10-29 04:00:00+0000', tz='UTC'): 62.15,
    pd.Timestamp('2019-10-29 10:00:00+0000', tz='UTC'): 60.7952,
    pd.Timestamp('2019-10-29 16:00:00+0000', tz='UTC'): 60.9525,
    pd.Timestamp('2019-10-29 22:00:00+0000', tz='UTC'): 60.9575,
    pd.Timestamp('2019-10-30 04:00:00+0000', tz='UTC'): 60.9575,
    pd.Timestamp('2019-10-30 10:00:00+0000', tz='UTC'): 60.5125,
    pd.Timestamp('2019-10-30 16:00:00+0000', tz='UTC'): 62.05,
    pd.Timestamp('2019-10-30 22:00:00+0000', tz='UTC'): 62.0475,
    pd.Timestamp('2019-10-31 04:00:00+0000', tz='UTC'): 61.76,
    pd.Timestamp('2019-10-31 10:00:00+0000', tz='UTC'): 62.0523,
    pd.Timestamp('2019-10-31 16:00:00+0000', tz='UTC'): 62.105,
    pd.Timestamp('2019-10-31 22:00:00+0000', tz='UTC'): 62.14,
    pd.Timestamp('2019-11-01 04:00:00+0000', tz='UTC'): 62.35,
    pd.Timestamp('2019-11-01 10:00:00+0000', tz='UTC'): 63.3099,
    pd.Timestamp('2019-11-01 16:00:00+0000', tz='UTC'): 63.9725,
    pd.Timestamp('2019-11-01 22:00:00+0000', tz='UTC'): 64.025,
    pd.Timestamp('2019-11-04 10:00:00+0000', tz='UTC'): 64.2388,
    pd.Timestamp('2019-11-04 16:00:00+0000', tz='UTC'): 64.375,
    pd.Timestamp('2019-11-04 22:00:00+0000', tz='UTC'): 64.4975,
    pd.Timestamp('2019-11-05 04:00:00+0000', tz='UTC'): 64.575}, index=['c']).T
df = df.reset_index().rename({'index': 'Date'}, axis=1)

# and then my answer:

df['Date'] = pd.to_datetime(pd.to_datetime(df['Date']).dt.date)
n = df[df['Date'].isin(pd.to_datetime(df['Date'] + 
pd.DateOffset(months=1)))].index[0]
df['new_column'] = df['c'].shift(-n)
df

【讨论】:

    【解决方案2】:

    假设您每天都有唯一的时间戳,并且没有任何缺失的时间戳值。以下可能会起作用,因为您只需根据天数移动行,并且每天没有唯一的时间戳。

    import pandas as pd
    
    # Dummy data
    # I assumed you have 4 unique values for a day's timestamp and don't have any missing values
    lst1 = list(pd.date_range('2020-08-01 04:00:00', periods=60))
    lst2 = list(pd.date_range('2020-08-01 10:00:00', periods=60))
    lst3 = list(pd.date_range('2020-08-01 16:00:00', periods=60))
    lst4 = list(pd.date_range('2020-08-01 22:00:00', periods=60))
    
    lst1.extend(lst2)
    lst1.extend(lst3)
    lst1.extend(lst4)
    
    data = {
        'date': lst1,
        'value': [v for v in range(0,240)]
    }
    
    # Preprocessing
    df = pd.DataFrame(data)
    df = df.sort_values(by=['date'])
    df.reset_index(drop=True, inplace=True)
    
    def update(row,df):
      row['value'] = df.loc[row.name]['value']
      return row
    
    # factor is = X days of shift * Y unique time stamps per day 
    factor = 31 * 4 
    df.apply(update,axis=1,args=[df.shift(-factor)])
    

    【讨论】:

      猜你喜欢
      • 1970-01-01
      • 1970-01-01
      • 1970-01-01
      • 2015-06-13
      • 2017-12-01
      • 1970-01-01
      • 1970-01-01
      • 2021-11-21
      • 2015-03-24
      相关资源
      最近更新 更多