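【Question title】: How to complement missing data in a time series using Python?
【Posted】: 2021-10-05 20:00:39
【Question】:

I have a data frame that has prices for some days of the year. I now want to build a larger data frame that covers every day from the start of the year up to a specific date. Days that exist in the original data frame should keep their prices, and days without a price should be filled with the last available price before that date.

For example: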

import pandas as pd

df = pd.DataFrame({
    'timestamps': pd.to_datetime(
        ['2021-01-04', '2021-01-07', '2021-01-14', '2021-01-21', '2021-01-28',
         '2021-01-29', '2021-02-04', '2021-02-12', '2021-02-18', '2021-02-25']),
    # Note: the prices are stored as strings in this sample
    'LastPrice': ['113.4377', '115.0741', '115.5709', '116.5197', '116.681',
                  '116.4198', '117.5749', '117.2175', '117.0541', '117.5977']})

I want my new date series to look like this:

index=pd.date_range('2021-01-01', '2021-02-28')

dfObj = pd.DataFrame(columns=['new_Date','new_LastPrice'])
dfObj['new_Date'] = index

So ideally I would end up with something like the following data frame (only the top is shown):

    new_Date    new_LastPrice
0   2021-01-01  0
1   2021-01-02  0
2   2021-01-03  0
3   2021-01-04  113.4377
4   2021-01-05  113.4377
5   2021-01-06  113.4377
6   2021-01-07  115.0741
7   2021-01-08  115.0741
8   2021-01-09  115.0741
9   2021-01-10  115.0741
10  2021-01-11  115.0741
11  2021-01-12  115.0741
12  2021-01-13  115.0741

Can anyone here help me with this?

【Comments】:

    Tags: python pandas dataframe date


    【Solution 1】:

    Use DataFrame.reindex with method='ffill':

    index=pd.date_range('2021-01-01', '2021-02-28')
    
    dfObj = (df.set_index('timestamps')
               .reindex(index, method='ffill')
               .fillna(0)
               .add_prefix('new_')
               .rename_axis('new_Date')
               .reset_index())
    print(dfObj.head(13))
         new_Date new_LastPrice
    0  2021-01-01             0
    1  2021-01-02             0
    2  2021-01-03             0
    3  2021-01-04      113.4377
    4  2021-01-05      113.4377
    5  2021-01-06      113.4377
    6  2021-01-07      115.0741
    7  2021-01-08      115.0741
    8  2021-01-09      115.0741
    9  2021-01-10      115.0741
    10 2021-01-11      115.0741
    11 2021-01-12      115.0741
    12 2021-01-13      115.0741
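
One detail worth noting: in the question's sample, LastPrice holds strings, so after fillna(0) the column mixes strings with the integer 0. A minimal sketch, assuming numeric prices are wanted, converts the column with pd.to_numeric before reindexing. The shortened sample data and the names `full_range` and `filled` are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

# Shortened sample (assumption: three rows are enough to illustrate)
df = pd.DataFrame({
    'timestamps': pd.to_datetime(['2021-01-04', '2021-01-07', '2021-01-14']),
    'LastPrice': ['113.4377', '115.0741', '115.5709'],
})

# Assumption: prices should be floats rather than strings.
df['LastPrice'] = pd.to_numeric(df['LastPrice'])

full_range = pd.date_range('2021-01-01', '2021-01-16')

filled = (df.set_index('timestamps')
            .reindex(full_range, method='ffill')  # carry the last known price forward
            .fillna(0)                            # days before the first known price
            .add_prefix('new_')
            .rename_axis('new_Date')
            .reset_index())
```

With this conversion the resulting new_LastPrice column is a float column, which is usually easier to work with in later arithmetic.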
    

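    【Comments】:

      【Solution 2】:

      This will work for your case: merge the data frames, forward-fill the missing values with ffill, then use fillna(0) for the initial records that have no earlier price.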

      df = pd.DataFrame({
          'timestamps': pd.to_datetime(
              ['2021-01-04', '2021-01-07', '2021-01-14', '2021-01-21', '2021-01-28', '2021-01-29', 
      '2021-02-04', '2021-02-12', '2021-02-18', '2021-02-25']),
          'LastPrice':['113.4377','115.0741','115.5709','116.5197','116.681','116.4198','117.5749','117.2175',
       '117.0541','117.5977']})
      
      index=pd.date_range('2021-01-01', '2021-02-28')
      
      
      dfObj = pd.DataFrame(columns=['new_Date','new_LastPrice'])
      dfObj['new_Date'] = index
      dfObj = dfObj.merge(df,how='left', left_on='new_Date', right_on='timestamps')
      dfObj = dfObj[['new_Date', 'LastPrice']]
      dfObj = dfObj.ffill()  # fillna(method='ffill') is deprecated in recent pandas versions
      dfObj = dfObj.fillna(0)
      

      Output:

          new_Date    LastPrice
      0   2021-01-01  0
      1   2021-01-02  0
      2   2021-01-03  0
      3   2021-01-04  113.4377
      4   2021-01-05  113.4377
      5   2021-01-06  113.4377
      6   2021-01-07  115.0741
      7   2021-01-08  115.0741
      8   2021-01-09  115.0741
      9   2021-01-10  115.0741
      10  2021-01-11  115.0741
      ...
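
The merge approach and a reindex-based one should produce the same values. A minimal self-contained check, assuming a shortened sample with float prices; the names `calendar`, `merged`, `reindexed`, and `same` are illustrative and not from the original answers.

```python
import pandas as pd

# Shortened sample (assumption: float prices, two known dates)
df = pd.DataFrame({
    'timestamps': pd.to_datetime(['2021-01-04', '2021-01-07']),
    'LastPrice': [113.4377, 115.0741],
})
calendar = pd.DataFrame({'new_Date': pd.date_range('2021-01-01', '2021-01-10')})

# Left merge keeps every calendar day; days without a price become NaN.
merged = calendar.merge(df, how='left', left_on='new_Date', right_on='timestamps')
merged = merged[['new_Date', 'LastPrice']]
merged = merged.ffill()    # carry the last known price forward
merged = merged.fillna(0)  # leading days with no earlier price

# The same values via reindex, for comparison.
reindexed = (df.set_index('timestamps')['LastPrice']
               .reindex(calendar['new_Date'], method='ffill')
               .fillna(0))

same = merged['LastPrice'].tolist() == reindexed.tolist()
```

Which form to prefer is largely a matter of taste: the merge version keeps the dates as an ordinary column throughout, while the reindex version is shorter when the dates can serve as the index.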
      

      【Comments】:

        【Solution 3】:

        You can use the complete function from pyjanitor to abstract the process of exposing the missing values/rows:

        #pip install pyjanitor
        import janitor
        import pandas as pd
        index=pd.date_range('2021-01-01', '2021-02-28')
        # assign the new values as a dictionary,
        # with the column name as the key
        new_dates = {"timestamps": index} # accepts a callable too
        (df.complete([new_dates])
           .ffill()
           .fillna(0)
           .set_axis(['new_Date', 'new_LastPrice'],
                     axis = 'columns')
           .head(10) # shows the first 10 rows, you can get rid of this line
         )
        
            new_Date new_LastPrice
        0 2021-01-01             0
        1 2021-01-02             0
        2 2021-01-03             0
        3 2021-01-04      113.4377
        4 2021-01-05      113.4377
        5 2021-01-06      113.4377
        6 2021-01-07      115.0741
        7 2021-01-08      115.0741
        8 2021-01-09      115.0741
        9 2021-01-10      115.0741
        

        【Comments】:
