【Question title】: Adding empty dataframe rows based on missing datetime values
【Posted】: 2018-12-06 21:44:13
【Question】:

I am trying to add rows to my pandas dataframe:

import pandas as pd
import datetime as dt

d={'datetime':[dt.datetime(2018,3,1,0,0),dt.datetime(2018,3,1,0,10),dt.datetime(2018,3,1,0,40)],
  'value':[4.,5.,1.]}

df=pd.DataFrame(d)

which outputs:

             datetime  value
0 2018-03-01 00:00:00    4.0
1 2018-03-01 00:10:00    5.0
2 2018-03-01 00:40:00    1.0

What I want to do is add rows between 00:00:00 and 00:40:00 so that there is one row every 5 minutes. My desired output looks like this:

             datetime  value
0 2018-03-01 00:00:00    4.0
1 2018-03-01 00:05:00    NaN
2 2018-03-01 00:10:00    5.0
3 2018-03-01 00:15:00    NaN
4 2018-03-01 00:20:00    NaN
5 2018-03-01 00:25:00    NaN
6 2018-03-01 00:30:00    NaN
7 2018-03-01 00:35:00    NaN
8 2018-03-01 00:40:00    1.0

How do I get there?

【Question discussion】:

    Tags: python pandas datetime dataframe indexing


    【Solution 1】:

    You can use pd.DataFrame.resample:

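    # Resample onto a 5-minute grid; bins that contain no rows become NaN.
    # Indexing by 'datetime' first replaces the positional .drop('datetime', 1),
    # which newer pandas releases reject.
    df = df.set_index('datetime').resample('5min').first().reset_index()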
    
    print(df)
    
                 datetime  value
    0 2018-03-01 00:00:00    4.0
    1 2018-03-01 00:05:00    NaN
    2 2018-03-01 00:10:00    5.0
    3 2018-03-01 00:15:00    NaN
    4 2018-03-01 00:20:00    NaN
    5 2018-03-01 00:25:00    NaN
    6 2018-03-01 00:30:00    NaN
    7 2018-03-01 00:35:00    NaN
    8 2018-03-01 00:40:00    1.0
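
One thing worth knowing about the resample approach is that it bins timestamps rather than matching them exactly. The sketch below is not part of the original answer; it uses a hypothetical variant of the question's data in which the middle reading arrives at 00:12 instead of 00:10, and assumes the default left-closed, left-labelled 5-minute bins.

```python
import datetime as dt
import pandas as pd

# Hypothetical variant of the question's data: the middle reading is at 00:12.
df = pd.DataFrame({
    'datetime': [dt.datetime(2018, 3, 1, 0, 0),
                 dt.datetime(2018, 3, 1, 0, 12),
                 dt.datetime(2018, 3, 1, 0, 40)],
    'value': [4., 5., 1.],
})

# Each row falls into the 5-minute bin that contains it, so the 00:12 reading
# is reported under the 00:10 label; empty bins are filled with NaN.
out = df.set_index('datetime').resample('5min').first().reset_index()
print(out)
```

If off-grid timestamps should not be snapped to the grid like this, an exact-match approach such as reindexing may be a better fit.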
    

    【Discussion】:

    • Thank. You. So. Much. This probably saved me a few hours.
    【Solution 2】:

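    First, you can create a dataframe that already has the final datetime index, then fill it in from the second dataframe:

    import numpy as np

    # Placeholder frame: one NaN row for every 5-minute step
    df1 = pd.DataFrame({'value': np.nan}, index=pd.date_range('2018-03-01 00:00:00',
                         periods=9, freq='5min'))
    
    print(df1)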
    #Output :
                       value
    2018-03-01 00:00:00 NaN
    2018-03-01 00:05:00 NaN
    2018-03-01 00:10:00 NaN
    2018-03-01 00:15:00 NaN
    2018-03-01 00:20:00 NaN
    2018-03-01 00:25:00 NaN
    2018-03-01 00:30:00 NaN
    2018-03-01 00:35:00 NaN
    2018-03-01 00:40:00 NaN
    

    Now, assuming your dataframe is the second one, you can add it to the code above:

    d={'datetime': 
    [dt.datetime(2018,3,1,0,0),dt.datetime(2018,3,1,0,10),dt.datetime(2018,3,1,0,40)],
    'value':[4.,5.,1.]}
    
    df2=pd.DataFrame(d)
    df2.datetime = pd.to_datetime(df2.datetime)
    df2.set_index('datetime',inplace=True)
    print(df2)
    
    #Output
                       value
    datetime    
    2018-03-01 00:00:00 4.0
    2018-03-01 00:10:00 5.0
    2018-03-01 00:40:00 1.0
    

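    Finally, assign the values. pandas aligns them on the datetime index, so timestamps missing from df2 stay NaN: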

    df1.value = df2.value
    print(df1)
    
    #output
                       value
    2018-03-01 00:00:00 4.0
    2018-03-01 00:05:00 NaN
    2018-03-01 00:10:00 5.0
    2018-03-01 00:15:00 NaN
    2018-03-01 00:20:00 NaN
    2018-03-01 00:25:00 NaN
    2018-03-01 00:30:00 NaN
    2018-03-01 00:35:00 NaN
    2018-03-01 00:40:00 1.0
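
The steps above can probably be condensed with reindex, which performs the same index alignment in one call. This is a sketch rather than the answerer's code: it assumes the grid should run from the earliest to the latest timestamp in the data instead of using a hard-coded start and periods=9, and the names full_index and result are illustrative.

```python
import datetime as dt
import pandas as pd

df = pd.DataFrame({
    'datetime': [dt.datetime(2018, 3, 1, 0, 0),
                 dt.datetime(2018, 3, 1, 0, 10),
                 dt.datetime(2018, 3, 1, 0, 40)],
    'value': [4., 5., 1.],
})

# Build the complete 5-minute grid from the data's own first and last timestamps.
full_index = pd.date_range(df['datetime'].min(), df['datetime'].max(), freq='5min')

# reindex keeps rows whose timestamps are on the grid and inserts NaN elsewhere.
result = (df.set_index('datetime')
            .reindex(full_index)
            .rename_axis('datetime')   # the new index has no name, so restore it
            .reset_index())
print(result)
```

Unlike resample, reindex only keeps timestamps that match the grid exactly, so readings that fall between grid points would be dropped rather than binned.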
    

    【Discussion】:
