【Question title】: Pandas generate missing dates & hours with 0 values
【Posted】: 2018-05-03 18:27:44
【Question】:

I have this DataFrame:

date                   station  count
2015-01-01 13:00:00      A        4
2015-01-01 14:00:00      B        2
2015-01-02 15:00:00      A        7

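For simplicity, assume station takes only 2 values: A and B.

My goal is to generate a row with a count of 0 for every date, every hour, and every station that has no data.

For example, the code would generate: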

date                   station  count
2015-01-01 00:00:00      A        0
2015-01-01 00:00:00      B        0

Here is what I tried:

# generate 0 values (no transaction) for each hour at each station
df_trans = df_trans.set_index(['date', 'station'])

(date_index, station_index) = df_trans.index.levels

# generate a range of all dates & hours
all_dates = pd.date_range('2014-01-09', '2015-12-08', freq='H')

new_index = pd.MultiIndex.from_product([all_dates, station_index])

df_trans = df_trans.reindex(new_index)

df_trans = df_trans['net_rate'].fillna(0)

But the resulting DataFrame is not hourly.

Output (the dates have no hours):

               net_rate
2014-01-09 2        0.0
           3        0.0
           4        0.0

【Comments】:

    Tags: python pandas dataframe time-series


    【Solution 1】:

    It works fine for me. A small improvement is to pass fill_value=0 to reindex, which replaces the separate fillna(0) step:

    new_index = pd.MultiIndex.from_product([all_dates, station_index], names=('date', 'station'))
    
    df_trans = df_trans.reindex(new_index, fill_value=0)
    
    print (df_trans.head(10))
                                 count
    date                station       
    2014-01-09 00:00:00 A            0
                        B            0
    2014-01-09 01:00:00 A            0
                        B            0
    2014-01-09 02:00:00 A            0
                        B            0
    2014-01-09 03:00:00 A            0
                        B            0
    2014-01-09 04:00:00 A            0
                        B            0
    
    print (df_trans[df_trans['count'] != 0])
                                 count
    date                station       
    2015-01-01 13:00:00 A            4
    2015-01-01 14:00:00 B            2
    2015-01-02 15:00:00 A            7
    

    print (df_trans.index.levels)
    
    [[2014-01-09 00:00:00, 2014-01-09 01:00:00, 2014-01-09 02:00:00, 2014-01-09 03:00:00, 
      2014-01-09 04:00:00, 2014-01-09 05:00:00, 2014-01-09 06:00:00, 2014-01-09 07:00:00, 
      2014-01-09 08:00:00, 2014-01-09 09:00:00, 2014-01-09 10:00:00, 2014-01-09 11:00:00, 
      2014-01-09 12:00:00, 2014-01-09 13:00:00, 2014-01-09 14:00:00, 2014-01-09 15:00:00, 
      2014-01-09 16:00:00, 2014-01-09 17:00:00, 2014-01-09 18:00:00, 2014-01-09 19:00:00, 
      2014-01-09 20:00:00, 2014-01-09 21:00:00, 2014-01-09 22:00:00, 2014-01-09 23:00:00, 
      2014-01-10 00:00:00, 2014-01-10 01:00:00, 2014-01-10 02:00:00, 2014-01-10 03:00:00, 
      2014-01-10 04:00:00, 2014-01-10 05:00:00, 2014-01-10 06:00:00, 2014-01-10 07:00:00, 
      2014-01-10 08:00:00, 2014-01-10 09:00:00, 2014-01-10 10:00:00, 2014-01-10 11:00:00, 
      2014-01-10 12:00:00, 2014-01-10 13:00:00, 2014-01-10 14:00:00, 2014-01-10 15:00:00, 
      2014-01-10 16:00:00, 2014-01-10 17:00:00, 2014-01-10 18:00:00, 2014-01-10 19:00:00, 
      2014-01-10 20:00:00, 2014-01-10 21:00:00, 2014-01-10 22:00:00, 2014-01-10 23:00:00, 
      2014-01-11 00:00:00, 2014-01-11 01:00:00, 2014-01-11 02:00:00, 2014-01-11 03:00:00, 
      2014-01-11 04:00:00, 2014-01-11 05:00:00, 2014-01-11 06:00:00, 2014-01-11 07:00:00, 
      2014-01-11 08:00:00, 2014-01-11 09:00:00, 2014-01-11 10:00:00, 2014-01-11 11:00:00, 
      2014-01-11 12:00:00, 2014-01-11 13:00:00, 2014-01-11 14:00:00, 2014-01-11 15:00:00, 
      2014-01-11 16:00:00, 2014-01-11 17:00:00, 2014-01-11 18:00:00, 2014-01-11 19:00:00, 
      2014-01-11 20:00:00, 2014-01-11 21:00:00, 2014-01-11 22:00:00, 2014-01-11 23:00:00, 
      2014-01-12 00:00:00, 2014-01-12 01:00:00, 2014-01-12 02:00:00, 2014-01-12 03:00:00, 
      2014-01-12 04:00:00, 2014-01-12 05:00:00, 2014-01-12 06:00:00, 2014-01-12 07:00:00, 
      2014-01-12 08:00:00, 2014-01-12 09:00:00, 2014-01-12 10:00:00, 2014-01-12 11:00:00, 
      2014-01-12 12:00:00, 2014-01-12 13:00:00, 2014-01-12 14:00:00, 2014-01-12 15:00:00, 
      2014-01-12 16:00:00, 2014-01-12 17:00:00, 2014-01-12 18:00:00, 2014-01-12 19:00:00, 
      2014-01-12 20:00:00, 2014-01-12 21:00:00, 2014-01-12 22:00:00, 2014-01-12 23:00:00, 
      2014-01-13 00:00:00, 2014-01-13 01:00:00, 2014-01-13 02:00:00, 2014-01-13 03:00:00, ...], ['A', 'B']]
    
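For readers who want to run the approach end to end, here is a minimal self-contained sketch built on the three sample rows from the question. Two things in it are assumptions rather than part of the original answer: the hourly range is derived from the data (first day at 00:00 through last day at 23:00) instead of being hard-coded, and the frequency is passed as `pd.offsets.Hour()` instead of the string alias `'H'` used in the thread.

```python
import pandas as pd

# Sample data copied from the question.
df_trans = pd.DataFrame({
    "date": pd.to_datetime([
        "2015-01-01 13:00:00",
        "2015-01-01 14:00:00",
        "2015-01-02 15:00:00",
    ]),
    "station": ["A", "B", "A"],
    "count": [4, 2, 7],
})

# Assumption: cover every hour of every day spanned by the data.
start = df_trans["date"].min().normalize()
end = df_trans["date"].max().normalize() + pd.Timedelta(hours=23)
all_dates = pd.date_range(start, end, freq=pd.offsets.Hour())

stations = sorted(df_trans["station"].unique())

# Every (hour, station) pair, with named levels as in the answer.
new_index = pd.MultiIndex.from_product(
    [all_dates, stations], names=("date", "station")
)

# reindex adds the missing pairs; fill_value=0 sets their count to 0.
full = (
    df_trans.set_index(["date", "station"])
    .reindex(new_index, fill_value=0)
    .reset_index()
)

print(full.head())
print(full[full["count"] != 0])
```

Because `fill_value=0` is applied during the reindex, the `count` column keeps its integer dtype; calling `fillna(0)` afterwards would first introduce NaN values and turn the column into floats.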

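    【Discussion】:

    • Thanks for your help. I tried your solution, but it returns the same output as my code. I have posted the output in my question.
    • So you need a 3-level MultiIndex: date, time, and station?
    • So it returns your solution.
    • Sorry, what do you mean? The date column in df_trans already holds dates with hours, and I don't know why the hours are missing from the output.
    • I added df_trans.index.levels to the answer. Is that what you want?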
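The thread does not settle why the asker's output showed dates without hours. One way to check whether a reindexed frame really is hourly, independent of how pandas prints it, is to inspect the date level directly. The sketch below is only an illustration: the one-day range and the station list are hypothetical stand-ins for the real data.

```python
import pandas as pd

# Hypothetical short range and station list, for illustration only.
all_dates = pd.date_range("2014-01-09", "2014-01-10", freq=pd.offsets.Hour())
new_index = pd.MultiIndex.from_product(
    [all_dates, ["A", "B"]], names=("date", "station")
)

# The first level holds the distinct timestamps of the index.
date_level = new_index.levels[0]

# Differences between consecutive timestamps; all should be one hour.
diffs = date_level[1:] - date_level[:-1]
is_hourly = bool((diffs == pd.Timedelta(hours=1)).all())

print(date_level.dtype)
print(is_hourly)
```

If `is_hourly` is true, the hours are present in the index even when a printed summary appears to show only dates.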