【问题标题】:Format Aggregate timedelta in pandas pivot table格式化熊猫数据透视表中的聚合时间增量
【发布时间】:2017-05-11 03:23:23
【问题描述】:

好的,我查看了pandas pivot table documentation 和以下类似问题:

但是我的问题是输出的timedeltas 在聚合时显示格式1 day 12:30:00,而我希望它显示36:30:00。有没有办法做到这一点?

代码

pd.pivot_table(
  df,
  index=["date","task_type"],
  columns=["calendar"],
  values="duration",
  aggfunc=np.sum,
  fill_value=timedelta(hours=0),
  margins=True,
  margins_name="TOTAL"
)

输出

calendar                  Proj1    Proj2           Proj3           TOTAL
date       task_type
2017-05-04             01:00:00 00:00:00 0 days 00:00:00 0 days 01:00:00
           development 00:00:00 00:00:00 0 days 02:00:00 0 days 02:00:00
           training    00:00:00 00:00:00 0 days 04:00:00 0 days 04:00:00
2017-05-05 admin       00:00:00 02:00:00 0 days 00:00:00 0 days 02:00:00
           development 00:30:00 00:00:00 0 days 03:30:00 0 days 04:00:00
           meeting     00:00:00 00:00:00 0 days 01:00:00 0 days 01:00:00
2017-05-08             01:30:00 00:00:00 0 days 00:00:00 0 days 01:30:00
           admin       00:00:00 02:00:00 0 days 00:00:00 0 days 02:00:00
           development 00:00:00 00:00:00 0 days 05:00:00 0 days 05:00:00
2017-05-09 admin       02:00:00 01:00:00 0 days 00:00:00 0 days 03:00:00
           development 00:00:00 00:00:00 0 days 01:00:00 0 days 01:00:00
           research    01:00:00 00:00:00 0 days 00:00:00 0 days 01:00:00
           training    00:00:00 00:00:00 0 days 03:30:00 0 days 03:30:00
2017-05-10 admin       00:00:00 01:30:00 0 days 00:00:00 0 days 01:30:00
           development 00:00:00 00:00:00 0 days 02:30:00 0 days 02:30:00
           meeting     02:00:00 00:00:00 0 days 00:00:00 0 days 02:00:00
           training    00:00:00 00:00:00 0 days 02:00:00 0 days 02:00:00
2017-05-11 admin       00:00:00 01:00:00 0 days 00:00:00 0 days 01:00:00
           development 00:00:00 02:30:00 0 days 00:00:00 0 days 02:30:00
TOTAL                  08:00:00 10:00:00 1 days 00:30:00 1 days 18:30:00

预期

calendar                  Proj1    Proj2           Proj3           TOTAL
date       task_type
2017-05-04             01:00:00 00:00:00        00:00:00        01:00:00
           development 00:00:00 00:00:00        02:00:00        02:00:00
           training    00:00:00 00:00:00        04:00:00        04:00:00
2017-05-05 admin       00:00:00 02:00:00        00:00:00        02:00:00
           development 00:30:00 00:00:00        03:30:00        04:00:00
           meeting     00:00:00 00:00:00        01:00:00        01:00:00
2017-05-08             01:30:00 00:00:00        00:00:00        01:30:00
           admin       00:00:00 02:00:00        00:00:00        02:00:00
           development 00:00:00 00:00:00        05:00:00        05:00:00
2017-05-09 admin       02:00:00 01:00:00        00:00:00        03:00:00
           development 00:00:00 00:00:00        01:00:00        01:00:00
           research    01:00:00 00:00:00        00:00:00        01:00:00
           training    00:00:00 00:00:00        03:30:00        03:30:00
2017-05-10 admin       00:00:00 01:30:00        00:00:00        01:30:00
           development 00:00:00 00:00:00        02:30:00        02:30:00
           meeting     02:00:00 00:00:00        00:00:00        02:00:00
           training    00:00:00 00:00:00        02:00:00        02:00:00
2017-05-11 admin       00:00:00 01:00:00        00:00:00        01:00:00
           development 00:00:00 02:30:00        00:00:00        02:30:00
TOTAL                  08:00:00 10:00:00        24:30:00        42:30:00

【问题讨论】:

    标签: python pandas


    【解决方案1】:

    没那么容易,但我尝试使用apply 实现this 解决方案:

    def f(x):
        ts = x.dt.total_seconds()
        hours, remainder = divmod(ts, 3600)
        minutes, seconds = divmod(remainder, 60)
        return hours.abs().astype(int).astype(str).str.zfill(2) + ':' + \
               minutes.abs().astype(int).astype(str).str.zfill(2)
    
    
    df1 = df.apply(f)
    print (df1)
    calendar                       Proj1  Proj2  Proj3   TOTAL
    date                task_type                             
    2017-04-03 15:30:00 c          00:00  00:00  00:00   00:00
                        e          10:30  10:30  00:00   21:00
    2017-04-03 19:00:00 d          24:30  00:00  10:30   35:00
    2017-04-04 05:30:00 a          00:00  03:30  00:00   03:30
    2017-04-04 19:30:00 a          00:00  21:00  00:00   21:00
    2017-04-04 23:00:00 a          00:00  00:00  03:30   03:30
                        c          00:00  00:00  17:30   17:30
                        d          00:00  17:30  00:00   17:30
    TOTAL                          35:00  52:30  31:30  119:00
    

    ... 和applymap:

    def f(x):
        ts = x.total_seconds()
        hours, remainder = divmod(ts, 3600)
        minutes, seconds = divmod(remainder, 60)
        return ('{:02d}:{:02d}').format(int(hours), int(minutes)) 
    
    df2 = df.applymap(f)
    print (df2)
    calendar                       Proj1  Proj2  Proj3   TOTAL
    date                task_type                             
    2017-04-03 15:30:00 c          00:00  00:00  00:00   00:00
                        e          10:30  10:30  00:00   21:00
    2017-04-03 19:00:00 d          24:30  00:00  10:30   35:00
    2017-04-04 05:30:00 a          00:00  03:30  00:00   03:30
    2017-04-04 19:30:00 a          00:00  21:00  00:00   21:00
    2017-04-04 23:00:00 a          00:00  00:00  03:30   03:30
                        c          00:00  00:00  17:30   17:30
                        d          00:00  17:30  00:00   17:30
    TOTAL                          35:00  52:30  31:30  119:00
    

    示例设置:

    N = 10
    np.random.seed(10)
    rng = pd.date_range('2017-04-03 15:30:00', periods=N, freq='3.5H')
    df = pd.DataFrame({'date': np.random.choice(rng, size=N), 
                       'task_type': np.random.choice(list('abcde'), size=N), 
                       'calendar': np.random.choice(['Proj1','Proj2','Proj3'], size=N),
                       'duration': np.abs(np.random.choice(rng, size=N) - 
                                          np.random.choice(rng, size=N)) })  
    #print (df)
    
    
    
    df = pd.pivot_table(
      df,
      index=["date","task_type"],
      columns=["calendar"],
      values="duration",
      aggfunc=np.sum,
      fill_value=pd.Timedelta(hours=0),
      margins=True,
      margins_name="TOTAL"
    )
    

    print (df)
    calendar                                Proj1           Proj2           Proj3  \
    date                task_type                                                   
    2017-04-03 15:30:00 c         0 days 00:00:00 0 days 00:00:00 0 days 00:00:00   
                        e         0 days 10:30:00 0 days 10:30:00 0 days 00:00:00   
    2017-04-03 19:00:00 d         1 days 00:30:00 0 days 00:00:00 0 days 10:30:00   
    2017-04-04 05:30:00 a         0 days 00:00:00 0 days 03:30:00 0 days 00:00:00   
    2017-04-04 19:30:00 a         0 days 00:00:00 0 days 21:00:00 0 days 00:00:00   
    2017-04-04 23:00:00 a         0 days 00:00:00 0 days 00:00:00 0 days 03:30:00   
                        c         0 days 00:00:00 0 days 00:00:00 0 days 17:30:00   
                        d         0 days 00:00:00 0 days 17:30:00 0 days 00:00:00   
    TOTAL                         1 days 11:00:00 2 days 04:30:00 1 days 07:30:00   
    
    calendar                                TOTAL  
    date                task_type                  
    2017-04-03 15:30:00 c         0 days 00:00:00  
                        e         0 days 21:00:00  
    2017-04-03 19:00:00 d         1 days 11:00:00  
    2017-04-04 05:30:00 a         0 days 03:30:00  
    2017-04-04 19:30:00 a         0 days 21:00:00  
    2017-04-04 23:00:00 a         0 days 03:30:00  
                        c         0 days 17:30:00  
                        d         0 days 17:30:00  
    TOTAL                         4 days 23:00:00 
    

    【讨论】:

    • 我选择了applymap 选项。
    猜你喜欢
    • 2019-03-03
    • 1970-01-01
    • 2022-08-03
    • 1970-01-01
    • 2016-10-12
    • 2021-02-28
    • 2023-01-11
    • 1970-01-01
    相关资源
    最近更新 更多