【Question title】: Calculate Time Difference Between Two Pandas Columns in Hours and Minutes
【Posted】: 2014-05-20 08:47:40
【Question】:

I have two columns, fromdate and todate, in a dataframe.

import pandas as pd

data = {'todate': [pd.Timestamp('2014-01-24 13:03:12.050000'), pd.Timestamp('2014-01-27 11:57:18.240000'), pd.Timestamp('2014-01-23 10:07:47.660000')],
        'fromdate': [pd.Timestamp('2014-01-26 23:41:21.870000'), pd.Timestamp('2014-01-27 15:38:22.540000'), pd.Timestamp('2014-01-23 18:50:41.420000')]}

df = pd.DataFrame(data)

I add a new column, diff, to hold the difference between the two dates:

df['diff'] = df['fromdate'] - df['todate']

I get the diff column, but it contains days whenever the difference is more than 24 hours.

                   todate                fromdate                   diff
0 2014-01-24 13:03:12.050 2014-01-26 23:41:21.870 2 days 10:38:09.820000
1 2014-01-27 11:57:18.240 2014-01-27 15:38:22.540 0 days 03:41:04.300000
2 2014-01-23 10:07:47.660 2014-01-23 18:50:41.420 0 days 08:42:53.760000

How do I convert my results to hours and minutes only (i.e. with days converted to hours)?

【Comments】:

    Tags: python pandas datetime python-datetime


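    【Solution 1】:

    Subtracting two pandas Timestamp columns returns a Series of dtype timedelta64[ns] (each element is a pandas.Timedelta, which is a subclass of datetime.timedelta). This can easily be converted into hours with the astype method, like so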

    import pandas
    df = pandas.DataFrame(columns=['to','fr','ans'])
    df.to = [pandas.Timestamp('2014-01-24 13:03:12.050000'), pandas.Timestamp('2014-01-27 11:57:18.240000'), pandas.Timestamp('2014-01-23 10:07:47.660000')]
    df.fr = [pandas.Timestamp('2014-01-26 23:41:21.870000'), pandas.Timestamp('2014-01-27 15:38:22.540000'), pandas.Timestamp('2014-01-23 18:50:41.420000')]
    (df.fr-df.to).astype('timedelta64[h]')
    

    which yields

    0    58
    1     3
    2     8
    dtype: float64
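
On recent pandas releases (2.0 and later, as far as I know) astype('timedelta64[h]') may raise a TypeError, because only the 's', 'ms', 'us' and 'ns' resolutions are supported. A minimal sketch that should give the same whole-hour result, assuming floor division by a one-hour Timedelta is acceptable (the names to, fr and whole_hours are only for illustration):

```python
import pandas as pd

# same timestamps as in the question
to = pd.Series(pd.to_datetime(['2014-01-24 13:03:12.05',
                               '2014-01-27 11:57:18.24',
                               '2014-01-23 10:07:47.66']))
fr = pd.Series(pd.to_datetime(['2014-01-26 23:41:21.87',
                               '2014-01-27 15:38:22.54',
                               '2014-01-23 18:50:41.42']))

# floor-divide the timedelta Series by one hour to get whole hours
whole_hours = (fr - to) // pd.Timedelta(hours=1)
print(whole_hours.tolist())  # [58, 3, 8]
```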
    

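    【Comments】:

    • The astype workaround works, but it is too slow for large files (500,000 rows). Any other suggestions?
    【Solution 2】:

    This was driving me crazy, because the .astype() solution above did not work for me. But I found another way. I have not timed it or anything, but it might be useful to others: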

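    import pandas as pd

    t1 = pd.to_datetime('1/1/2015 01:00')
    t2 = pd.to_datetime('1/1/2015 03:30')
    
    # total_seconds() covers spans longer than one day; .seconds would drop the whole days
    print(pd.Timedelta(t2 - t1).total_seconds() / 3600.0)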
    

    ...if you want hours. Or:

    print(pd.Timedelta(t2 - t1).total_seconds() / 60.0)
    

    ...if you want minutes.

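    UPDATE: There used to be a helpful comment here that mentioned using .total_seconds() for time periods spanning multiple days. Since it is gone, I have updated the answer.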
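
To make the difference concrete, here is a small sketch with a span longer than one day. The dates are made up for illustration and are not from the question:

```python
import pandas as pd

t1 = pd.to_datetime('2015-01-01 01:00')
t2 = pd.to_datetime('2015-01-03 03:30')  # two days and 2.5 hours later

delta = t2 - t1

# .seconds only holds the part of the span below one day
print(delta.seconds / 3600.0)          # 2.5

# .total_seconds() includes the whole days as well
print(delta.total_seconds() / 3600.0)  # 50.5
```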

    【Comments】:

      【Solution 3】:
      import pandas as pd
      
      # test data from OP, with values already in a datetime format
      data = {'to_date': [pd.Timestamp('2014-01-24 13:03:12.050000'), pd.Timestamp('2014-01-27 11:57:18.240000'), pd.Timestamp('2014-01-23 10:07:47.660000')],
              'from_date': [pd.Timestamp('2014-01-26 23:41:21.870000'), pd.Timestamp('2014-01-27 15:38:22.540000'), pd.Timestamp('2014-01-23 18:50:41.420000')]}
      
      # test dataframe; the columns must be in a datetime format; use pandas.to_datetime if needed
      df = pd.DataFrame(data)
      
      # add a timedelta column if wanted. It's added here for information only
      # df['time_delta_with_sub'] = df.from_date.sub(df.to_date)  # also works
      df['time_delta'] = (df.from_date - df.to_date)
      
      # create a column with timedelta as total hours, as a float type
      df['tot_hour_diff'] = (df.from_date - df.to_date) / pd.Timedelta(hours=1)
      
      # create a column with timedelta as total minutes, as a float type
      df['tot_mins_diff'] = (df.from_date - df.to_date) / pd.Timedelta(minutes=1)
      
      # display(df)
                        to_date               from_date             time_delta  tot_hour_diff  tot_mins_diff
      0 2014-01-24 13:03:12.050 2014-01-26 23:41:21.870 2 days 10:38:09.820000      58.636061    3518.163667
      1 2014-01-27 11:57:18.240 2014-01-27 15:38:22.540 0 days 03:41:04.300000       3.684528     221.071667
      2 2014-01-23 10:07:47.660 2014-01-23 18:50:41.420 0 days 08:42:53.760000       8.714933     522.896000
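
The question asks for hours and minutes together rather than fractional hours. One possible sketch, assuming seconds can simply be truncated; the column names hours, minutes and hours_minutes are hypothetical:

```python
import pandas as pd

df = pd.DataFrame({
    'to_date': pd.to_datetime(['2014-01-24 13:03:12.05',
                               '2014-01-27 11:57:18.24',
                               '2014-01-23 10:07:47.66']),
    'from_date': pd.to_datetime(['2014-01-26 23:41:21.87',
                                 '2014-01-27 15:38:22.54',
                                 '2014-01-23 18:50:41.42']),
})

# whole minutes in each difference (seconds and fractions are truncated)
total_minutes = (df.from_date - df.to_date) // pd.Timedelta(minutes=1)

# split into whole hours and leftover minutes; days are folded into the hours
df['hours'] = total_minutes // 60
df['minutes'] = total_minutes % 60

# text column formatted as H:MM, with minutes padded to two digits
df['hours_minutes'] = df['hours'].astype(str) + ':' + df['minutes'].astype(str).str.zfill(2)
print(df['hours_minutes'].tolist())  # ['58:38', '3:41', '8:42']
```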
      

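      Other methods

      • A note from the podcast listed under Other resources: .total_seconds() was added and merged while the core developer was on vacation, and would not otherwise have been approved.
        • This is also why there are no other .total_xx methods.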
      # convert the entire timedelta to seconds
      # this is the same as td / timedelta(seconds=1)
      (df.from_date - df.to_date).dt.total_seconds()
      [out]:
      0    211089.82
      1     13264.30
      2     31373.76
      dtype: float64
      
      # get the number of days
      (df.from_date - df.to_date).dt.days
      [out]:
      0    2
      1    0
      2    0
      dtype: int64
      
      # get the seconds for hours + minutes + seconds, but not days
      # note the difference from total_seconds
      (df.from_date - df.to_date).dt.seconds
      [out]:
      0    38289
      1    13264
      2    31373
      dtype: int64
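
The .dt.components accessor is another way to get at the parts shown above. The sketch below rebuilds the timedeltas from strings so that it runs on its own; folding days into hours is an illustrative choice rather than something the accessor does for you:

```python
import pandas as pd

td = pd.Series(pd.to_timedelta(['2 days 10:38:09.82',
                                '0 days 03:41:04.30',
                                '0 days 08:42:53.76']))

# .dt.components returns a DataFrame with days, hours, minutes, seconds, ... columns
parts = td.dt.components

# fold the days into the hours; minutes are already the leftover part
hours = parts['days'] * 24 + parts['hours']
minutes = parts['minutes']
print(pd.DataFrame({'hours': hours, 'minutes': minutes}))
```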
      

      Other resources

      %%timeit test

      import pandas as pd
      
      # dataframe with 2M rows
      data = {'to_date': [pd.Timestamp('2014-01-24 13:03:12.050000'), pd.Timestamp('2014-01-27 11:57:18.240000')], 'from_date': [pd.Timestamp('2014-01-26 23:41:21.870000'), pd.Timestamp('2014-01-27 15:38:22.540000')]}
      df = pd.DataFrame(data)
      df = pd.concat([df] * 1000000).reset_index(drop=True)
      
      %%timeit
      (df.from_date - df.to_date) / pd.Timedelta(hours=1)
      [out]:
      43.1 ms ± 1.05 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
      
      %%timeit
      (df.from_date - df.to_date).astype('timedelta64[h]')
      [out]:
      59.8 ms ± 1.29 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
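
Outside IPython, where the %%timeit magic is not available, a comparable measurement can be sketched with the standard timeit module. The frame is reduced to 200,000 rows to keep the run short, and the helper names by_division and by_total_seconds are hypothetical; timings will vary by machine and pandas version:

```python
import timeit
import pandas as pd

data = {'to_date': [pd.Timestamp('2014-01-24 13:03:12.050000'), pd.Timestamp('2014-01-27 11:57:18.240000')],
        'from_date': [pd.Timestamp('2014-01-26 23:41:21.870000'), pd.Timestamp('2014-01-27 15:38:22.540000')]}
df = pd.concat([pd.DataFrame(data)] * 100_000).reset_index(drop=True)  # 200k rows

def by_division():
    # divide the timedelta column by a one-hour Timedelta
    return (df.from_date - df.to_date) / pd.Timedelta(hours=1)

def by_total_seconds():
    # convert to seconds first, then to hours
    return (df.from_date - df.to_date).dt.total_seconds() / 3600

for fn in (by_division, by_total_seconds):
    # best of 3 repeats, 5 calls each, reported per call
    best = min(timeit.repeat(fn, number=5, repeat=3)) / 5
    print(f'{fn.__name__}: {best * 1e3:.1f} ms per call')
```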
      

      【Comments】:
