【Question title】:python pandas calculating hours for daterange in dataframe
【Posted】:2017-04-29 09:26:16
【Question】:

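I want to calculate on-call hours for a date range. The standard on-call time is 16 hours per day from Monday to Friday and 24 hours per day on Saturday and Sunday.

I have already written code that works for two specific dates: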

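import datetime
from datetime import date

date1 = date(2017, 4, 13)
date2 = date(2017, 4, 17)

def daterange(d1, d2):
    # yield every date from d1 to d2, both ends included
    return (d1 + datetime.timedelta(days=i) for i in range((d2 - d1).days + 1))

total = 0
for n in daterange(date1, date2):
    if n.weekday() < 5:   # Monday=0 ... Friday=4
        total += 16
    else:                 # Saturday=5, Sunday=6
        total += 24
print(total)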
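
For reference, the loop above can be wrapped in a small reusable function. This is only a sketch of the same logic; the names `oncall_hours`, `WEEKDAY_HOURS` and `WEEKEND_HOURS` are made up for illustration and do not appear in the original post.

```python
from datetime import date, timedelta

WEEKDAY_HOURS = 16  # Monday to Friday (hypothetical constant name)
WEEKEND_HOURS = 24  # Saturday and Sunday (hypothetical constant name)

def oncall_hours(start, end):
    """Total on-call hours for every day from start to end, inclusive."""
    n_days = (end - start).days + 1
    days = (start + timedelta(days=i) for i in range(n_days))
    # weekday() returns 0 for Monday through 6 for Sunday
    return sum(WEEKDAY_HOURS if d.weekday() < 5 else WEEKEND_HOURS for d in days)

# 2017-04-13 is a Thursday, so the range covers Thu, Fri, Sat, Sun, Mon
total = oncall_hours(date(2017, 4, 13), date(2017, 4, 17))
print(total)  # 96
```

The function works on single scalar dates only; passing whole DataFrame columns to it is what breaks, which is the problem described next.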

I am having trouble applying this to columns of date ranges:

Start      End
2017-02-03 2017-03-15
2017-02-05 2017-03-16
2017-02-06 2017-03-17
2017-02-10 2017-03-18
...        ...

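Both columns above have dtype datetime64[ns].

The error is TypeError: cannot convert the series to class 'int'

Is there a way to calculate this for the time series columns? The result can go into a new column or simply be returned.

Thank you in advance!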

【Question comments】:

    Tags: python python-3.x pandas date-range


    【Solution 1】:

    You can use apply with a custom function:

    df['new'] = df.apply(lambda x : np.where(pd.date_range(x['Start'], x['End']).weekday < 5, 16, 24).sum(), axis=1)
    print (df)
           Start        End  new
    0 2017-02-03 2017-03-15  752
    1 2017-02-05 2017-03-16  728
    2 2017-02-06 2017-03-17  720
    3 2017-02-10 2017-03-18  680
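
To try this outside the thread, here is a self-contained sketch that rebuilds the four sample rows from the question and runs the same `apply` call. Building the frame with `pd.to_datetime` is an assumption about how the data was loaded; the computation itself is the one shown above.

```python
import numpy as np
import pandas as pd

# Sample rows from the question; both columns end up as datetime64 columns
df = pd.DataFrame({
    "Start": pd.to_datetime(["2017-02-03", "2017-02-05", "2017-02-06", "2017-02-10"]),
    "End": pd.to_datetime(["2017-03-15", "2017-03-16", "2017-03-17", "2017-03-18"]),
})

# For each row: expand Start..End to daily dates, give 16 hours to
# weekdays (0-4) and 24 hours to weekend days (5-6), then add them up
df["new"] = df.apply(
    lambda x: np.where(pd.date_range(x["Start"], x["End"]).weekday < 5, 16, 24).sum(),
    axis=1,
)
print(df)
```

Note that `pd.date_range` includes both endpoints, so the End date itself is counted.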
    

    The same thing with a named function:


    def f(x):
        b = pd.date_range(x['Start'], x['End']).weekday
        return np.where(b < 5, 16, 24).sum()
    
    df['new'] = df.apply(f, axis=1)
    print (df)
           Start        End  new
    0 2017-02-03 2017-03-15  752
    1 2017-02-05 2017-03-16  728
    2 2017-02-06 2017-03-17  720
    3 2017-02-10 2017-03-18  680
    

    Another solution, though I think it is more complicated:

    #reshape df
    df1 = df.stack().reset_index()
    df1.columns = ['i','c','date']
    #groupby by index and resample to days, forward fill NaNs
    df1 = (df1.set_index('date').groupby('i').resample('D').ffill()
              .reset_index(level=0, drop=True).reset_index())
    #get hours
    df1['tot'] = np.where(df1['date'].dt.weekday < 5, 16, 24)
    #sum by index
    s = df1.groupby('i')['tot'].sum()
    #join to original
    df = df.join(s)
    print (df.head(10))
           Start        End  tot
    0 2017-02-03 2017-03-15  752
    1 2017-02-05 2017-03-16  728
    2 2017-02-06 2017-03-17  720
    3 2017-02-10 2017-03-18  680
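
The same "one row per day, then sum per original row" idea can also be sketched without `stack`/`resample`, which may be easier to follow. This is an alternative formulation, not the answerer's code; the names `ranges` and `long` are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Start": pd.to_datetime(["2017-02-03", "2017-02-05", "2017-02-06", "2017-02-10"]),
    "End": pd.to_datetime(["2017-03-15", "2017-03-16", "2017-03-17", "2017-03-18"]),
})

# One daily DatetimeIndex per original row (both endpoints included)
ranges = [pd.date_range(s, e) for s, e in zip(df["Start"], df["End"])]

# Long format: column "i" repeats the original row label once per day
long = pd.DataFrame({
    "i": np.repeat(df.index.to_numpy(), [len(r) for r in ranges]),
    "date": np.concatenate([r.to_numpy() for r in ranges]),
})

# Hours per day, then sum per original row; the assignment aligns on the index
long["tot"] = np.where(long["date"].dt.weekday < 5, 16, 24)
df["tot"] = long.groupby("i")["tot"].sum()
print(df)
```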
    

    Timings:

    df = pd.concat([df]*100).reset_index(drop=True) 
    print (df)
    
    def f(df):
        df1 = df.stack().reset_index()
        df1.columns = ['i','c','date']
        df1 = df1.set_index('date').groupby('i').resample('D').ffill().reset_index(level=0, drop=True).reset_index()
        df1['tot'] = np.where(df1['date'].dt.weekday < 5, 16, 24)
        s = df1.groupby('i')['tot'].sum()
        return df.join(s)
    
    print (f(df))
    mapping = {i:16 if i<5 else 24 for i in range(7)}
    
    In [190]: %timeit (f(df))
    1 loop, best of 3: 482 ms per loop
    
    #MaxU solution
    In [191]: %timeit df['oncall_hours'] =  df.apply(lambda x: pd.date_range(x['Start'], x['End']).to_series().dt.weekday.map(mapping).sum(), axis=1)
    1 loop, best of 3: 531 ms per loop
    
    In [192]: %timeit df['new'] = df.apply(lambda x : np.where(pd.date_range(x['Start'], x['End']).weekday < 5, 16, 24).sum(), axis=1)
    10 loops, best of 3: 166 ms per loop
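
All three timed variants still build a date range per row. If that becomes a bottleneck, a fully vectorized version is possible with NumPy's `np.busday_count`, which counts Monday-to-Friday days in a half-open interval. This is a sketch of a different technique that does not come from the answers and was not part of the timings above; it assumes timezone-naive columns, and the column name `vectorized` is made up.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Start": pd.to_datetime(["2017-02-03", "2017-02-05", "2017-02-06", "2017-02-10"]),
    "End": pd.to_datetime(["2017-03-15", "2017-03-16", "2017-03-17", "2017-03-18"]),
})

# busday_count works on day-resolution datetime64 values
start = df["Start"].to_numpy().astype("datetime64[D]")
# busday_count excludes the end date, so shift by one day to include End
end_exclusive = df["End"].to_numpy().astype("datetime64[D]") + np.timedelta64(1, "D")

weekdays = np.busday_count(start, end_exclusive)      # Mon-Fri days per row
total_days = (end_exclusive - start).astype("int64")  # all days per row

# Weekdays count 16 hours, the remaining (weekend) days count 24 hours
df["vectorized"] = weekdays * 16 + (total_days - weekdays) * 24
print(df)
```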
    

    【Comments】:

      【Solution 2】:

      IIUC, you can use the following simple mapping:

      Sample series:

      In [110]: s = pd.date_range('2017-01-01', periods=10).to_series()
      
      In [111]: s
      Out[111]:
      2017-01-01   2017-01-01
      2017-01-02   2017-01-02
      2017-01-03   2017-01-03
      2017-01-04   2017-01-04
      2017-01-05   2017-01-05
      2017-01-06   2017-01-06
      2017-01-07   2017-01-07
      2017-01-08   2017-01-08
      2017-01-09   2017-01-09
      2017-01-10   2017-01-10
      Freq: D, dtype: datetime64[ns]
      

      Mapping:

      # DateLikeSeries.dt.weekday returns the day of the week with Monday=0, Sunday=6
      In [94]: mapping = {i:16 if i<5 else 24 for i in range(7)}
      
      In [95]: mapping
      Out[95]: {0: 16, 1: 16, 2: 16, 3: 16, 4: 16, 5: 24, 6: 24}
      
      In [112]: s.dt.weekday.map(mapping)
      Out[112]:
      2017-01-01    24
      2017-01-02    16
      2017-01-03    16
      2017-01-04    16
      2017-01-05    16
      2017-01-06    16
      2017-01-07    24
      2017-01-08    24
      2017-01-09    16
      2017-01-10    16
      Freq: D, dtype: int64
      
      
      In [113]: s.dt.weekday.map(mapping).sum()
      Out[113]: 184
      

      You can apply this logic to your DataFrame:

      In [107]: df
      Out[107]:
             Start        End
      0 2017-02-03 2017-03-15
      1 2017-02-05 2017-03-16
      2 2017-02-06 2017-03-17
      3 2017-02-10 2017-03-18
      
      In [108]: %paste
      df['oncall_hours'] = \
          df.apply(lambda x: pd.date_range(x['Start'], x['End'])
                               .to_series()
                               .dt.weekday
                               .map(mapping)
                               .sum(),
                   axis=1)
      ## -- End pasted text --
      
      In [109]: df
      Out[109]:
             Start        End  oncall_hours
      0 2017-02-03 2017-03-15           752
      1 2017-02-05 2017-03-16           728
      2 2017-02-06 2017-03-17           720
      3 2017-02-10 2017-03-18           680
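
Since the session above relies on IPython's `%paste`, here is the same mapping approach as a plain script. It is a sketch that swaps `df.apply` for a list comprehension over `zip`, which is a stylistic choice rather than part of the original answer; the frame construction is assumed.

```python
import pandas as pd

df = pd.DataFrame({
    "Start": pd.to_datetime(["2017-02-03", "2017-02-05", "2017-02-06", "2017-02-10"]),
    "End": pd.to_datetime(["2017-03-15", "2017-03-16", "2017-03-17", "2017-03-18"]),
})

# weekday number (Monday=0 ... Sunday=6) -> on-call hours for that day
mapping = {i: 16 if i < 5 else 24 for i in range(7)}

# One total per (Start, End) pair, using the same chain as in the answer
df["oncall_hours"] = [
    int(pd.date_range(s, e).to_series().dt.weekday.map(mapping).sum())
    for s, e in zip(df["Start"], df["End"])
]
print(df)
```

A dict-based mapping makes it easy to change the hours for a single day of the week later, at the cost of being somewhat slower than the `np.where` version according to the timings in Solution 1.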
      

      【Comments】:

        【Solution 3】:

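        You need to use the apply function to do this. The error just tells you that the function is not being called correctly: most likely it receives whole columns (Series) instead of single dates.

        In pandas, the apply method with axis=1 applies a function to each row of the DataFrame (the default, axis=0, works column by column).

        Change your pandas DataFrame call to:

        df['new_column'] = df.apply(lambda x: sum(16 if d.weekday() < 5 else 24 for d in daterange(x['Start'], x['End'])), axis=1)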
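
To see why the `axis` argument matters, the following minimal sketch shows what `apply` hands to the function in each mode. The two-row frame and the names `per_column` and `per_row` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Start": pd.to_datetime(["2017-02-03", "2017-02-05"]),
    "End": pd.to_datetime(["2017-03-15", "2017-03-16"]),
})

# Default axis=0: the function is called once per COLUMN,
# so the argument is a whole Series named 'Start' or 'End'
per_column = df.apply(lambda col: col.name)

# axis=1: the function is called once per ROW, so the argument is a
# Series holding that row's Start and End, named after the row label
per_row = df.apply(lambda row: row.name, axis=1)

print(per_column.tolist())  # ['Start', 'End']
print(per_row.tolist())     # [0, 1]
```

With the default axis, a lookup like `x['Start']` inside the function is applied to a column indexed by row labels rather than to a row, which is why the per-row call needs `axis=1`.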
        

        Let me know if you need further help.

        【Comments】:
