【Question Title】: Python Pandas: How do I perform an operation on a shifted column within a group?
【Posted】: 2021-09-17 10:18:20
【Question Description】:

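I have a dataframe in which I want to compute a time difference against a shifted column, within groups, in order to derive working hours. For example, consider the data below: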

 driver_id    veh                starttime                stoptime
0  kg123     10010      2018-12-21 15:17:29    2018-12-21 15:18:57
1  kg124     10012      2019-01-01 00:10:16    2019-01-01 00:16:32
2  kg124     10012      2019-01-01 00:27:11    2019-01-01 00:31:38
3  kg214     10012      2019-01-01 00:46:20    2019-01-01 01:04:54
4  kg125     10013      2019-01-01 00:19:06    2019-01-01 00:39:43

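I want to add a column that takes the difference between the current stoptime and the shifted starttime for a driver in the same vehicle, so that I can identify breaks between tasks. The operation must stay within a group of my choosing, in this case driver_id and veh. The output should look like this: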

 driver_id  veh         starttime              stoptime      break_from_last
0  kg123   10010   2018-12-21 15:17:29 2018-12-21 15:18:57               NaT
1  kg124   10012   2019-01-01 00:10:16 2019-01-01 00:16:32               NaT
2  kg124   10012   2019-01-01 00:27:11 2019-01-01 00:31:38   0 days 00:21:22
3  kg124   10012   2019-01-01 00:46:20 2019-01-01 01:04:54   0 days 00:37:43
4  kg125   10013   2019-01-01 00:19:06 2019-01-01 00:39:43               NaT

In R this is simple with data.table, as shown below:

 #starting shift

      j = c("driver_id","veh")
      df[,break_from_last:= round(
        as.numeric(difftime(starttime, shift(stoptime, 1L, type = "lag"),units ="hours"))
        ,2),by = j]

How can I do this in Python? I can already produce the shifted difference; I only need to add the grouping. See below:

#produce a break
#BUT HOW DO I ADD A GROUP DESIGNATION?
df['break_from_last'] = df['stoptime'] - df['starttime'].shift(1)  

    Tags: python pandas dataframe datetime time-series


    【Solution 1】:

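    Try this: do a groupby and shift on the starttime column, and let pandas handle the arithmetic through its intrinsic index alignment. The shifted Series keeps the original row index, so the subtraction pairs each row with the previous row of its own group: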

    df['break_from_last'] = df['stoptime'] - df.groupby('driver_id')['starttime'].shift()
    df
    

    Output:

      driver_id    veh           starttime            stoptime break_from_last
    0     kg123  10010 2018-12-21 15:17:29 2018-12-21 15:18:57             NaT
    1     kg124  10012 2019-01-01 00:10:16 2019-01-01 00:16:32             NaT
    2     kg124  10012 2019-01-01 00:27:11 2019-01-01 00:31:38 0 days 00:21:22
    3     kg124  10012 2019-01-01 00:46:20 2019-01-01 01:04:54 0 days 00:37:43
    4     kg125  10013 2019-01-01 00:19:06 2019-01-01 00:39:43             NaT
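
To make the index-alignment point concrete, here is a minimal sketch on made-up toy data (the frame `toy` and its column names are illustrative assumptions, not part of the original post). It shows that a grouped `shift()` returns a Series with the same index as the source frame, so the subtraction needs no extra merge step.

```python
import pandas as pd

# Hypothetical toy data: two groups interleaved to show index alignment.
toy = pd.DataFrame({
    "key": ["a", "b", "a", "b", "a"],
    "value": [1, 10, 2, 20, 3],
})

# shift() on a GroupBy moves values down one row within each group.
# The result keeps the original index, so it lines up row by row with toy.
prev_value = toy.groupby("key")["value"].shift()

# Arithmetic between two Series aligns on the index, so each row is
# compared with the previous row of its own group. The first row of
# every group has no predecessor and becomes NaN.
toy["diff_from_prev"] = toy["value"] - prev_value
print(toy)
```

Because the shift follows row order inside each group, the frame should be sorted by the time column beforehand if it is not already in chronological order.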
    

    Adding veh to the grouping:

    df['break_from_last'] = df['stoptime'] - df.groupby(['driver_id', 'veh'])['starttime'].shift()
    

    Output:

      driver_id    veh           starttime            stoptime break_from_last
    0     kg123  10010 2018-12-21 15:17:29 2018-12-21 15:18:57             NaT
    1     kg124  10012 2019-01-01 00:10:16 2019-01-01 00:16:32             NaT
    2     kg124  10012 2019-01-01 00:27:11 2019-01-01 00:31:38 0 days 00:21:22
    3     kg214  10012 2019-01-01 00:46:20 2019-01-01 01:04:54             NaT
    4     kg125  10013 2019-01-01 00:19:06 2019-01-01 00:39:43             NaT
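
For readers who want to reproduce this result, the following is a self-contained sketch that rebuilds the five rows from the question's input table (where row 3 has driver_id kg214) and applies the two-column grouping. It assumes the time columns are parsed as datetimes and that rows are already in chronological order within each group.

```python
import pandas as pd

# Sample rows copied from the question's input table.
df = pd.DataFrame({
    "driver_id": ["kg123", "kg124", "kg124", "kg214", "kg125"],
    "veh": [10010, 10012, 10012, 10012, 10013],
    "starttime": pd.to_datetime([
        "2018-12-21 15:17:29", "2019-01-01 00:10:16", "2019-01-01 00:27:11",
        "2019-01-01 00:46:20", "2019-01-01 00:19:06",
    ]),
    "stoptime": pd.to_datetime([
        "2018-12-21 15:18:57", "2019-01-01 00:16:32", "2019-01-01 00:31:38",
        "2019-01-01 01:04:54", "2019-01-01 00:39:43",
    ]),
})

# Previous starttime within each (driver_id, veh) group; NaT for the
# first row of every group.
prev_start = df.groupby(["driver_id", "veh"])["starttime"].shift()

# Same formula as in the answer: current stoptime minus shifted starttime.
df["break_from_last"] = df["stoptime"] - prev_start
print(df)
```

Only row 2 has an earlier row with the same driver_id and veh, so it is the only row that receives a non-missing value.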
    

    【Comments】:

    • Thanks, this seems to work well! I also added the other column to the grouping: df['break_from_last'] = df['stoptime'] - df.groupby(['driver_id', 'veh'])['starttime'].shift()
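
One point worth noting: the R snippet in the question computes starttime minus the lagged stoptime, expressed in hours and rounded to two decimals, which is a different quantity from the stoptime minus shifted starttime used in the pandas code above. A sketch of that variant follows, assuming the same sample rows as the question; the column name `break_hours` is an illustrative choice and does not appear in the original post.

```python
import pandas as pd

# Sample rows copied from the question's input table.
df = pd.DataFrame({
    "driver_id": ["kg123", "kg124", "kg124", "kg214", "kg125"],
    "veh": [10010, 10012, 10012, 10012, 10013],
    "starttime": pd.to_datetime([
        "2018-12-21 15:17:29", "2019-01-01 00:10:16", "2019-01-01 00:27:11",
        "2019-01-01 00:46:20", "2019-01-01 00:19:06",
    ]),
    "stoptime": pd.to_datetime([
        "2018-12-21 15:18:57", "2019-01-01 00:16:32", "2019-01-01 00:31:38",
        "2019-01-01 01:04:54", "2019-01-01 00:39:43",
    ]),
})

# Lagged stoptime within each (driver_id, veh) group, mirroring
# shift(stoptime, 1L, type = "lag") with by = c("driver_id", "veh").
prev_stop = df.groupby(["driver_id", "veh"])["stoptime"].shift(1)

# Gap between the previous task's end and this task's start, in hours.
# break_hours is a hypothetical column name used for illustration.
gap = df["starttime"] - prev_stop
df["break_hours"] = (gap.dt.total_seconds() / 3600).round(2)
print(df[["driver_id", "veh", "break_hours"]])
```

With this formula, row 2 measures the idle time between the end of the first kg124 task and the start of the second, rather than the span from one start to the next stop.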