Creating rows as per duration using datetime pandas: answers

【Question title】: Creating rows as per duration using datetime pandas
【Posted】: 2019-04-21 12:13:17
【Question description】:

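I'm having trouble writing code that works with datetimes. I've put together a scenario that shows what I'm working on. Could someone help me write the code?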

Input:

Name, Channel, Duration, Start_time
John, A, 2, 16:00:00
Joseph, B, 3, 15:05:00

Output:

Name, Channel, Duration, Start_time
John, A, 2, 16:00:00
John, A, 2, 16:01:00
Joseph, B, 3, 15:05:00
Joseph, B, 3, 15:06:00
Joseph, B, 3, 15:07:00

Thanks in advance.

【Question discussion】:

Tags: python pandas datetime timedelta

【Solution 1】:

Use:

df['Start_time'] = pd.to_timedelta(df['Start_time'])
df = df.loc[df.index.repeat(df['Duration'])]
td = pd.to_timedelta(df.groupby(level=0).cumcount() * 60, unit='s')

df['Start_time'] = df['Start_time'] + td
df = df.reset_index(drop=True)

print (df)
     Name Channel  Duration Start_time
0    John       A         2   16:00:00
1    John       A         2   16:01:00
2  Joseph       B         3   15:05:00
3  Joseph       B         3   15:06:00
4  Joseph       B         3   15:07:00

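Explanation:

First convert the Start_time column with to_timedelta.
Then repeat the index values according to the Duration column and select them with loc to repeat the rows.
Create a counter per original index value with cumcount, convert it to 1-minute timedeltas, and add it to the repeated Start_time column.
Finally, call reset_index with drop=True to avoid duplicated index values.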
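
The four steps above can be sketched as one self-contained function. This is only an illustration under stated assumptions: the function name expand_by_duration is hypothetical, and the sample frame is rebuilt by hand from the data in the question.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Name": ["John", "Joseph"],
    "Channel": ["A", "B"],
    "Duration": [2, 3],
    "Start_time": ["16:00:00", "15:05:00"],
})


def expand_by_duration(frame):
    """Hypothetical helper: one output row per minute of Duration."""
    out = frame.copy()
    # Step 1: parse the HH:MM:SS strings as timedeltas.
    out["Start_time"] = pd.to_timedelta(out["Start_time"])
    # Step 2: repeat each index label Duration times and select with loc.
    out = out.loc[out.index.repeat(out["Duration"])]
    # Step 3: counter 0, 1, 2, ... per original row, turned into minute offsets.
    offset = pd.to_timedelta(out.groupby(level=0).cumcount() * 60, unit="s")
    out["Start_time"] = out["Start_time"] + offset
    # Step 4: replace the duplicated index labels with 0..n-1.
    return out.reset_index(drop=True)


result = expand_by_duration(df)
# Minutes since midnight, convenient for checking the offsets.
minutes = (result["Start_time"].dt.total_seconds() // 60).astype(int).tolist()
print(result)
```

Working on a copy keeps the caller's frame untouched, which the original snippet does not do because it reassigns df in place.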

Edit:

If you want datetimes in the output, the solution is the same; just convert the values with to_datetime first:

df['Start_time'] = pd.to_datetime(df['Start_time'])
df = df.loc[df.index.repeat(df['Duration'])]
td = pd.to_timedelta(df.groupby(level=0).cumcount() * 60, unit='s')

df['Start_time'] = df['Start_time'] + td
df = df.reset_index(drop=True)
print (df)
     Name Channel  Duration          Start_time
0    John       A         2 2018-11-19 16:00:00
1    John       A         2 2018-11-19 16:01:00
2  Joseph       B         3 2018-11-19 15:05:00
3  Joseph       B         3 2018-11-19 15:06:00
4  Joseph       B         3 2018-11-19 15:07:00

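【Discussion】:

@SrikanthAyithy - You are welcome! If my answer was helpful, don't forget to accept it. Thanks.
Thank you very much for the earlier answer. I need a little more help. Suppose I have to summarize the input data I mentioned into half-hour bands, so that my output looks like: Name, Channel, Duration, Time band, Duration, Count John, A, 2, 16:00:00-16:30:00 1 Joseph, B, 3, 15:00:00-15:30:00 1 ........ When we summarize a row with a viewing duration of 10, we may hit a problem where the row is split between bands, e.g. 5 minutes in 15:00:00-15:30:00 and the other 5 minutes in 15:30:00-16:00:00.
@SrikanthAyithy - Only one answer can be accepted, could you check?
@SrikanthAyithy - Thanks. For the question in your comments, could you create a new question?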

【Solution 2】:

Use -

df['dates'] = df.apply(lambda x: list(pd.date_range(start=x['Start_time'], periods=x['Duration'], freq='1min')), axis=1)
df.set_index(['Name','Channel','Duration', 'Start_time'])['dates'].apply(pd.Series).stack().reset_index().drop(['level_4','Start_time'], axis=1).rename(columns={0:'Start_time'})

Output

    Name    Channel Duration    Start_time
0   John    A   3   2018-11-19 16:00:00
1   John    A   3   2018-11-19 16:01:00
2   John    A   3   2018-11-19 16:02:00
3   Joseph  B   4   2018-11-19 15:05:00
4   Joseph  B   4   2018-11-19 15:06:00
5   Joseph  B   4   2018-11-19 15:07:00
6   Joseph  B   4   2018-11-19 15:08:00

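Explanation

Apply pd.date_range() to Start_time and Duration to build a list of per-minute timestamps for each row.
Use the second line to explode that list into separate rows of the df.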
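
As a variation on the two steps above, the stack-based second line can be swapped for DataFrame.explode. This is a different technique from the one in the answer, sketched under these assumptions: pandas 0.25 or later (where explode is available), a hypothetical function name expand_with_date_range, and a fixed base date of 2018-11-19 chosen only to match the output shown above.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Name": ["John", "Joseph"],
    "Channel": ["A", "B"],
    "Duration": [2, 3],
    "Start_time": ["16:00:00", "15:05:00"],
})


def expand_with_date_range(frame, base_date="2018-11-19"):
    """Hypothetical helper: date_range per row, then explode into rows."""
    out = frame.copy()
    # Assumption: attach a fixed date so the result does not depend on today.
    start = pd.to_datetime(base_date + " " + out["Start_time"])
    # One list of per-minute timestamps for each row.
    out["Start_time"] = [
        list(pd.date_range(start=s, periods=int(n), freq="min"))
        for s, n in zip(start, out["Duration"])
    ]
    # explode turns each list element into its own row.
    out = out.explode("Start_time").reset_index(drop=True)
    # explode leaves an object column; restore the datetime dtype.
    out["Start_time"] = pd.to_datetime(out["Start_time"])
    return out


result = expand_with_date_range(df)
print(result)
```

Compared with apply(pd.Series).stack(), explode keeps the other columns in place, so no set_index, drop, or rename is needed afterwards.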

【Discussion】:

This works too. However, the result comes out in a single row.