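【Posted】: 2021-06-23 16:09:29
【Question】:
I'm just looking for a more intuitive and faster way to get the start and end times of each uninterrupted stretch in a time series. Here is a reproducible example, along with how I'm doing it for now: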
import pandas as pd
import numpy as np
data = ['1999-01-01 00:00:00', '1999-01-01 01:00:00', '1999-01-01 02:00:00',
'1999-01-10 10:00:00', '1999-01-10 11:00:00', '1999-01-10 12:00:00', '1999-01-10 13:00:00',
'1999-01-20 17:00:00', '1999-01-20 18:00:00', '1999-01-20 19:00:00']
df = pd.DataFrame(data, columns = ['time'])
df['time'] = pd.to_datetime(df['time'])
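# Conversion: a stretch starts where the gap to the previous row is not 1 hour
# and ends where the gap to the next row is not 1 hour.
new_df = pd.DataFrame(columns=['Start Date', 'End Date'])
new_df2 = pd.DataFrame(columns=['End Date'])
df['diff'] = df['time'].diff(1)  # gap to the previous timestamp (NaT for the first row)
df['diff2'] = df['diff'].shift(-1)  # gap to the next timestamp (NaT for the last row)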
new_df['Start Date'] = df['time'].loc[df['diff'] != pd.to_timedelta(1, unit ='h')].reset_index(drop = True)
new_df2['End Date'] = df['time'].loc[df['diff2'] != pd.to_timedelta(1, unit ='h')].reset_index(drop = True)
new_df['End Date'] = new_df2['End Date']
new_df['Duration [Hours]'] = (new_df['End Date'] - new_df['Start Date']) / np.timedelta64(1, 'h')
print(new_df)
Resulting DataFrame:
           Start Date            End Date  Duration [Hours]
0 1999-01-01 00:00:00 1999-01-01 02:00:00               2.0
1 1999-01-10 10:00:00 1999-01-10 13:00:00               3.0
2 1999-01-20 17:00:00 1999-01-20 19:00:00               2.0
Any help would be appreciated.
【Comments】:
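One possible shortcut, offered as a minimal sketch rather than a definitive answer: label each uninterrupted stretch with a running counter and aggregate per label. It assumes the timestamps are already sorted and that "uninterrupted" means exactly one hour between consecutive rows, as in the example. The names `run_id`, `runs`, `start`, `end` and `duration_hours` are illustrative and not from the original post.

```python
import pandas as pd

data = ['1999-01-01 00:00:00', '1999-01-01 01:00:00', '1999-01-01 02:00:00',
        '1999-01-10 10:00:00', '1999-01-10 11:00:00', '1999-01-10 12:00:00',
        '1999-01-10 13:00:00',
        '1999-01-20 17:00:00', '1999-01-20 18:00:00', '1999-01-20 19:00:00']
df = pd.DataFrame({'time': pd.to_datetime(data)})

# True wherever the gap to the previous row is not exactly one hour.
# The first row's gap is NaT, which also compares as "not equal",
# so it opens the first stretch.
is_new_run = df['time'].diff() != pd.Timedelta(hours=1)

# The cumulative sum turns the boolean flags into a label per stretch: 1, 1, 1, 2, ...
run_id = is_new_run.cumsum().rename('run')

# Assumes sorted input, so the first/last row of each group are its start/end.
runs = (
    df.groupby(run_id)['time']
      .agg(start='first', end='last')
      .reset_index(drop=True)
)
runs['duration_hours'] = (runs['end'] - runs['start']) / pd.Timedelta(hours=1)

print(runs)
```

This avoids the two helper frames and the shifted diff column. If the sampling interval is not one hour, the `pd.Timedelta(hours=1)` threshold would need to change accordingly.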
Tags: python pandas time-series