Question title: Create IntervalIndex from periodic events in a pandas dataframe
Posted: 2022-12-17 14:15:16
Question:

I have a dataframe that looks like this:

duration,window_start,window_end,REPETITIONS
0 days 01:00:00,2023-12-31,2024-01-07,5
0 days 00:30:00,2021-10-28,2021-11-02,10
0 days 00:20:00,2022-12-24,2023-01-04,15
0 days 01:00:00,2023-06-15,2023-06-17,20

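I want to expand these periodic events into a dataframe with one start time and one end time per event, based on the number of repetitions and on window_start and window_end. In the example above there should be 5+10+15+20=50 discrete events. I'm struggling to vectorize this transformation and can't see a way around looping over each row.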
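The bookkeeping behind a loop-free version can be sketched with NumPy alone: repeat each row's position REPETITIONS times, then number the events within each row. This minimal sketch only uses the REPETITIONS column from the sample data and confirms the 5+10+15+20=50 count.

```python
import numpy as np

# REPETITIONS column from the sample data
reps = np.array([5, 10, 15, 20])

# Row position of every event: 0 repeated 5 times, 1 repeated 10 times, ...
row_pos = np.repeat(np.arange(len(reps)), reps)

# Offset where each row's block of events begins: [0, 5, 15, 30]
first_event = np.cumsum(reps) - reps

# Index of each event within its own row: 0..4, 0..9, 0..14, 0..19
k = np.arange(reps.sum()) - np.repeat(first_event, reps)

print(len(row_pos))  # 50
```

With row_pos you can gather each event's window_start, window_end and duration, and k tells you how many steps to move forward from window_start.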

What I have so far:

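import pandas as pd
import numpy as np

# The sample CSV has no "id" column, so the default RangeIndex stands in for it
periodic = pd.read_csv("events.csv", header=0, parse_dates=["window_start", "window_end"])
periodic["duration"] = pd.to_timedelta(periodic["duration"])

# One array of REPETITIONS evenly spaced start times (ns since epoch) per row
start = periodic.apply(lambda row: np.linspace(row["window_start"].value, row["window_end"].value, row["REPETITIONS"]), axis=1)
start = start.apply(lambda row: pd.to_datetime(row))
end = start + periodic["duration"]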

This gives two separate Series, start and end, each holding one DatetimeIndex per id, i.e.:

start.head()

1,"DatetimeIndex([          '2021-12-31 00:00:00',
               '2022-01-01 00:01:00',
               '2021-01-01 00:02:00',
               '2021-01-01 00:03:00',

end.head()

1,"DatetimeIndex([          '2021-12-31 01:00:00',
               '2022-01-01 00:02:00',
               '2021-01-01 00:03:00',
               '2021-01-01 00:04:00',

The goal is a result that looks like this:

id, start, end
1,'2021-12-31 00:00:00','2021-12-31 00:01:00'
1,'2021-12-31 00:00:00','2021-12-31 00:01:00'
1,'2021-12-31 00:00:00','2021-12-31 00:01:00'
.
.
.
2,'2021-10-28 00:00:00','2021-10-28 00:30:00'
2,'2021-10-28 13:20:00','2021-10-28 13:50:00'
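A loop-free sketch of that transformation, under two assumptions: ids are 1-based row numbers (the sample CSV has no id column), and, like the np.linspace call above, the first event starts at window_start and the last at window_end. It repeats each row REPETITIONS times with Index.repeat, numbers the copies with groupby(...).cumcount(), and offsets window_start by that many equal steps. expand_events is a hypothetical helper name.

```python
import io

import pandas as pd

# The sample data from the question, read from a string instead of events.csv
csv = """duration,window_start,window_end,REPETITIONS
0 days 01:00:00,2023-12-31,2024-01-07,5
0 days 00:30:00,2021-10-28,2021-11-02,10
0 days 00:20:00,2022-12-24,2023-01-04,15
0 days 01:00:00,2023-06-15,2023-06-17,20
"""
periodic = pd.read_csv(io.StringIO(csv), parse_dates=["window_start", "window_end"])
periodic["duration"] = pd.to_timedelta(periodic["duration"])
# Assumption: ids are 1-based row numbers, as in the target output
periodic.index = pd.RangeIndex(1, len(periodic) + 1, name="id")


def expand_events(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per event with columns id, start, end (no Python-level loop)."""
    # Repeat each input row REPETITIONS times; the repeated index keeps the name "id"
    out = df.loc[df.index.repeat(df["REPETITIONS"].to_numpy())]
    # Position of each event inside its own window: 0, 1, ..., REPETITIONS - 1
    k = out.groupby(level=0).cumcount().to_numpy()
    # Like np.linspace: REPETITIONS - 1 gaps between window_start and window_end
    # (clip avoids dividing by zero when REPETITIONS == 1)
    step = (out["window_end"] - out["window_start"]) / (out["REPETITIONS"] - 1).clip(lower=1)
    start = out["window_start"] + step * k
    end = start + out["duration"]
    return pd.DataFrame(
        {"start": start.to_numpy(), "end": end.to_numpy()}, index=out.index
    ).reset_index()


events = expand_events(periodic)
print(len(events))  # 50
print(events[events["id"] == 2].head(2))
```

When a window does not divide evenly into REPETITIONS - 1 steps, the timedelta division is rounded to the column's resolution, so the last start may differ from window_end by a few nanoseconds.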

Tags: python pandas dataframe numpy vectorization


    Answer 1:

    Have you tried something like this?

    import pandas as pd

    df['duration'] = pd.to_timedelta(df['duration'])
    frames = []  # one DataFrame of events per input row

    # loop through the rows
    for i, row in df.iterrows():
        # REPETITIONS evenly spaced start times from window_start to window_end (both inclusive)
        dates = pd.date_range(row['window_start'], row['window_end'], periods=row['REPETITIONS'])
        event_df = pd.DataFrame({'id': i + 1, 'start': dates, 'end': dates + row['duration']})
        frames.append(event_df)

    # DataFrame.append was removed in pandas 2.0, so concatenate once at the end
    ef = pd.concat(frames, ignore_index=True)

    # optional: number of events starting on each day
    result = ef.set_index('start').resample('D')['id'].count()
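    The title asks for an IntervalIndex. Once the events are in a long start/end table, such as the ef built by the loop above or the target output, pandas can build one with IntervalIndex.from_arrays. A small sketch using two rows from the target output; closed="left" is a choice made here, treating each event as [start, end).

```python
import pandas as pd

# Two events in the long id/start/end layout (values taken from the target output)
events = pd.DataFrame({
    "id": [2, 2],
    "start": pd.to_datetime(["2021-10-28 00:00:00", "2021-10-28 13:20:00"]),
    "end": pd.to_datetime(["2021-10-28 00:30:00", "2021-10-28 13:50:00"]),
})

# closed="left": each event covers [start, end)
intervals = pd.IntervalIndex.from_arrays(events["start"], events["end"], closed="left")
events = events.set_index(intervals)

# Which events are running at a given moment?
running = events[events.index.contains(pd.Timestamp("2021-10-28 13:30:00"))]
print(running)

# Do any events of this id overlap each other?
print(intervals.is_overlapping)
```

    Indexing by interval makes point-in-time lookups like the one above a single vectorized call instead of comparing start and end columns by hand.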
    
