【Question title】: Get the overlap duration between date intervals based on condition
【Posted】: 2021-02-25 17:18:30
【Question】:

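I have two dataframes, each with a start/end datetime and a value. They do not have the same number of rows, and intervals that overlap are not necessarily in the same row/index.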

df1

start_datetime   end_datetime   value
08:50            09:50          5
09:52            10:10          6
10:50            11:30          2

df2

start_datetime   end_datetime   value
08:51            08:59          3
09:52            10:02          9
10:03            10:30          1
11:03            11:39          1
13:10            13:15          0

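I want to compute the total duration during which df1 and df2 overlap, counting only the overlaps where df1.value > df2.value. df1 can overlap a single df2 interval several times, and the condition may hold for only some of those overlaps.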
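For one pair of intervals, the overlap length is the earliest end minus the latest start, clipped at zero when the intervals are disjoint. The sketch below applies that formula to the sample data in plain Python; the helper names `overlap` and `t` are hypothetical, and the times are placed on an arbitrary common date (1900-01-01) since only clock times are given.

```python
from datetime import datetime, timedelta


def overlap(s1, e1, s2, e2):
    """Length of the intersection of [s1, e1] and [s2, e2]; zero if disjoint."""
    return max(timedelta(0), min(e1, e2) - max(s1, s2))


def t(hhmm):
    # Parse "HH:MM" onto the dummy date 1900-01-01
    return datetime.strptime(hhmm, "%H:%M")


df1_rows = [("08:50", "09:50", 5), ("09:52", "10:10", 6), ("10:50", "11:30", 2)]
df2_rows = [("08:51", "08:59", 3), ("09:52", "10:02", 9), ("10:03", "10:30", 1),
            ("11:03", "11:39", 1), ("13:10", "13:15", 0)]

total = timedelta(0)
for s1, e1, v1 in df1_rows:
    for s2, e2, v2 in df2_rows:
        if v1 > v2:  # only count pairs where the df1 value is larger
            total += overlap(t(s1), t(e1), t(s2), t(e2))

print(total)  # 0:42:00
```

On the sample data the qualifying overlaps are 08:51 to 08:59, 10:03 to 10:10 and 11:03 to 11:30, i.e. 8 + 7 + 27 = 42 minutes; the 09:52 to 10:02 overlap is skipped because 6 > 9 is false.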

I tried something like this:

    import pandas as pd
    from datetime import timedelta

    time = timedelta()
    for i, row1 in df1.iterrows():
        t1 = pd.Interval(row1.start_datetime, row1.end_datetime)
        for j, row2 in df2.iterrows():
            t2 = pd.Interval(row2.start_datetime, row2.end_datetime)
            if t1.overlaps(t2) and row1.value > row2.value:
                # Overlap = earliest end minus latest start of the two intervals
                latest_start = max(row1.start_datetime, row2.start_datetime)
                earliest_end = min(row1.end_datetime, row2.end_datetime)
                time += earliest_end - latest_start

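I can loop over every df1 row and test it against all of df2, but that is not efficient: it checks all len(df1) × len(df2) pairs in Python.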
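If the intervals inside each dataframe do not overlap one another (true for the sample data, but an assumption in general), sorting both by start time lets a two-pointer sweep replace the nested loop: each step advances whichever interval ends first, so after sorting the work is O(n + m) rather than O(n × m). A sketch under that assumption; `conditional_overlap` and `hhmm` are hypothetical names, and the column names follow the tables above.

```python
import pandas as pd


def hhmm(values):
    # Parse "HH:MM" strings; pandas puts them on the dummy date 1900-01-01
    return pd.to_datetime(values, format="%H:%M")


def conditional_overlap(df1, df2):
    """Total overlap between df1 and df2 intervals where df1.value > df2.value.

    Assumes intervals within each frame do not overlap one another.
    """
    a = list(df1.sort_values("start_datetime").itertuples(index=False))
    b = list(df2.sort_values("start_datetime").itertuples(index=False))
    total = pd.Timedelta(0)
    i = j = 0
    while i < len(a) and j < len(b):
        r1, r2 = a[i], b[j]
        latest_start = max(r1.start_datetime, r2.start_datetime)
        earliest_end = min(r1.end_datetime, r2.end_datetime)
        if earliest_end > latest_start and r1.value > r2.value:
            total += earliest_end - latest_start
        # The interval that ends first cannot overlap any later interval of the other frame
        if r1.end_datetime < r2.end_datetime:
            i += 1
        else:
            j += 1
    return total


df1 = pd.DataFrame({"start_datetime": hhmm(["08:50", "09:52", "10:50"]),
                    "end_datetime": hhmm(["09:50", "10:10", "11:30"]),
                    "value": [5, 6, 2]})
df2 = pd.DataFrame({"start_datetime": hhmm(["08:51", "09:52", "10:03", "11:03", "13:10"]),
                    "end_datetime": hhmm(["08:59", "10:02", "10:30", "11:39", "13:15"]),
                    "value": [3, 9, 1, 1, 0]})

print(conditional_overlap(df1, df2))  # 0 days 00:42:00
```

If intervals within a frame can overlap each other, this sweep can miss pairs, and a pairwise comparison (or an interval-tree approach) is needed instead.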

Expected output (format only; the value is a placeholder):

Timedelta('0 days 00:99:99')

    Tags: python pandas overlap


    【Answer 1】:

    Here is my solution:

    1. Create the dataframes:

    import pandas as pd

    df1 = pd.DataFrame(
        {'start_datetime1': ['08:50', '09:52', '10:50'],
         'end_datetime1': ['09:50', '10:10', '11:30'],
         'value1': [5, 6, 2]})

    df2 = pd.DataFrame(
        {'start_datetime2': ['08:51', '09:52', '10:03', '11:03', '13:10'],
         'end_datetime2': ['08:59', '10:02', '10:30', '11:39', '13:15'],
         'value2': [3, 9, 1, 1, 0]})

    # Parse the "HH:MM" strings; with no date given, pandas uses 1900-01-01
    for col in ['start_datetime1', 'end_datetime1']:
        df1[col] = pd.to_datetime(df1[col], format='%H:%M')
    for col in ['start_datetime2', 'end_datetime2']:
        df2[col] = pd.to_datetime(df2[col], format='%H:%M')
    
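    2. Combine the dataframes to make the comparison easier. The combined dataframe contains every possible pairing of a df1 row with a df2 row: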
    df1['temp'] = 1 #temporary keys to create all combinations
    df2['temp'] = 1
    df_combined = pd.merge(df1,df2,on='temp').drop('temp',axis=1)
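Since pandas 1.2, `pd.merge` also accepts `how="cross"`, which builds the same Cartesian product without adding and then dropping a temporary key column (so df1 and df2 are not modified). A minimal sketch using the same column names as this answer:

```python
import pandas as pd

df1 = pd.DataFrame({"start_datetime1": pd.to_datetime(["08:50", "09:52", "10:50"], format="%H:%M"),
                    "end_datetime1": pd.to_datetime(["09:50", "10:10", "11:30"], format="%H:%M"),
                    "value1": [5, 6, 2]})
df2 = pd.DataFrame({"start_datetime2": pd.to_datetime(["08:51", "09:52", "10:03", "11:03", "13:10"], format="%H:%M"),
                    "end_datetime2": pd.to_datetime(["08:59", "10:02", "10:30", "11:39", "13:15"], format="%H:%M"),
                    "value2": [3, 9, 1, 1, 0]})

# Pair every df1 row with every df2 row (requires pandas >= 1.2)
df_combined = pd.merge(df1, df2, how="cross")
print(df_combined.shape)  # (15, 6)
```

The result has len(df1) × len(df2) rows, so memory grows with the product of the two sizes.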
    
    3. Compute each pair's overlap with a lambda function. The overlap of two intervals is the earliest end minus the latest start; pairs that do not overlap, or where value1 <= value2, contribute zero:
    df_combined['Result'] = df_combined.apply(
        lambda row: min(row["end_datetime1"], row["end_datetime2"]) -
                    max(row["start_datetime1"], row["start_datetime2"])
        if pd.Interval(row['start_datetime1'], row['end_datetime1']).overlaps(
            pd.Interval(row['start_datetime2'], row['end_datetime2']))
        and row["value1"] > row["value2"]
        else pd.Timedelta(0),
        axis=1)
    df_combined
    

    Result:

    total_timedelta = df_combined['Result'].sum()
    0 days 00:42:00
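The row-wise `apply` can also be replaced by column operations on the cross-joined frame. A sketch, assuming the same column names as this answer and pandas >= 1.2 for `how="cross"`; `hhmm` is a hypothetical helper. It still materializes all n × m pairs, but avoids a Python call per row.

```python
import pandas as pd


def hhmm(values):
    # Parse "HH:MM" strings; pandas puts them on the dummy date 1900-01-01
    return pd.to_datetime(values, format="%H:%M")


df1 = pd.DataFrame({"start_datetime1": hhmm(["08:50", "09:52", "10:50"]),
                    "end_datetime1": hhmm(["09:50", "10:10", "11:30"]),
                    "value1": [5, 6, 2]})
df2 = pd.DataFrame({"start_datetime2": hhmm(["08:51", "09:52", "10:03", "11:03", "13:10"]),
                    "end_datetime2": hhmm(["08:59", "10:02", "10:30", "11:39", "13:15"]),
                    "value2": [3, 9, 1, 1, 0]})
df_combined = pd.merge(df1, df2, how="cross")

s1, s2 = df_combined["start_datetime1"], df_combined["start_datetime2"]
e1, e2 = df_combined["end_datetime1"], df_combined["end_datetime2"]
latest_start = s1.where(s1 >= s2, s2)   # element-wise max of the two starts
earliest_end = e1.where(e1 <= e2, e2)   # element-wise min of the two ends
overlap = earliest_end - latest_start   # negative when the intervals are disjoint

# Count only positive overlaps where the df1 value is larger
mask = (overlap > pd.Timedelta(0)) & (df_combined["value1"] > df_combined["value2"])
total_timedelta = overlap[mask].sum()
print(total_timedelta)  # 0 days 00:42:00
```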
    
