【Question title】: How to find all time periods in which there is overlap in a Python DataFrame?
【Posted】: 2023-01-30 21:57:48
【Question description】:

My df looks like this:

import pandas as pd

df = pd.DataFrame({'Name':['Anne','Anne','Anne','Anne','Anne','Anne','Anne','Anne','Anne','Anne','Anne','Anne',
                           'Bob','Bob','Bob','Bob','Bob','Bob','Bob','Bob','Bob','Bob','Bob','Bob'],
               
               "start":["2019-01-01", "2019-02-01", "2019-03-01", "2019-04-01", "2019-05-01", "2019-06-01", "2019-07-01", "2019-08-01", "2019-09-01", "2019-10-01", "2019-11-01", "2019-12-01",  
                        "2019-01-01", "2019-02-01", "2019-03-01", "2019-04-01", "2019-05-01", "2019-06-01", "2019-07-01", "2019-08-01", "2019-09-01", "2019-10-01", "2019-11-01", "2019-12-01"],
               
                 "end":["2019-01-31", "2019-02-28", "2019-03-31", "2019-04-30", "2019-05-31", "2019-06-30", "2019-07-31", "2019-08-31", "2019-09-30", "2019-10-31", "2019-11-30", "2019-12-31",
                        "2019-01-31", "2019-02-28", "2019-03-31", "2019-04-30", "2019-05-31", "2019-06-30", "2019-07-31", "2019-08-31", "2019-09-30", "2019-10-31", "2019-11-30", "2019-12-31"],
                 
                "percentage":[1/12, 1/12, 1/12, 1/12, 1/12, 1/12, 1/12, 1/12, 1/12, 1/12, 1/12, 1/12,
                              1/12, 1/12, 1/12, 1/12, 1/12, 1/12, 1/12, 1/12, 1/12, 1/12, 1/12, 1/12]})

# insert "wrong" row that overlaps Anne's January and February periods
df.loc[len(df.index)] = ['Anne', "2019-01-15", "2019-02-15", 1/12]

# convert the date strings to datetime64 columns
df['start'] = pd.to_datetime(df['start'], format="%Y-%m-%d")
df['end'] = pd.to_datetime(df['end'], format="%Y-%m-%d")

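I now want to find all rows belonging to the same user whose periods overlap. In the code example above there is only one overlap, and it involves Anne:

  • 2019-01-01 to 2019-01-31
  • 2019-02-01 to 2019-02-28
  • 2019-01-15 to 2019-02-15

How can I return the indices of the overlapping rows for each user?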

    Tags: python pandas dataframe datetime


    【Solution 1】:

    Use:

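    # Expand each row into one row per calendar day of its period.
    df1 = df.loc[df.index.repeat(df.end.sub(df.start).dt.days + 1)].copy()
    df1['start'] += pd.to_timedelta(df1.groupby(level=0).cumcount(), 'd')

    # A (Name, day) pair that occurs more than once marks an overlap;
    # keep every original row that contains at least one such day.
    df1 = df[df1.duplicated(['Name','start'], keep=False).groupby(level=0).any()]
    print(df1)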
        Name      start        end  percentage
    0   Anne 2019-01-01 2019-01-31    0.083333
    1   Anne 2019-02-01 2019-02-28    0.083333
    24  Anne 2019-01-15 2019-02-15    0.083333
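
    The question asks for the row indices per user rather than the rows themselves. As a possible alternative, here is a minimal sketch that compares intervals pairwise instead of expanding them day by day. It assumes that periods are closed intervals (both `start` and `end` are inclusive, as in the question) and uses a smaller stand-in frame with the same columns; the helper name `overlapping_indices` and the temporary column `row` are made up for illustration and are not part of the answer above.

```python
import pandas as pd

# Small stand-in for the question's frame: same columns, fewer rows.
# Row 5 plays the role of the "wrong" row that overlaps rows 0 and 1.
df = pd.DataFrame({
    "Name":  ["Anne", "Anne", "Anne", "Bob", "Bob", "Anne"],
    "start": ["2019-01-01", "2019-02-01", "2019-03-01",
              "2019-01-01", "2019-02-01", "2019-01-15"],
    "end":   ["2019-01-31", "2019-02-28", "2019-03-31",
              "2019-01-31", "2019-02-28", "2019-02-15"],
})
df["start"] = pd.to_datetime(df["start"], format="%Y-%m-%d")
df["end"] = pd.to_datetime(df["end"], format="%Y-%m-%d")


def overlapping_indices(frame):
    """Map each Name to the sorted row labels that overlap another row of that Name."""
    # Move the row labels into a column (hypothetical name "row") so they survive the merge.
    rows = frame.rename_axis("row").reset_index()
    # Pair every row with every row of the same Name, then drop self-pairs.
    pairs = rows.merge(rows, on="Name", suffixes=("_a", "_b"))
    pairs = pairs[pairs["row_a"] != pairs["row_b"]]
    # Two closed intervals overlap when each one starts no later than the other ends.
    hits = pairs[(pairs["start_a"] <= pairs["end_b"])
                 & (pairs["start_b"] <= pairs["end_a"])]
    return {name: sorted(set(group["row_a"])) for name, group in hits.groupby("Name")}


result = overlapping_indices(df)
print(result)
```

    Users without any overlap (Bob here) simply do not appear in the returned dictionary. The self-merge grows quadratically with the number of rows per user, so this sketch is probably only reasonable when each user has a modest number of periods; the day-expansion approach above instead grows with the total number of days covered.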
    
