【Question title】: Union of multiple ranges
【Posted】: 2013-03-07 14:25:23
【Question】:

I have these ranges:

7,10
11,13
11,15
14,20
23,39

I need to take the union of the overlapping ranges to get non-overlapping ranges, so for the example above:

7,20
23,39

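I have already done this in Ruby: I pushed the start and end of each range into an array, sorted them, and then merged the overlapping ranges. Is there a quick way to do this in Python?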

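【Question comments】:

  • I posted an uninteresting answer. Its one merit is that my solution produces the correct result even if the starting data contains, for example, 6,11, whereas eumiro's solution gives a wrong result in that case. That case is probably unlikely to occur in your data, though.
  • Are you assuming that only integers are valid input? 10.5 is not contained in any input range, but it is contained in the output range. Even with integers, are you assuming closed ranges rather than Python's standard half-open ranges? The union of x[7:10] and x[11:13] is x[7], x[8], x[9], x[11], x[12]. It does not include x[10].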
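
To make the closed versus half-open distinction in the last comment concrete, here is a small check using Python's built-in range, which is half-open. Reading the asker's pairs as closed integer ranges is an assumption about the intent, not something the question states.

```python
# Python's range(a, b) is half-open: it includes a but excludes b.
half_open = set(range(7, 10)) | set(range(11, 13))
print(sorted(half_open))   # [7, 8, 9, 11, 12]
print(10 in half_open)     # False

# Treating (7, 10) and (11, 13) as closed integer ranges instead
# (an assumption) covers every integer from 7 through 13 with no gap.
closed = set(range(7, 10 + 1)) | set(range(11, 13 + 1))
print(sorted(closed) == list(range(7, 14)))  # True
```

Under the closed-integer reading, merging (7, 10) and (11, 13) into (7, 13) loses nothing; under the half-open reading it would wrongly add 10.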

Tags: python range union


【Solution 1】:

Assuming that (7, 10) and (11, 13) should merge into (7, 13):

a = [(7, 10), (11, 13), (11, 15), (14, 20), (23, 39)]
b = []
for begin,end in sorted(a):
    if b and b[-1][1] >= begin - 1:
        b[-1] = (b[-1][0], end)
    else:
        b.append((begin, end))

b is now

[(7, 20), (23, 39)]

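Edit

@CentAu correctly noted that [(2,4), (1,6)] would return (1,4) instead of (1,6). Here is a new version that handles this case correctly (it stores each range as a mutable list so the end can be updated in place):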

a = [(7, 10), (11, 13), (11, 15), (14, 20), (23, 39)]
b = []
for begin,end in sorted(a):
    if b and b[-1][1] >= begin - 1:
        b[-1][1] = max(b[-1][1], end)
    else:
        b.append([begin, end])
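
As one of the comments suggests, the same loop can also be written as a generator. The following is only a sketch under stated assumptions: `merge_ranges` and its `gap` parameter are illustrative names that do not appear in the answer, and the ranges are taken to be closed integer ranges.

```python
def merge_ranges(ranges, gap=1):
    """Yield merged (begin, end) tuples from closed ranges.

    `merge_ranges` and `gap` are illustrative names, not from the answer.
    Two ranges are merged when the next begin is at most `gap` past the
    current end, so gap=1 joins (7, 10) with (11, 13), while gap=0 merges
    only ranges that actually overlap or touch.
    """
    current = None
    for begin, end in sorted(ranges):
        if current is not None and current[1] >= begin - gap:
            # Overlapping or adjacent: extend, keeping the larger end so a
            # nested range such as (2, 4) inside (1, 6) cannot shrink it.
            current[1] = max(current[1], end)
        else:
            if current is not None:
                yield tuple(current)
            current = [begin, end]
    if current is not None:
        yield tuple(current)


a = [(7, 10), (11, 13), (11, 15), (14, 20), (23, 39)]
print(list(merge_ranges(a)))                  # [(7, 20), (23, 39)]
print(list(merge_ranges([(2, 4), (1, 6)])))   # [(1, 6)]
```

Yielding tuples keeps the output in the same shape as the input, and the generator avoids building the whole result when the caller only iterates once.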

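【Comments】:

  • (+1) I think this might look nice written as a generator (maybe).
  • For future reference, the first version does not work for ranges that contain smaller ranges. For example, the result for [(2,4),(1,6)] will be [(1, 4)] instead of [(1, 6)].
  • This line is wrong when the ranges are stored as tuples: b[-1][1] = max(b[-1][1], end). It raises TypeError: 'tuple' object does not support item assignment. You need to use mutable lists: b.append([begin, end]).
  • Try this: a = ([[7, 10], [7, 13], [11, 15], [6, 19], [6, 7], [14, 20], [23, 39], [40, 4], [200, 1] ]). It spits out [[6, 20], [23, 39], [200, 1]]. Shouldn't [200, 1] never be returned?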
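
The input in the last comment contains pairs such as [40, 4] and [200, 1] whose begin exceeds their end. The answer does not say how such pairs should be treated. One option, shown here purely as an assumption, is to discard them before merging; `drop_reversed` is a hypothetical helper name.

```python
def drop_reversed(ranges):
    """Keep only ranges with begin <= end.

    Assumption: a pair whose begin exceeds its end is invalid input.
    """
    return [(begin, end) for begin, end in ranges if begin <= end]


a = [[7, 10], [7, 13], [11, 15], [6, 19], [6, 7], [14, 20],
     [23, 39], [40, 4], [200, 1]]

b = []
for begin, end in sorted(drop_reversed(a)):
    if b and b[-1][1] >= begin - 1:
        # Overlapping or adjacent: keep the larger end.
        b[-1][1] = max(b[-1][1], end)
    else:
        b.append([begin, end])

print(b)  # [[6, 20], [23, 39]]
```

Swapping the two endpoints instead of dropping the pair would be an equally defensible choice; which one is right depends on what the data means.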
【Solution 2】:

This is an old question, but I'd like to add this answer for future reference. sympy can be used to compute the union of intervals:

from sympy import Interval, Union
def union(data):
    """ Union of a list of intervals e.g. [(1,2),(3,4)] """
    intervals = [Interval(begin, end) for (begin, end) in data]
    u = Union(*intervals)
    return [list(u.args[:2])] if isinstance(u, Interval) \
       else list(u.args)

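If the result of Union consists of two or more disjoint intervals, it is a Union object, whereas a result with a single interval is an Interval object. That is the reason for the if statement in the return line.

Examples: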

In [26]: union([(10, 12), (14, 16), (15, 22)])
Out[26]: [[10, 12], [14, 22]]

In [27]: union([(10, 12), (9, 16)])
Out[27]: [[9, 16]]

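【Comments】:

  • There are a couple of issues with this answer. If the Union is a list of Intervals, the function returns a Python list of sympy.Interval objects, which does not match what you return when the Union is a single Interval. Also, given that "sympy expression won't convert to " is one of the most common [sympy]-tagged questions, you should warn people that they will often need to convert explicitly back to native Python types using int(...), float(...) and complex(...).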
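【Solution 3】:

I tried the specific cases in which (45, 46) and (45, 45) are present, as well as test cases that are unlikely to occur in your application: presence of (11, 6), presence of (-1, -5), presence of (-9, -5), presence of (-3, 10). In any case, the results are correct for all of these cases, which is the important point.

Algorithm: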

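def yi(li):
    # li must already be sorted; each (a, b) is a closed integer range.
    gen = (x for a, b in li for x in range(a, b + 1))
    try:
        start = p = next(gen)
    except StopIteration:
        return  # no integers at all: yield nothing
    for x in gen:
        if x > p + 2:
            # x lies more than 2 past the previous value: close the run.
            yield (start, p)
            start = p = x
        else:
            p = x
    yield (start, p)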

If aff is set to True in the following code, the steps being executed are printed.

def yi(li):
    aff = 0  # set to True to print each step
    gen = (x for a, b in li for x in range(a, b + 1))
    try:
        start = p = next(gen)
    except StopIteration:
        return  # no integers at all: yield nothing
    for x in gen:
        if aff:
            print('start %s     p %d  p+2 %d     '
                  'x==%s' % (start, p, p + 2, x))
        if x > p + 2:
            if aff:
                print('yield range(%d,%d)' % (start, p + 1))
            yield (start, p)
            start = p = x
        else:
            p = x
    if aff:
        print('yield range(%d,%d)' % (start, p + 1))
    yield (start, p)



for li in ([(7,10),(23,39),(11,13),(11,15),(14,20),(45,46)],
           [(7,10),(23,39),(11,13),(11,15),(14,20),(45,46),(45,45)],
           [(7,10),(23,39),(11,13),(11,15),(14,20),(45,45)],

           [(7,10),(23,39),(11,13),(11,6),(14,20),(45,46)], 
           #1 presence of (11, 6)
           [(7,10),(23,39),(11,13),(-1,-5),(14,20),(45,45)], 
           #2  presence of (-1,-5)
           [(7,10),(23,39),(11,13),(-9,-5),(14,20),(45,45)], 
           #3  presence of (-9, -5)
           [(7,10),(23,39),(11,13),(-3,10),(14,20),(45,45)]): 
           #4  presence of (-3, 10)

    li.sort()
    print('sorted li    %s' % li)
    # list(range(a, b)) shows the half-open expansion of each pair
    print('\n'.join('  (%d,%d)   %r' % (a, b, list(range(a, b)))
                    for a, b in li))
    print('list(yi(li)) %s\n' % list(yi(li)))

Results

sorted li    [(7, 10), (11, 13), (11, 15), (14, 20),
              (23, 39), (45, 46)]
  (7,10)   [7, 8, 9]
  (11,13)   [11, 12]
  (11,15)   [11, 12, 13, 14]
  (14,20)   [14, 15, 16, 17, 18, 19]
  (23,39)   [23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 
             35, 36, 37, 38]
  (45,46)   [45]
list(yi(li)) [(7, 20), (23, 39), (45, 46)]

sorted li    [(7, 10), (11, 13), (11, 15), (14, 20), 
              (23, 39), (45, 45), (45, 46)]
  (7,10)   [7, 8, 9]
  (11,13)   [11, 12]
  (11,15)   [11, 12, 13, 14]
  (14,20)   [14, 15, 16, 17, 18, 19]
  (23,39)   [23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34,
             35, 36, 37, 38]
  (45,45)   []
  (45,46)   [45]
list(yi(li)) [(7, 20), (23, 39), (45, 46)]

sorted li    [(7, 10), (11, 13), (11, 15), (14, 20), 
              (23, 39), (45, 45)]
  (7,10)   [7, 8, 9]
  (11,13)   [11, 12]
  (11,15)   [11, 12, 13, 14]
  (14,20)   [14, 15, 16, 17, 18, 19]
  (23,39)   [23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34,
             35, 36, 37, 38]
  (45,45)   []
list(yi(li)) [(7, 20), (23, 39), (45, 45)]

sorted li    [(7, 10), (11, 6), (11, 13), (14, 20), 
              (23, 39), (45, 46)]
  (7,10)   [7, 8, 9]
  (11,6)   []
  (11,13)   [11, 12]
  (14,20)   [14, 15, 16, 17, 18, 19]
  (23,39)   [23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 
             35, 36, 37, 38]
  (45,46)   [45]
list(yi(li)) [(7, 20), (23, 39), (45, 46)]

sorted li    [(-1, -5), (7, 10), (11, 13), (14, 20), 
              (23, 39), (45, 45)]
  (-1,-5)   []
  (7,10)   [7, 8, 9]
  (11,13)   [11, 12]
  (14,20)   [14, 15, 16, 17, 18, 19]
  (23,39)   [23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34,
             35, 36, 37, 38]
  (45,45)   []
list(yi(li)) [(7, 20), (23, 39), (45, 45)]

sorted li    [(-9, -5), (7, 10), (11, 13), (14, 20), 
              (23, 39), (45, 45)]
  (-9,-5)   [-9, -8, -7, -6]
  (7,10)   [7, 8, 9]
  (11,13)   [11, 12]
  (14,20)   [14, 15, 16, 17, 18, 19]
  (23,39)   [23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34,
             35, 36, 37, 38]
  (45,45)   []
list(yi(li)) [(-9, -5), (7, 20), (23, 39), (45, 45)]

sorted li    [(-3, 10), (7, 10), (11, 13), (14, 20), 
              (23, 39), (45, 45)]
  (-3,10)   [-3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
  (7,10)   [7, 8, 9]
  (11,13)   [11, 12]
  (14,20)   [14, 15, 16, 17, 18, 19]
  (23,39)   [23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 
             35, 36, 37, 38]
  (45,45)   []
list(yi(li)) [(-3, 20), (23, 39), (45, 45)]

【Comments】:

    【Solution 4】:

    The following function works for the given sample data:

    def range_overlap_adjust(list_ranges):
        overlap_corrected = []
        for start, stop in sorted(list_ranges):
            if overlap_corrected and start-1 <= overlap_corrected[-1][1] and stop >= overlap_corrected[-1][1]:
                # Overlaps or is adjacent to the last range and extends it.
                overlap_corrected[-1] = min(overlap_corrected[-1][0], start), stop
            elif overlap_corrected and start <= overlap_corrected[-1][1] and stop <= overlap_corrected[-1][1]:
                # Fully contained in the last range: skip it but keep going
                # (a break here would drop every remaining range).
                continue
            else:
                overlap_corrected.append((start, stop))
        return overlap_corrected
    

    Usage

    list_ranges = [(7, 10), (11, 13), (11, 15), (14, 20), (23, 39)]   
    print(range_overlap_adjust(list_ranges))
    # prints [(7, 20), (23, 39)]
    

    【Comments】:

      【Solution 5】:

      Here is a one-liner using functools.reduce (assuming that (x, 10) and (11, y) count as overlapping):

      reduce(
          lambda acc, el: acc[:-1:] + [(min(*acc[-1], *el), max(*acc[-1], *el))]
              if acc[-1][1] >= el[0] - 1
              else acc + [el],
          ranges[1::],
          ranges[0:1]
      )
      

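      This starts with the first range and uses reduce to walk through the remaining ranges. It compares the last element (acc[-1]) with the next range (el). If they overlap, it replaces the last element with the minimum and maximum of the two ranges (acc[:-1:] + [min, max]). If they do not overlap, it simply puts the new range at the end of the list (acc + [el]).

      Example: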

      from functools import reduce
      
      example_ranges = [(7, 10), (11, 13), (11, 15), (14, 20), (23, 39)]
      
      def combine_overlaps(ranges):
          return reduce(
              lambda acc, el: acc[:-1:] + [(min(*acc[-1], *el), max(*acc[-1], *el))]
                  if acc[-1][1] >= el[0] - 1
                  else acc + [el],
              ranges[1::],
              ranges[0:1],
          )
      
      print(combine_overlaps(example_ranges))
      

      Output:

      [(7, 20), (23, 39)]
      

      【Comments】:

      • This answer seems to assume that the input list is in a particular order. For example: print(combine_overlaps([(7,10), (23,39), (9,20)])). This prints [(7, 10), (9, 39)] instead of [(7,20), (23,39)].
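
The ordering issue raised in this comment can be avoided by sorting the input before folding. The following is a sketch, not the answer author's code: `combine_overlaps_sorted` is an illustrative name, and it keeps the answer's assumption that (x, 10) and (11, y) count as overlapping.

```python
from functools import reduce


def combine_overlaps_sorted(ranges):
    """Sort by start, then fold adjacent or overlapping ranges together.

    `combine_overlaps_sorted` is an illustrative name, not from the answer.
    """
    ordered = sorted(ranges)
    return reduce(
        # After sorting, acc[-1][0] is already the smallest start of the
        # current run, so only the end needs to be extended.
        lambda acc, el: acc[:-1] + [(acc[-1][0], max(acc[-1][1], el[1]))]
            if acc[-1][1] >= el[0] - 1
            else acc + [el],
        ordered[1:],
        ordered[0:1],
    )


print(combine_overlaps_sorted([(7, 10), (23, 39), (9, 20)]))
# [(7, 20), (23, 39)]
```

Sorting costs O(n log n), but it is what makes comparing each range only against the last merged one sufficient.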