Python: complete search in one pass (answers)

【Question title】: Python complete search in one pass function
【Posted】: 2018-01-05 10:08:15
【Question】:

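I am writing a program that takes a list of start and end times at which farmers milk cows, and determines both the longest time during which >=1 cow is being milked and the longest time during which no cow is being milked.

As part of it, I have tried using this function. It is an exercise in complete search, but it is not fast enough when there is a lot of data (I think because there are n^2 iterations).

timesIS is simply the list of times sorted by increasing start time, and timesDE is the same list of times sorted by decreasing end time. timeIndex is the position to start from. For the longest milking interval, my program later calls this for every index and returns the longest interval.

While keeping it a complete search, how can I make this more efficient (perhaps switching to something closer to n iterations)?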

def nextCease(TimesIS, timesDE, timeIndex):
    latestTime = TimesIS[timeIndex][1]
    for j in range (0, len(timesDE)):
        for i in range (0, len(timesDE)):
            if timesDE[i][0]<=latestTime and timesDE[i][1]>=latestTime:
                latestTime = timesDE[i][1]
        if latestTime == timesDE[0][1]:
            return latestTime
            break
    return latestTime

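Here is a small sample of the input data (the first line is the number of farmers):

I think this is a minimal, complete and verifiable example: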

from operator import itemgetter
times = [[100,200], [200,400], [400,800], [800,1600], [50,100], [1700,3200]]

def nextCease(TimesIS, timesDE, timeIndex):
    latestTime = TimesIS[timeIndex][1]
    for j in range (0, len(timesDE)):
        for i in range (0, len(timesDE)):
            if timesDE[i][0]<=latestTime and timesDE[i][1]>=latestTime:
                latestTime = timesDE[i][1]
        if latestTime == timesDE[0][1]:
            return latestTime
            break
    return latestTime

timesIS = sorted(times[:], key=itemgetter(0)) #increasing starttimes
timesDE = sorted(times[:], key=itemgetter(1), reverse=True) #decreasing endtimes

longestIntervalMilk = 0
for i in range (0, len(times)):
    interval = nextCease(timesIS, timesDE, i) - timesIS[i][0]
    if interval > longestIntervalMilk:
        longestIntervalMilk = interval

longestIntervalNoMilk = 0
latestFinish = 0
for i in range (0, len(times)):
    latestFinish = nextCease(timesIS, timesDE, i)
    timesIS2 = timesIS[:]
    while(timesIS2[0][0] < latestFinish):
        nextStartExists = True
        del timesIS2[0]
        if timesIS2 == []:
            nextStartExists = False
            break
    if nextStartExists == True:
        nextStart = timesIS2[0][0]
        longestIntervalNoMilk = nextStart - latestFinish

print(str(longestIntervalMilk) + " " + str(longestIntervalNoMilk) + "\n")

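EDIT: In the meantime, I wrote this. It gives the wrong output for a very long list (it is 1001 lines, so I won't reprint it here, but you can find it at http://train.usaco.org/usacodatashow?a=iA4oZAAX7KZ) and I am confused as to why: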

times = sorted(times[:], key=itemgetter(0))

def longestMilkInterval(times):
    earliestTime = times[0]
    latestTime = times[0][1]
    interval = 0
    for i in range (1, len(times)):
        if times[i][1] > latestTime and times[i][0] <= latestTime:
            if times[i][1] - earliestTime[0] > interval:
                interval = times[i][1] - earliestTime[0]
                latestTime = times[i][1]
        else:
            earliestTime = times[i]
            latestTime = times[i][1]
            print(earliestTime)
    return interval

def longestNoMilkInterval(times):
    earliestTime = times[0][1]
    interval = 0
    for i in range (0, len(times)):
        if times[i][0] >= earliestTime:
            if times[i][0] - earliestTime > interval:
                interval = times[i][0] - earliestTime
                break
        else:
            earliestTime = times[i][1]
    return interval

The output should be 912 184 (>=1 cow, 0 cows).

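【Comments on the question】:

Yes, use bisect to search in a sorted list of values. But we could help better with a minimal reproducible example. Without data, there is no way to optimize your code.
Thanks! I have posted more. I am reading that article now and will do my best.
@Jean-FrançoisFabre I forgot to tag you, sorry.
No need to when replying directly. So I have already read it.
Hmm, I see.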
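
As a rough illustration of the bisect suggestion above: the `while ... del timesIS2[0]` loop in the question scans for the first start time that is not earlier than `latestFinish`. On a list of start times that is already sorted, `bisect.bisect_left` finds that position in O(log n). The helper name `next_start_after` and the flat `start_times` list are assumptions made for this sketch; they are not part of the original code.

```python
from bisect import bisect_left

def next_start_after(start_times, latest_finish):
    """Return the first start time >= latest_finish, or None if there is none.

    start_times must already be sorted in increasing order (hypothetical helper).
    """
    i = bisect_left(start_times, latest_finish)
    return start_times[i] if i < len(start_times) else None

# Start times taken from the sample data in the question, sorted ascending.
starts = sorted(start for start, end in
                [[100, 200], [200, 400], [400, 800], [800, 1600], [50, 100], [1700, 3200]])
print(next_start_after(starts, 1600))
```

This only replaces the linear scan for the next start; it does not by itself remove the nested loops in `nextCease`.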

Tags: python search iteration

【Solution 1】:

Here is a very straightforward approach that does it in one pass. Including the sort, the complexity is O(n*logn).

# Part 1: transform to tuples (start_time, typ)
items = []
for start, end in times:
    items += [(start, 's'), (end, 'e')]
items = sorted(items)

# Part 2: compute max durations where 0 or 1+ cows are being milked
max_0_cows = max_1plus_cows = 0

last_intersection_time = items[0][0] # starting with first cow milk time
nof_cows_milked = 1

for i, (start_time, typ) in enumerate(items[1:], 1):
    if items[i-1][0] == start_time and items[i-1][1] != typ:
        continue
    if i+1 < len(items) and items[i+1][0] == start_time and items[i+1][1] != typ:
        continue

    if typ == 's':
        nof_cows_milked += 1
    elif typ == 'e':
        nof_cows_milked -= 1

    # check if we cross from 1+ -> 0 or 0 -> 1+
    if (typ, nof_cows_milked) in (('e', 0), ('s', 1)):
        duration = start_time - last_intersection_time
        if nof_cows_milked == 1:
            max_0_cows = max(max_0_cows, duration)
        if nof_cows_milked == 0:
            max_1plus_cows = max(max_1plus_cows, duration)
        last_intersection_time = start_time

print("Max time 0 cows: {}, Max time 1+ cows: {}".format(max_0_cows, max_1plus_cows))

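Building items: the start and end times are put into a list of tuples (start_time, typ), so that we can walk through the list in time order. When we see s, a cow starts being milked; when we see e, a cow stops being milked. This way we always have a counter nof_cows_milked, which is the basis for finding the "longest time with 0 cows milked" and the "longest time with 1+ cows milked".
The actual longest-time finder checks every transition from 0 -> 1+ cows milked or from 1+ -> 0 cows milked. The first 4 lines of the loop filter out the case of two adjacent intervals (one farmer stops exactly when another starts). The code keeps track of the transition times with last_intersection_time and compares each duration with the running maxima max_0_cows and max_1plus_cows. Again, this part is not very pretty; there may be a more elegant way to solve it.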
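
The counter idea described above can also be wrapped in a function, sketched here as a variation rather than as the answer's exact code. The assumption in this sketch is that a start sorts before an end at the same timestamp (encoded as 0 and 1), so the counter never drops to zero between two adjacent intervals and the explicit adjacency filter is not needed. The names `milking_stats`, `busy_since` and `idle_since` are hypothetical.

```python
def milking_stats(times):
    """Return (longest time with >=1 cow milked, longest time with 0 cows milked)."""
    events = []
    for start, end in times:
        events.append((start, 0))  # 0 = start; sorts before an end at the same time
        events.append((end, 1))    # 1 = end
    events.sort()

    active = 0                     # number of cows being milked right now
    longest_busy = longest_idle = 0
    busy_since = idle_since = None
    for time, kind in events:
        if kind == 0:
            if active == 0:        # transition 0 -> 1+
                if idle_since is not None:
                    longest_idle = max(longest_idle, time - idle_since)
                busy_since = time
            active += 1
        else:
            active -= 1
            if active == 0:        # transition 1+ -> 0
                longest_busy = max(longest_busy, time - busy_since)
                idle_since = time
    return longest_busy, longest_idle

times = [[100, 200], [200, 400], [400, 800], [800, 1600], [50, 100], [1700, 3200]]
print(milking_stats(times))  # (1550, 100)
```

For the three intervals mentioned in the comments (300 1000, 700 1200, 1500 2100) the same function returns (900, 300).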

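[My algorithm] gives the wrong output [...] and I am confused as to why

Your algorithm basically only checks the longest interval of a single tuple; it does not take overlapping or adjacent tuples into account.

Take these intervals as an example:

Your code only finds the interval G-H, whereas you need to find C-F. Somewhere you need to keep track of how many cows are being milked in parallel, so you need at least the nof_cows_milked counter shown in my code example.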
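
If you would rather stay close to the structure of the question's edit (one pass over the list sorted by start time), another option is to merge overlapping, nested and adjacent intervals into blocks as you go. This is a different technique from the counter, offered only as a sketch; the function name `longest_intervals_merged` is hypothetical. Reading the posted `longestMilkInterval`, an interval that lies entirely inside the current block appears to fall into the `else` branch and reset the block, which the `max()` below avoids.

```python
def longest_intervals_merged(times):
    """Return (longest milked stretch, longest idle stretch) by merging sorted intervals."""
    ordered = sorted(times)                  # sort by start time (then end time)
    block_start, block_end = ordered[0]
    longest_milk = block_end - block_start
    longest_idle = 0
    for start, end in ordered[1:]:
        if start <= block_end:
            # Overlapping, nested or adjacent: extend the current block.
            block_end = max(block_end, end)
        else:
            # Gap: record the idle time and open a new block.
            longest_idle = max(longest_idle, start - block_end)
            block_start, block_end = start, end
        longest_milk = max(longest_milk, block_end - block_start)
    return longest_milk, longest_idle

print(longest_intervals_merged([[300, 1000], [700, 1200], [1500, 2100]]))  # (900, 300)
```

After the O(n log n) sort, the loop itself touches each interval once.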

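【Comments】:

Thanks a lot. I am still quite new to Python, so could you explain what the block starting at line 10 is doing? (It may be in your edit; I am still thinking it through. If so, ignore this.)
@PeterW: I have finished the explanation now and hope it makes sense. I will be back in ~1.5h
I have looked at the code. The problem (which you could not know from my example) is that it is not that simple. For example, 300 1000, 700 1200, 1500 2100 gives the wrong output, because end[i]!=start[i+1]
@PeterW: the code was supposed to cover that scenario as well, but there was a bug, which I have now fixed
@PeterW: I have simplified the script so that it needs only one pass, and the filtering of adjacent intervals should be clearer now too (I dropped the construction of the items_reduced list)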