Loop performance degradation (answers)

【Question title】: Loop performance degradation
【Posted】: 2014-01-14 03:52:23
【Question description】:

I have the following code:

    import cPickle
    import time

    keywordindex = cPickle.load(open('keywordindex.p', 'rb'))  # contains ~340 thousand items
    masterIndex = {}

    indexes = [keywordindex]
    for partialIndex in indexes:
        start = time.time()
        for key in partialIndex.iterkeys():
            if key not in masterIndex.keys():
                masterIndex[key] = partialIndex[key]
            elif key in masterIndex.keys():
                masterIndex[key].extend(partialIndex[key])
        cPickle.dump(masterIndex, open('MasterIndex.p', 'wb'))

        print int(time.time() - start), ' seconds'  # every thousand loops

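As the loop runs I'm seeing performance degradation: the first 10,000 iterations take about 5 seconds per 1,000, but roughly every 10,000 iterations each 1,000 takes about one more second, until it is taking 3 times as long. I've tried to simplify the code in every way I can think of, but I can't figure out what is causing it. Is there a reason? It's not a memory problem; I'm only at 30% usage.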

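【Question discussion】:

Why are you creating a single-element list indexes and then looping over it? It looks like you could scrap the outer loop entirely.
@user There is more than one list; I removed them for the question.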

Tags: python dictionary pickle

【Solution 1】:

This block contains two instances of extremely inefficient coding:

    if key not in masterIndex.keys():
        masterIndex[key]= partialIndex[key]
    elif key in masterIndex.keys():
        masterIndex[key].extend(partialIndex[key])

First, key either is or is not in masterIndex, so the elif test serves no purpose at all: if the not in test fails, the in test must succeed. So this code does the same thing:

    if key not in masterIndex.keys():
        masterIndex[key]= partialIndex[key]
    else:
        masterIndex[key].extend(partialIndex[key])

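Second, masterIndex is a dict. Dicts support very efficient membership testing without any help from you ;-) By materializing its keys into a list via .keys() (in Python 2, .keys() builds a brand-new list on every call), you're turning what should be a fast dict lookup into a horribly slow linear search over a list. So do this instead: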

    if key not in masterIndex:
        masterIndex[key]= partialIndex[key]
    else:
        masterIndex[key].extend(partialIndex[key])

Then the code will run much faster.
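For readers on Python 3, here is a minimal, self-contained sketch of the whole merge loop with the fix applied, assuming (as in the question) that every partial index maps keys to lists. The function name merge_indexes is hypothetical, and one behavior differs from the question's code by design: on first insertion it stores a copy, list(values), because masterIndex[key] = partialIndex[key] stores the very same list object, so a later extend() would silently mutate the partial index too. Note also that in Python 3, dict.keys() returns a view whose membership test is O(1) on average, so the original slowdown is specific to Python 2, where .keys() returns a list.

```python
def merge_indexes(indexes):
    """Merge a sequence of {key: list} dicts into one {key: list} dict.

    Hypothetical helper illustrating the answer's fix; not from the question.
    """
    master_index = {}
    for partial_index in indexes:
        for key, values in partial_index.items():
            if key not in master_index:  # average O(1) hash lookup, no list is built
                # Copy so later extend() calls don't mutate partial_index's own list.
                master_index[key] = list(values)
            else:
                master_index[key].extend(values)
    return master_index


a = {'x': [1, 2], 'y': [3]}
b = {'x': [4], 'z': [5]}
merged = merge_indexes([a, b])
print(merged)  # {'x': [1, 2, 4], 'y': [3], 'z': [5]}
print(a['x'])  # [1, 2] (the input list is left unchanged)
```

The if/else shape mirrors the answer; dict.setdefault or collections.defaultdict(list) would express the same merge in fewer lines.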

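【Discussion】:

Seeing your answer is like spotting a celebrity on the street.
Thanks for the short, clear answer. I knew the elif was bad; it was a leftover from earlier. But I didn't know membership tests on dicts were faster than on lists.
@ChuckFulminata, dict membership testing is O(1) on average, while a linear search is O(n).
Thanks, @EdgarAroutiounian :-) @ChuckFulminata, Edgar is right. What you're doing is O(n) twice: first it takes O(n) time to create the list, and that's both the best and the worst case. Then it takes up to another O(n) time to search the list. That's why your code gets slower and slower over time: the number of keys keeps growing. Dict insertion and membership testing are expected to take O(1) (constant) time, independent of the number of keys.
@TimPeters I ran your code and it processed 600,000 objects in 30 seconds, thanks.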
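The O(n)-per-iteration argument above can be checked with a small timing sketch. Because Python 3's keys() is a view with fast membership tests, the sketch reproduces the Python 2 cost explicitly with list(d.keys()); the function names build_slow and build_fast and the sizes are illustrative assumptions. Timings depend on the machine, so no expected output is shown, but doubling n should roughly quadruple the slow version's time while the fast version grows roughly linearly.

```python
import time


def build_slow(n):
    """Emulates Python 2's 'key not in d.keys()': a new list is built and scanned each time."""
    d = {}
    for key in range(n):
        if key not in list(d.keys()):  # O(len(d)) to copy the keys + O(len(d)) to scan them
            d[key] = [key]
    return d


def build_fast(n):
    """Uses the dict's own membership test."""
    d = {}
    for key in range(n):
        if key not in d:  # average O(1) hash lookup
            d[key] = [key]
    return d


for n in (1000, 2000, 4000):
    t0 = time.perf_counter()
    slow = build_slow(n)
    t1 = time.perf_counter()
    fast = build_fast(n)
    t2 = time.perf_counter()
    assert slow == fast  # same result, very different cost
    print(f"n={n}: slow {t1 - t0:.4f}s, fast {t2 - t1:.4f}s")
```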

【Solution 2】:

You don't need to search masterIndex.keys(). Also, just use a plain else clause:

    if key not in masterIndex:
        ...
    else:
        ...

The in operator on a dict checks that dict's keys, and that operation has an average time complexity of O(1).
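A tiny illustration of that point, using a made-up dict: in on the dict itself consults only the keys through a hash lookup, while testing the values, or a list copy of the keys, falls back to a linear scan.

```python
index = {'apple': 1, 'pear': 2}  # illustrative data

print('apple' in index)        # True:  hash lookup on the keys, O(1) on average
print(1 in index)              # False: values are not consulted
print(1 in index.values())     # True:  scans the values, O(n)
print('apple' in list(index))  # True:  same answer as 'apple' in index, but builds and scans a list, O(n)
```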

【Discussion】:

The elif was left over from earlier, when I needed to know exactly what was happening.