Reducing the time complexity of a Python (2.7) algorithm: answers

【Question title】: Reducing time-complexity of python (2.7) algorithm
【Posted】: 2021-01-03 06:42:10
【Question description】:

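My input is a list made up of three lists, which hold the X, Y and Z coordinates respectively. For example:

coords = [[2, 1, 5, 2, 8, 6, 8, 6, 1, 2, 3, 4], [1, 3, 4, 1, 2, 2, 2, 4, 2, 3, 4, 5], [2, 4, 7, 2, 1, 2, 1, 4, 5, 6, 9, 8]]

where the X coordinate list is: X = [2, 1, 5, 2, 8, 6, 8, 6, 1, 2, 3, 4]

A point is then formed like this: point = [2, 1, 2]. A point XYZ represents one vertex of a cube. (In my program I have to analyze a set of cubes that are stacked or placed side by side.)

As the output of the function, I want a list of IDs as long as the total number of points (= the length of one of the coordinate lists). IDs must be unique for distinct points and must increase sequentially as the list of points is iterated. When a point has already been encountered (for example, when a vertex of one cube coincides with a vertex of another cube), that point must get, in the output list, the ID of the position where the same point was first encountered.

For the example, the result should be outp = [1, 2, 3, 1, 5, 6, 5, 8, 9, 10, 11, 12]

This is the code I wrote, and it works correctly: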

def AssignIDtoNode(coords):

    outp = []
    n_points = len(coords[0])

    memo_set = set()              # points seen so far, for fast membership tests
    memo_lst = [""] * n_points    # memo_lst[i] holds the point first seen at position i

    for i in range(n_points):

        point = "(X = " + str(coords[0][i]) + ", Y = " + str(coords[1][i]) + ", Z = " + str(coords[2][i]) + ")"
        if point not in memo_set:
            outp.append(i+1)
            memo_set.add(point)
            memo_lst[i] = point
        else:
            ind = memo_lst.index(point)    # linear scan of the list: the slow step
            outp.append(ind+1)

    return outp

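The problem appears when the input to the function is a very large list of points (millions of points) and the computation time grows significantly. I converted each point to a string to make searching easier, and I used a set wherever possible to cut the time of the first lookup. It seems to me that the program spends most of its time when it has to find the index of a point with the .index() function.

Is there a way to optimize this function further?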

【Question comments】:

Isn't this a question for Code Review?
You really should switch to Python 3.x if at all possible!

Tags: python algorithm optimization time-complexity ironpython

【Solution 1】:

Use enumerate, zip, and a dictionary to store the indices: {(x,y,z): index, ...}

def f(coords):

    d = {}
    outp = []
    for i,punto in enumerate(zip(*coords),1):
        d[punto] = d.get(punto,i)    # if it doesn't exist add it with the current index
        outp.append(d[punto])
                
    return outp

A single pass over the points, no type conversion, and constant-time (average-case) lookups.

>>> AssignIDtoNode(coords) == f(coords)
True
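
The check above uses only the small example. As a rough sketch of a broader check, the dictionary approach can be compared against a deliberately naive reference on random data that contains many duplicates. The names reference_ids, dict_ids and random_coords are made up for this illustration and do not appear in the original answer.

```python
import random


def reference_ids(coords):
    """Naive reference: ID = 1-based position of the first equal point.

    list.index scans from the start, so this is quadratic overall and
    only suitable for checking small inputs.
    """
    points = list(zip(*coords))
    return [points.index(p) + 1 for p in points]


def dict_ids(coords):
    """Same logic as f above: remember the first position of each point."""
    d = {}
    outp = []
    for i, p in enumerate(zip(*coords), 1):
        d[p] = d.get(p, i)
        outp.append(d[p])
    return outp


def random_coords(n, span, seed):
    """Hypothetical helper: n random points with coordinates in 0..span."""
    rng = random.Random(seed)
    return [[rng.randint(0, span) for _ in range(n)] for _ in range(3)]


# A small span forces plenty of repeated points.
sample = random_coords(500, 4, 42)
assert dict_ids(sample) == reference_ids(sample)
```

If the assertion passes silently, both functions agree on that sample. This is evidence, not a proof of correctness.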

zip and enumerate docs

LBYL (look before you leap) ...

def g(coords):
    outp = []
    d = {}
    for i,punto in enumerate(zip(*coords),1):
        if punto not in d:
            d[punto] = i
        outp.append(d[punto])        
    return outp

For 1 million and 3 million (x,y,z) points, g is about 25% faster than f.
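
A comparison like this could be reproduced with something along the following lines. This is only a sketch: make_coords and bench are hypothetical helpers, the input size is reduced so it runs quickly, and the measured ratio will depend on the interpreter, the machine, and the share of duplicate points.

```python
import random
import timeit


def f(coords):
    d = {}
    outp = []
    for i, punto in enumerate(zip(*coords), 1):
        d[punto] = d.get(punto, i)
        outp.append(d[punto])
    return outp


def g(coords):
    d = {}
    outp = []
    for i, punto in enumerate(zip(*coords), 1):
        if punto not in d:
            d[punto] = i
        outp.append(d[punto])
    return outp


def make_coords(n, span=50, seed=0):
    """Hypothetical helper: n random points with coordinates in 0..span."""
    rng = random.Random(seed)
    return [[rng.randint(0, span) for _ in range(n)] for _ in range(3)]


def bench(fn, coords, repeat=3):
    """Best wall-clock time (seconds) over `repeat` single calls of fn."""
    return min(timeit.repeat(lambda: fn(coords), number=1, repeat=repeat))


coords = make_coords(100000)
assert f(coords) == g(coords)  # both variants must agree before timing them
for fn in (f, g):
    print("%s: %.4f s" % (fn.__name__, bench(fn, coords)))
```

Taking the minimum of several runs reduces noise from other processes. Raise the argument of make_coords to approach the sizes quoted above.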

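【Comments】:

Using an OrderedDict like that only outputs the correct result when there are no duplicates. And in any case, OrderedDict is not implemented in C in Python 2, so it will be much slower than the OP's original solution.
For large lists, using get is inefficient, because method calls are relatively expensive in Python. An LBYL approach would be more efficient, and it would also eliminate the extra lookup for missing keys.
Re your latest edit: as I said above, you can't return the values from the mapping, because that drops all the duplicates and therefore gives the wrong output...
@ekhumoro - Yes, I realized that while walking the dog. Hmm.
Your LBYL example can save one more lookup like this: if punto not in d: d[punto] = i; else: i = d[punto]; outp.append(i). But I think a faster solution is possible with setdefault, which reduces it to almost a one-liner: d = {}; outp = [d.setdefault(c, i) for i, c in enumerate(zip(*coords), 1)]. This can be improved a little further by caching the setdefault method: i.e. d = {}; sd = d.setdefault; [sd(c, i) ... ].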
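
Written out as complete functions, the setdefault suggestion from the last comment might look like the sketch below. The function names are invented here; the comment itself only gives the one-line expressions.

```python
def ids_setdefault(coords):
    # dict.setdefault(key, default) returns the stored value if the key
    # exists; otherwise it stores `default` and returns it.
    d = {}
    return [d.setdefault(c, i) for i, c in enumerate(zip(*coords), 1)]


def ids_setdefault_cached(coords):
    d = {}
    sd = d.setdefault  # look up the bound method once, outside the loop
    return [sd(c, i) for i, c in enumerate(zip(*coords), 1)]


coords = [[2, 1, 5, 2, 8, 6, 8, 6, 1, 2, 3, 4],
          [1, 3, 4, 1, 2, 2, 2, 4, 2, 3, 4, 5],
          [2, 4, 7, 2, 1, 2, 1, 4, 5, 6, 9, 8]]

print(ids_setdefault(coords))
# [1, 2, 3, 1, 5, 6, 5, 8, 9, 10, 11, 12]
assert ids_setdefault_cached(coords) == ids_setdefault(coords)
```

Whether the cached variant is measurably faster depends on the interpreter, so it is worth timing on the real data before choosing it.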

【Solution 2】:

Use a dict that maps from point to index

def AssignIDtoNode(coords):

    outp = []
    n_points = len(coords[0])

    memo_dict = dict()    # maps each point to the ID of its first occurrence

    for i in range(n_points):

        point = (coords[0][i], coords[1][i], coords[2][i])
        if point not in memo_dict:
            outp.append(i+1)
            memo_dict[point] = i+1
        else:
            # the stored value is already the 1-based ID
            outp.append(memo_dict[point])

    return outp

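【Comments】:

Why do you need both memo_dict and memo_set? You can check if point not in memo_dict, which works just as well. Also, you can simply use point = (x, y, z) instead of building a string. Tuples make great dictionary keys.
You're right, I took the original code and replaced his list with a dict. I'll edit my answer.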

【Solution 3】:

You should be able to do this in linear time in a single pass, using a list comprehension with an internal dictionary:

coords = [[2, 1, 5, 2, 8, 6, 8, 6, 1, 2, 3, 4], 
          [1, 3, 4, 1, 2, 2, 2, 4, 2, 3, 4, 5], 
          [2, 4, 7, 2, 1, 2, 1, 4, 5, 6, 9, 8]]

IDs = [d[c] for d in [dict()] for c in zip(*coords) if d.setdefault(c,len(d)+1)]

print(IDs)
# [1, 2, 3, 1, 4, 5, 4, 6, 7, 8, 9, 10]
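
To make the one-liner easier to follow, here is a sketch of the same logic as an explicit loop, checked against the comprehension. The function names are invented for this illustration. Note that this scheme numbers distinct points consecutively (1 to 10 for the example), whereas the IDs in the question's example are the positions of first occurrence.

```python
def dense_ids_loop(coords):
    d = {}
    ids = []
    for c in zip(*coords):
        if c not in d:
            # len(d) is read before the insertion, so the first new point
            # gets 1, the next new point gets 2, and so on.
            d[c] = len(d) + 1
        ids.append(d[c])
    return ids


def dense_ids_oneliner(coords):
    # `for d in [dict()]` only binds a fresh dict inside the comprehension.
    # The `if` clause is always true because IDs start at 1; it is there
    # for its side effect of inserting unseen points.
    return [d[c] for d in [dict()] for c in zip(*coords)
            if d.setdefault(c, len(d) + 1)]


coords = [[2, 1, 5, 2, 8, 6, 8, 6, 1, 2, 3, 4],
          [1, 3, 4, 1, 2, 2, 2, 4, 2, 3, 4, 5],
          [2, 4, 7, 2, 1, 2, 1, 4, 5, 6, 9, 8]]

assert dense_ids_loop(coords) == dense_ids_oneliner(coords)
print(dense_ids_loop(coords))
# [1, 2, 3, 1, 4, 5, 4, 6, 7, 8, 9, 10]
```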

【Comments】: