Python Threading inconsistent execution time: answers

【Question title】: Python Threading inconsistent execution time
【Posted】: 2018-08-24 15:51:50
【Question description】:

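I am using the threading library to speed up computing the neighborhood of every point in a point cloud. This is done by calling the function CalculateAllPointsNeighbors, shown at the bottom of the post.
The function receives a search radius, a maximum number of neighbors, and the number of threads to split the work across. No point is modified. Each point stores its data in its own cell of an np.ndarray, accessed by its own index.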

The following function measures how long N threads take to compute the neighborhoods of all points:

import time

import numpy as np


def TimeFuncThreads(classObj, uptothreads):
    # Wall-clock time of one full run for each thread count from 1 to uptothreads.
    listTimers = []

    startNum = 1
    EndNum = uptothreads + 1

    for i in range(startNum, EndNum):
        print("Current Number of Threads to Test: ", i)
        tempT = time.time()
        classObj.CalculateAllPointsNeighbors(searchRadius=0.05, maxNN=25, maxThreads=i)
        tempT = time.time() - tempT
        listTimers.append(tempT)

    # PlotXY is not shown in the post; presumably the asker's own plotting helper.
    PlotXY(np.arange(startNum, EndNum), listTimers)

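The problem is that I get very different results on every run. Below are the plots from 5 consecutive runs of TimeFuncThreads. The X axis is the number of threads and the Y axis is the run time. First, the results look completely random. Second, there is no significant speedup.

I am now confused: am I using the threading library incorrectly, and what explains the behavior I am seeing?

The function that manages the threads, and the function called from each thread: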

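def CalculateAllPointsNeighbors(self, searchRadius=0.20, maxNN=50, maxThreads=8):
    # Requires `import threading` and `import numpy as np` at module level.

    threadsList = []
    pointsIndices = np.arange(self.numberOfPoints)
    # Split the point indices into maxThreads nearly equal, disjoint chunks.
    splitIndices = np.array_split(pointsIndices, maxThreads)

    # One thread per chunk.
    for i in range(maxThreads):
        threadsList.append(threading.Thread(target=self.GetPointsNeighborsByID,
                                            args=(splitIndices[i], searchRadius, maxNN)))

    # Start every thread, then wait for all of them to finish.
    [t.start() for t in threadsList]
    [t.join() for t in threadsList]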



def GetPointsNeighborsByID(self, idx, searchRadius=0.05, maxNN=20):
    # Accept a single index as well as a sequence of indices.
    if isinstance(idx, int):
        idx = [idx]

    for currentPointIndex in idx:
        currentPoint = self.pointsOpen3D.points[currentPointIndex]
        pointNeighborhoodObject = self.GetPointNeighborsByCoordinates(currentPoint, searchRadius, maxNN)
        # Each point writes only to its own slot of the shared array.
        self.pointsNeighborsArray[currentPointIndex] = pointNeighborhoodObject
        self.__RotatePointNeighborhood(currentPointIndex)

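【Comments】:

Are you sure this isn't the result of something else on your system (such as antivirus or other software) affecting the run time?
As a side note: don't use list comprehensions for side effects, only for building lists. If you're doing it to save space, you can write a simple for statement as a one-liner, and it's shorter, not longer. If you're doing it because you heard that listcomps are faster: they're faster than calling append for each value, but they're slower than not building a list at all.
Where is the time actually being spent? At a glance, it could be in the __RotatePointNeighborhood method that you haven't shown us. Wherever it is, does each thread stick to its own segment of the shared array, or are they all trying to read (or worse, write) overlapping segments?
@abarnert Thanks for the list comprehension tip. All points write to the same np.ndarray, but each point writes only to its own index. For example, p1 writes to array[1]. Since no points overlap between threads and each point's neighborhood is computed exactly once, I don't think that is the problem.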
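
To make the two suggestions in these comments concrete, here is a minimal sketch, not the asker's code. It repeats each measurement several times, so that noise from the rest of the system shows up as spread between runs, and it starts and joins the threads with plain for loops. The names `fill_squares`, `run_in_threads`, and `time_repeated` are hypothetical, and the per-point work is only a stand-in for the neighborhood search. As in the asker's description, each thread writes only to its own indices of a shared container.

```python
import statistics
import threading
import time


def fill_squares(indices, out):
    """Stand-in for the per-point work: each index writes only its own slot."""
    for i in indices:
        out[i] = i * i


def run_in_threads(n_items, n_threads):
    """Split range(n_items) into n_threads disjoint chunks and process them."""
    out = [None] * n_items
    # Strided ranges are disjoint, so no two threads touch the same slot.
    chunks = [range(k, n_items, n_threads) for k in range(n_threads)]
    threads = [threading.Thread(target=fill_squares, args=(chunk, out))
               for chunk in chunks]
    # Plain for loops: start() and join() are called only for their side effects.
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out


def time_repeated(func, repeats=5):
    """Run func several times and return every wall-clock duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return durations


if __name__ == "__main__":
    for n_threads in (1, 2, 4):
        runs = time_repeated(lambda: run_in_threads(200_000, n_threads))
        print(n_threads, "threads: min", round(min(runs), 4),
              "median", round(statistics.median(runs), 4))
```

Comparing the minimum with the median across repeats gives a rough idea of how much of the variation comes from the machine rather than from the thread count.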

Tags: python python-3.x multithreading

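【Solution 1】:

It pains me to be the one to introduce you to the Python GIL (Global Interpreter Lock). It is a very nice feature that makes parallelism with threads in Python a nightmare.

If you really want to speed up your code, you should look at the multiprocessing module.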
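
As a rough illustration of that suggestion, the sketch below splits a brute-force neighbor count across worker processes with `multiprocessing.Pool`. It is written under assumptions and is not the asker's method: the function names `count_neighbors_chunk` and `count_neighbors_parallel` are hypothetical, the points are plain 2D tuples instead of an Open3D point cloud, and the search is a naive all-pairs distance check. Because processes do not share memory the way threads do, each worker returns its results instead of writing into a shared array.

```python
import math
from multiprocessing import Pool


def count_neighbors_chunk(args):
    """Count, for each index in the chunk, the other points within `radius`."""
    indices, points, radius = args
    counts = []
    for i in indices:
        px, py = points[i]
        n = 0
        for j, (qx, qy) in enumerate(points):
            if j != i and math.hypot(px - qx, py - qy) <= radius:
                n += 1
        counts.append((i, n))
    return counts


def count_neighbors_parallel(points, radius, n_workers=4):
    """Split the indices across worker processes and merge the results."""
    chunks = [list(range(k, len(points), n_workers)) for k in range(n_workers)]
    # Assumption: the point list is small enough to be pickled to every worker.
    tasks = [(chunk, points, radius) for chunk in chunks]
    result = [0] * len(points)
    with Pool(processes=n_workers) as pool:
        for chunk_counts in pool.map(count_neighbors_chunk, tasks):
            for i, n in chunk_counts:
                result[i] = n
    return result


if __name__ == "__main__":
    # Small 2D grid as toy data; a real point cloud would be much larger.
    pts = [(x * 0.1, y * 0.1) for x in range(20) for y in range(20)]
    print(count_neighbors_parallel(pts, radius=0.15)[:5])
```

The `if __name__ == "__main__":` guard matters here: on platforms that start workers by re-importing the main module, creating the pool at module level would fail. Starting processes and copying data to them has a cost, so a speedup is only likely when the work per chunk is large compared with that overhead.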

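【Discussion】:

I tried putting if idx[0]==0: {print("Sleep") time.sleep(5)} else: print("No sleep") at the entry of the GetPointsNeighborsByID function, so that only the first thread sleeps. If the threads were not running in parallel, pausing the first one should pause all the following ones too. However, with maxThreads=4 I got the output: Sleep, No sleep, No sleep, No sleep. Immediately, without waiting 5 seconds. How do you explain that?
Threads do run in parallel... **sort of**. The GIL prevents multiple threads from executing Python bytecode at the same time. I/O operations and native libraries that release the GIL do not have this limitation. When you call time.sleep, that thread goes to sleep and the next thread starts executing bytecode until it blocks or runs native code (which is why you kept getting output, BTW).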
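
The difference described in this reply can be observed with a small experiment. The sketch below is illustrative only; `timed_threads` and `busy_loop` are hypothetical names, and the exact timings depend on the machine and the interpreter build. It times four threads that sleep, then compares one thread against four threads running a pure-Python loop.

```python
import threading
import time


def timed_threads(target, n_threads, arg):
    """Run `target(arg)` in n_threads threads; return total wall-clock seconds."""
    threads = [threading.Thread(target=target, args=(arg,))
               for _ in range(n_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


def busy_loop(n):
    """CPU-bound pure-Python work: holds the GIL while it runs."""
    total = 0
    for i in range(n):
        total += i * i
    return total


if __name__ == "__main__":
    # Sleeping releases the GIL, so four 0.5 s sleeps overlap.
    print("4 sleeping threads:", round(timed_threads(time.sleep, 4, 0.5), 2), "s")

    # Bytecode-heavy work: on a standard (GIL) CPython build, four threads
    # take roughly as long as running the four loops one after another.
    one = timed_threads(busy_loop, 1, 2_000_000)
    four = timed_threads(busy_loop, 4, 2_000_000)
    print("1 busy thread:", round(one, 2), "s; 4 busy threads:", round(four, 2), "s")
```

On a standard CPython build the sleeping threads should finish in about the time of a single sleep, while the four busy threads should take noticeably longer than one. That matches the asker's observation that the "No sleep" lines appeared immediately even though the overall computation did not get faster.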