Answers: Speed up nested if loops under a for loop

【Question title】: Speed up nested if loops under a for loop
【Posted】: 2019-06-27 01:18:32
【Question description】:

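On a 2D plane, there is a large circle of given radius centered at (0,0). It contains about 100 smaller circles, distributed randomly over the parent circle, each with a known radius and a known position relative to the origin. (Some of the smaller sub-circles may lie partly or entirely inside larger sub-circles.)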

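The whole plane is uniformly gridded into pixels whose sides are horizontal and vertical (along the coordinate axes). The pixel size is fixed and known a priori, but much smaller than the parent circle; about 1000 pixels across the parent circle are special. We are given the 2D Cartesian coordinates of (the centers of) all these special pixels. The sub-circles that contain at least one special pixel are called *special* sub-circles for later use.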

Now, imagine all this 3D space is filled with about 100,000,000 particles. My code tries to add up these particles within each special sub-circle.

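I managed to get my code working, but it is very slow with this many particles, as shown below. I would like to know whether any tricks could speed it up by at least an order of magnitude.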

.
.
.
import collections
from math import log10

# Accumulators shared by all special pixels, so they are created once
sub_circle_catalog, some_parameter_catalog, totals = [], [], {}

for x, y in zip(vals1, vals2):  # vals1, vals2: x and y positions of the *special* pixels, each a 1d array of size ~1000
    # Per-pixel results, reset for every special pixel
    enclosing_circles, some_parameter = {}, {}

    for id_, mass in zip(ids_data, masss_data):  # two equal-length arrays of size ~100,000,000
        rule1 = some_condition           # checks whether this special pixel is within the circle
        rule2 = some_other_condition     # keeps only circles larger than some threshold size

        if rule1 and rule2:
            calculated_property = some_function

            if condition_3:
                calculated_some_other_property = some_other_function

                if condition_4:
                    some_quantity = something
                    enclosing_circles[id_] = float('{:.4f}'.format(log10(mass)))
                    some_parameter[id_] = float('{:.3e}'.format(some_quantity))

    # collect every sub-circle that encloses this special pixel
    sub_circle_catalog += [(enclosing_circles[i], 1) for i in enclosing_circles]
    some_parameter_catalog += [(enclosing_circles[i], some_parameter[i]) for i in enclosing_circles]

# count the special pixels in each sub-circle, summed over all pixels
for key, value in sub_circle_catalog:
    totals[key] = totals.get(key, 0) + value
totals_dict = collections.OrderedDict(sorted(totals.items()))
totals_list = list(totals_dict.items())

with open(some_file_path, "a") as some_file:  # the with block closes the file on exit
    print('{}'.format(totals_list), file=some_file)
.
.
.

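【Comments】:

If you want to speed up code, your first step should be to profile it. See whether you can find the bottlenecks that take most of the time.
Sorry, do you mean adding a print after every statement to see how long each one takes?
rule1 and rule2 under the second for take the longest.
I would suggest imposing a spatial subdivision grid on the points and intersecting the grid with the circles. If a cell lies entirely inside a circle, you can compute the sum over the points in that cell just once.
@Allan Profiling usually means running the program under some external monitoring software that records line-by-line details such as execution time, call counts, cache misses, branch mispredictions, or whatever else you specify. For Python, I personally use kernprof.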
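
The spatial-subdivision idea from the comment above can be sketched as follows. This is only an illustration, not the commenter's code: the names `build_grid` and `mass_in_circle`, the square domain `[lo, hi]`, and the resolution `n` are assumptions, as is the availability of NumPy and the premise that every particle lies inside the domain.

```python
import numpy as np

def build_grid(px, py, mass, lo, hi, n):
    """Bin particles on an n-by-n grid over [lo, hi] x [lo, hi] (hypothetical helper)."""
    h = (hi - lo) / n
    ix = np.clip(((px - lo) / h).astype(int), 0, n - 1)
    iy = np.clip(((py - lo) / h).astype(int), 0, n - 1)
    # Total mass per cell, computed once and reused for every circle
    cell_mass = np.bincount(ix * n + iy, weights=mass, minlength=n * n).reshape(n, n)
    edges = lo + h * np.arange(n + 1)
    return ix, iy, cell_mass, edges

def mass_in_circle(px, py, mass, grid, cx, cy, r):
    """Sum the mass of particles within distance r of (cx, cy)."""
    ix, iy, cell_mass, edges = grid
    lo_e, hi_e = edges[:-1], edges[1:]
    # Per-axis distance from the centre to the nearest and farthest point of each cell
    near_x = np.maximum(np.maximum(lo_e - cx, cx - hi_e), 0.0)
    near_y = np.maximum(np.maximum(lo_e - cy, cy - hi_e), 0.0)
    far_x = np.maximum(np.abs(lo_e - cx), np.abs(hi_e - cx))
    far_y = np.maximum(np.abs(lo_e - cy), np.abs(hi_e - cy))
    near2 = near_x[:, None] ** 2 + near_y[None, :] ** 2
    far2 = far_x[:, None] ** 2 + far_y[None, :] ** 2
    inside = far2 <= r * r                  # whole cell is inside the circle
    boundary = (near2 <= r * r) & ~inside   # cell straddles the circle's edge
    total = cell_mass[inside].sum()         # interior cells: use the precomputed sums
    sel = boundary[ix, iy]                  # only boundary particles need a distance test
    d2 = (px[sel] - cx) ** 2 + (py[sel] - cy) ** 2
    return total + mass[sel][d2 <= r * r].sum()

rng = np.random.default_rng(1)
px = rng.uniform(-1.0, 1.0, 20_000)
py = rng.uniform(-1.0, 1.0, 20_000)
mass = rng.uniform(0.5, 2.0, 20_000)
grid = build_grid(px, py, mass, -1.0, 1.0, 32)
fast = mass_in_circle(px, py, mass, grid, 0.3, -0.2, 0.25)
brute = mass[(px - 0.3) ** 2 + (py + 0.2) ** 2 <= 0.25 ** 2].sum()
```

Here `fast` and `brute` should agree up to floating-point rounding. The sketch still scans the cell indices of all particles for each circle; sorting particles by cell would avoid that, at the cost of more code.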

Tags: python performance for-loop if-statement nested-loops

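【Solution 1】:

rule1 and rule2 under the second for take the longest.

Inline rule1 and rule2. If `and` sees that its first operand is false, it does not evaluate the second. Also try swapping them to see whether that is faster.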
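
To illustrate the short-circuit behaviour, here is a minimal sketch. The functions `cheap_check` and `costly_check` are hypothetical stand-ins for the two rules, which the question does not show; the `calls` list exists only to record which checks actually ran.

```python
calls = []

def cheap_check(r):
    """Hypothetical inexpensive rule, e.g. a size threshold."""
    calls.append("cheap")
    return r > 1.0

def costly_check(x, y, r):
    """Hypothetical expensive rule, e.g. a point-in-circle test."""
    calls.append("costly")
    return x * x + y * y <= r * r

def passes(x, y, r):
    # `and` short-circuits: costly_check runs only when cheap_check is true
    return cheap_check(r) and costly_check(x, y, r)

results = [passes(0.5, 0.5, r) for r in (0.2, 0.4, 2.0)]
```

Putting the check that fails most often (or costs least) first means the other one is skipped for most iterations. Writing `rule1 = ...; rule2 = ...; if rule1 and rule2:` evaluates both every time, which is why inlining them into the `if` matters.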

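Depending on the details of how these rules are computed, you may find other opportunities for shortcuts like this.

Always profile to find the bottlenecks. Otherwise you may waste a lot of time optimizing parts that will not help.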
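
As a rough illustration of profiling, the sketch below uses `cProfile` and `pstats` from the standard library rather than the kernprof tool mentioned in the comments. The functions `slow_rule`, `fast_rule`, and `work` are invented placeholders for the real loop.

```python
import cProfile
import io
import pstats

def slow_rule(n):
    # Placeholder for an expensive check
    return sum(i * i for i in range(n))

def fast_rule(n):
    # Placeholder for a cheap check
    return n

def work():
    total = 0
    for _ in range(200):
        total += slow_rule(2000) + fast_rule(2000)
    return total

profiler = cProfile.Profile()
profiler.enable()
result = work()
profiler.disable()

# Collect the five entries with the largest cumulative time into a string
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

The report lists call counts and cumulative time per function, which should make it clear that `slow_rule` dominates here. `cProfile` reports per function; a line-by-line view needs a separate tool such as the one named in the comments.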

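Take shortcuts where you can; do not waste time computing things you do not need.

Avoid function calls inside nested loops by inlining them. Calls are somewhat slow in CPython.

Unroll inner loops to reduce loop overhead.

Compute things outside the loops whenever possible, instead of redoing them on every iteration.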
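
Hoisting loop-invariant work can look like the sketch below. Both functions are hypothetical examples of a point-in-circle count, not code from the question; the only difference is where the squared radius is computed.

```python
def count_inside_naive(points, cx, cy, radius):
    count = 0
    for x, y in points:
        r2 = radius ** 2  # same value recomputed on every iteration
        if (x - cx) ** 2 + (y - cy) ** 2 <= r2:
            count += 1
    return count

def count_inside_hoisted(points, cx, cy, radius):
    r2 = radius * radius  # loop-invariant, computed once before the loop
    count = 0
    for x, y in points:
        dx = x - cx
        dy = y - cy
        if dx * dx + dy * dy <= r2:  # squared distances, so no sqrt is needed
            count += 1
    return count

points = [(0.0, 0.0), (1.0, 1.0), (3.0, 4.0), (0.5, -0.5), (2.0, 2.0)]
naive = count_inside_naive(points, 0.0, 0.0, 2.0)
hoisted = count_inside_hoisted(points, 0.0, 0.0, 2.0)
```

Both versions return the same count. With roughly 100,000,000 inner iterations, even one saved operation per iteration adds up, though the gain from this alone is usually modest.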

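Consider compiling the whole thing with Nuitka, Cython, or PyPy. (Or just the slow parts with Cython or Numba.)

Consider rewriting this part in Julia, which is much faster and easy to call from Python. Prefer to extract and call the whole inner loop rather than just its body, to avoid paying the call overhead on every iteration.

Consider vectorizing the calculations with NumPy wherever possible, even if only for part of the loop. NumPy's internal loops are much faster than Python's. This can use more memory. If you can get the NumPy vectorization to work, you may be able to get even larger speedups with CuPy, which uses the GPU, or Dask, which can handle larger datasets.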
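
A vectorized version of a per-particle test might look like the sketch below. It assumes NumPy is installed and uses made-up particle arrays `px`, `py`, and `mass`, with a single hypothetical circle; the question's real rules are not shown, so this only demonstrates the technique.

```python
import numpy as np

def mass_in_circle_loop(px, py, mass, cx, cy, r):
    """Pure-Python reference: one interpreted iteration per particle."""
    total = 0.0
    for x, y, m in zip(px, py, mass):
        if (x - cx) ** 2 + (y - cy) ** 2 <= r * r:
            total += m
    return total

def mass_in_circle_numpy(px, py, mass, cx, cy, r):
    """Vectorized: the loop over particles runs inside NumPy."""
    inside = (px - cx) ** 2 + (py - cy) ** 2 <= r * r  # boolean mask, one entry per particle
    return mass[inside].sum()

rng = np.random.default_rng(0)
px = rng.uniform(-1.0, 1.0, 10_000)
py = rng.uniform(-1.0, 1.0, 10_000)
mass = rng.uniform(0.5, 2.0, 10_000)

loop_total = mass_in_circle_loop(px, py, mass, 0.2, -0.1, 0.5)
numpy_total = mass_in_circle_numpy(px, py, mass, 0.2, -0.1, 0.5)
```

The two totals should agree up to floating-point rounding. The vectorized form allocates temporary arrays as long as the particle arrays, which is the extra memory cost the answer mentions; for 100,000,000 particles, processing in chunks is one way to bound it.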

【Discussion】:

Thanks, I will try these later and report back. Much appreciated.