Unexpectedly long computation time with uncertainties package: answers

【Question title】: Unexpectedly long computation time with uncertainties package
【Posted】: 2019-04-07 02:16:57
【Question】:

Consider the following code snippet:

import random
from uncertainties import unumpy, ufloat

x = [random.uniform(0,1) for p in range(1,8200)]
y = [random.randrange(0,1000) for p in range(1,8200)]
xerr = [random.uniform(0,1)/1000 for p in range(1,8200)]
yerr = [random.uniform(0,1)*10 for p in range(1,8200)]

x = unumpy.uarray(x, xerr)
y = unumpy.uarray(y, yerr)
diff = sum(x*y)
u = ufloat(0.0, 0.0)
for k in range(len(x)):
    u += (diff - x[k])**2 * y[k]

print(u)

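If I try to run it on my computer, it takes up to 10 minutes to produce a result. I am not really sure why this is the case and hope to get some kind of explanation. If I had to guess, I would say that the computation of the uncertainties is for some reason more complicated than one might think, but as I said, that is just a guess. Interestingly enough, the code finishes almost instantly if the print instruction at the end is removed, which honestly confuses me more than it helps...

In case you are not familiar with it, the original post linked here to the repo of the uncertainties library.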

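【Comments】:

Not sure what uncertainties is or what unumpy etc. do. But a for loop that runs over the length of uarray(x, xerr) can take a while if the list is long. How long is x? Have you timed it to see which part takes the time?
@Torxed It is 8200 in this case. That does not seem like an extremely long array to me, or at least I would not expect such basic operations on such a list to take this long.
@Torxed I only saw your second question just now... Yes, I timed the for loop with tqdm. It reaches 100% almost instantly, but the script still does not finish.
It is actually print(u) that takes the time. I checked: each iteration of the loop takes about 1.1444091796875e-05 s, so the whole loop takes about 0.182 seconds. Printing the AffineScalarFunc (u) is what takes time. Not sure what print(u.n) represents, but it is a pretty large number, 1.7427233520528605e+19. So I guess there are more digits in there than you would think?
@Torxed Setting y = [random.randrange(0,1) for p in range(1,8200)] and yerr = [random.uniform(0,1) for p in range(1,8200)] definitely speeds the process up a bit (since the resulting numbers are ~1), but in the end it still takes quite a while... Honestly, no matter how large u is, I am still puzzled why it takes so long to print...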
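The kind of timing check described in these comments can be sketched with the standard library alone. This is a minimal sketch, not code from the thread: `time_call` and `SlowToPrint` are hypothetical names, and `SlowToPrint` is only a stand-in for an object that is cheap to create but expensive to convert to text.

```python
import time


def time_call(fn, *args, **kwargs):
    """Return (result, elapsed_seconds) for a single call to fn."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


class SlowToPrint:
    """Hypothetical stand-in: cheap to build, costly to turn into a string."""

    def __init__(self, n):
        self.n = n

    def __str__(self):
        # All of the real work happens here, only when text is requested.
        return str(sum(i * i for i in range(self.n)))


# Time the construction and the string conversion separately.
obj, build_time = time_call(SlowToPrint, 200_000)
text, str_time = time_call(str, obj)
print(f"build: {build_time:.6f} s, str(): {str_time:.6f} s")
```

With the snippet from the question, the same helper could wrap the loop and `str(u)` separately, which shows whether the time goes into the arithmetic or into the conversion that print performs.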

Tags: python performance uncertainty

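【Solution 1】:

I can reproduce this: the printing takes forever. Or rather, it is the conversion to a string, which print calls implicitly. I used line_profiler to measure the time spent in the __format__ function of AffineScalarFunc (which is called by __str__, which is called by print). I reduced the array size from 8200 to 1000 to make it run a bit faster. This is the result (trimmed for readability):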

Timer unit: 1e-06 s

Total time: 29.1365 s
File: /home/veith/Projects/stackoverflow/test/lib/python3.6/site-packages/uncertainties/core.py
Function: __format__ at line 1813

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  1813                                               @profile
  1814                                               def __format__(self, format_spec):

  1960                                           
  1961                                                   # Since the '%' (percentage) format specification can change
  1962                                                   # the value to be displayed, this value must first be
  1963                                                   # calculated. Calculating the standard deviation is also an
  1964                                                   # optimization: the standard deviation is generally
  1965                                                   # calculated: it is calculated only once, here:
  1966         1          2.0      2.0      0.0          nom_val = self.nominal_value
  1967         1   29133097.0 29133097.0    100.0          std_dev = self.std_dev
  1968

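You can see that almost all of the time is spent on line 1967, where the standard deviation is calculated. If you dig a little deeper, you will find that the error_components property is the problem, inside which the derivatives property is the problem, inside which _linear_part.expand() is the problem. If you profile that, you start getting to the root of the problem. Most of the work here is evenly distributed: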

Function: expand at line 1481

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  1481                                               @profile
  1482                                               def expand(self):
  1483                                                   """
  1484                                                   Expand the linear combination.
  1485                                           
  1486                                                   The expansion is a collections.defaultdict(float).
  1487                                           
  1488                                                   This should only be called if the linear combination is not
  1489                                                   yet expanded.
  1490                                                   """
  1491                                           
  1492                                                   # The derivatives are built progressively by expanding each
  1493                                                   # term of the linear combination until there is no linear
  1494                                                   # combination to be expanded.
  1495                                           
  1496                                                   # Final derivatives, constructed progressively:
  1497         1          2.0      2.0      0.0          derivatives = collections.defaultdict(float)
  1498                                           
  1499  15995999    4942237.0      0.3      9.7          while self.linear_combo:  # The list of terms is emptied progressively
  1500                                           
  1501                                                       # One of the terms is expanded or, if no expansion is
  1502                                                       # needed, simply added to the existing derivatives.
  1503                                                       #
  1504                                                       # Optimization note: since Python's operations are
  1505                                                       # left-associative, a long sum of Variables can be built
  1506                                                       # such that the last term is essentially a Variable (and
  1507                                                       # not a NestedLinearCombination): popping from the
  1508                                                       # remaining terms allows this term to be quickly put in
  1509                                                       # the final result, which limits the number of terms
  1510                                                       # remaining (and whose size can temporarily grow):
  1511  15995998    6235033.0      0.4     12.2              (main_factor, main_expr) = self.linear_combo.pop()
  1512                                           
  1513                                                       # print "MAINS", main_factor, main_expr
  1514                                           
  1515  15995998   10572206.0      0.7     20.8              if main_expr.expanded():
  1516  15992002    6822093.0      0.4     13.4                  for (var, factor) in main_expr.linear_combo.items():
  1517   7996001    8070250.0      1.0     15.8                      derivatives[var] += main_factor*factor
  1518                                           
  1519                                                       else:  # Non-expanded form
  1520  23995993    8084949.0      0.3     15.9                  for (factor, expr) in main_expr.linear_combo:
  1521                                                               # The main_factor is applied to expr:
  1522  15995996    6208091.0      0.4     12.2                      self.linear_combo.append((main_factor*factor, expr))
  1523                                           
  1524                                                       # print "DERIV", derivatives
  1525                                           
  1526         1          2.0      2.0      0.0          self.linear_combo = derivatives

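You can see that there are many calls to expanded, which calls isinstance, which is slow. Also note the comments, which hint that this library actually computes the derivatives only when they are required (and is aware that doing so is really slow). This is why the conversion to a string takes so long, while no such time was spent earlier.

In __init__ of AffineScalarFunc: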

# In order to have a linear execution time for long sums, the
# _linear_part is generally left as is (otherwise, each
# successive term would expand to a linearly growing sum of
# terms: efficiently handling such terms [so, without copies]
# is not obvious, when the algorithm should work for all
# functions beyond sums).

In std_dev of AffineScalarFunc:

#! It would be possible to not allow the user to update the
#std dev of Variable objects, in which case AffineScalarFunc
#objects could have a pre-calculated or, better, cached
#std_dev value (in fact, many intermediate AffineScalarFunc do
#not need to have their std_dev calculated: only the final
#AffineScalarFunc returned to the user does).

In expand of LinearCombination:

# The derivatives are built progressively by expanding each
# term of the linear combination until there is no linear
# combination to be expanded.
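The behaviour these comments describe can be illustrated with a toy model. The sketch below is not the library's code: `Var`, `LazySum` and `add` are hypothetical names, and the model only handles sums with constant factors. It assumes the general scheme seen in the profile: building an expression merely stores references to its operands, and the derivatives are flattened later with an explicit stack. A sub-expression that is shared by many terms is then walked again for every term that refers to it.

```python
import math


class Var:
    """Toy independent variable with a standard deviation."""

    def __init__(self, name, std_dev):
        self.name = name
        self.std_dev = std_dev


class LazySum:
    """Toy linear combination: a list of (factor, term) pairs.

    A term is a Var or another LazySum; nothing is flattened on creation.
    """

    def __init__(self, terms):
        self.terms = list(terms)

    def expand(self):
        """Flatten into {Var: derivative} and count the loop iterations."""
        derivatives = {}
        stack = list(self.terms)
        steps = 0
        while stack:
            factor, term = stack.pop()
            steps += 1
            if isinstance(term, Var):
                derivatives[term] = derivatives.get(term, 0.0) + factor
            else:
                # Nested sum: push its terms, scaled by the outer factor.
                stack.extend((factor * f, t) for f, t in term.terms)
        return derivatives, steps


def add(a, b, sign=1.0):
    """Constant-time 'a + sign*b': keeps references, copies nothing."""
    return LazySum([(1.0, a), (sign, b)])


n = 200
xs = [Var(f"x{i}", 0.1) for i in range(n)]

# shared = x0 + x1 + ..., built left to right as sum() would build it.
shared = LazySum([(1.0, xs[0])])
for x in xs[1:]:
    shared = add(shared, x)

# Every term of total refers to the same unexpanded `shared`.
total = LazySum([])
for x in xs:
    total = add(total, add(shared, x, sign=-1.0))

derivatives, steps = total.expand()
# Linear error propagation: sqrt(sum((df/dx_i * sigma_i)^2)).
std_dev = math.sqrt(sum((d * v.std_dev) ** 2 for v, d in derivatives.items()))
print(f"{n} variables, {steps} expansion steps, std_dev = {std_dev:.3f}")
```

In this toy model the two building loops do an amount of work proportional to n, while the single call to `expand()` needs more than n² iterations, because `shared` is traversed once per term. The roughly 16 million loop iterations in the profile for 1000 elements would be consistent with this kind of growth, although the real library differs in its details.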

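All in all, this is somewhat expected, since the library handles these non-native numbers, which (apparently) require a lot of operations to process.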

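【Comments】:

Thank you very much for the explanation! It is certainly useful to know that the problem lies only in printing the standard deviation. Did you by any chance find a way around the problem?
I am not very familiar with this library, but for now it looks like you will just have to live with it. At some point you simply need to compute the final value, whether during printing or before. The author has built in some caching, so I guess some effort has gone into improving speed.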
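The general idea of paying for an expensive value once and reusing it afterwards can be sketched as follows. This is a generic illustration using `functools.cached_property` (Python 3.8+) and does not show how uncertainties caches anything internally. The class `Measurement` and its `evaluations` counter are hypothetical.

```python
from functools import cached_property


class Measurement:
    """Hypothetical value whose uncertainty is costly to derive."""

    def __init__(self, contributions):
        self.contributions = contributions
        self.evaluations = 0  # counts how often the costly part runs

    @cached_property
    def std_dev(self):
        # Runs on first access only; the result is stored on the instance.
        self.evaluations += 1
        return sum(c * c for c in self.contributions) ** 0.5


m = Measurement([3.0, 4.0])
first = m.std_dev   # computed here
second = m.std_dev  # served from the cache
print(first, second, m.evaluations)
```

The cost is not removed, only moved to the first access and not repeated, which matches the remark above that the final value has to be computed at some point.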