【Question title】: Python: performance of finding divisors
【Posted】: 2017-12-21 23:32:01
【Question】:

I am benchmarking these functions:

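import math

def divisors_optimized(number):
    square_root = int(math.sqrt(number))

    # Trial-divide by every candidate below the integer square root;
    # each hit yields the divisor and its cofactor.
    for divisor in range(1, square_root):
        if number % divisor == 0:
            yield divisor
            yield number // divisor  # integer division keeps the cofactor an int

    # range() above stops before square_root, so handle it here.
    if square_root ** 2 == number:
        yield square_root
    elif number % square_root == 0:
        yield square_root
        yield number // square_root

def number_of_divisors_optimized(number):
    count = 0
    square_root = int(math.sqrt(number))

    # Same loop as above, but count the divisor pairs instead of yielding them.
    for divisor in range(1, square_root):
        if number % divisor == 0:
            count += 2

    # range() above stops before square_root, so handle it here.
    if square_root ** 2 == number:
        count += 1
    elif number % square_root == 0:
        count += 2

    return count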

As you can see, both have the same basic structure.

Benchmark code:

import time

import utils  # the module that holds the two functions above

number = 9999999
for i in range(10):
    print(f"iteration {i}:")
    start = time.time()
    result = list(utils.divisors_optimized(number))
    end = time.time()
    print(f'len(divisors_optimized) took {end - start} seconds and found {len(result)} divisors.')

    start = time.time()
    result = utils.number_of_divisors_optimized(number)
    end = time.time()
    print(f'number_of_divisors_optimized took {end - start} seconds and found {result} divisors.')

    print()

Output:

iteration 0:
len(divisors_optimized) took 0.00019598007202148438 seconds and found 12 divisors.
number_of_divisors_optimized took 0.0001919269561767578 seconds and found 12 divisors.

iteration 1:
len(divisors_optimized) took 0.00019121170043945312 seconds and found 12 divisors.
number_of_divisors_optimized took 0.00020599365234375 seconds and found 12 divisors.

iteration 2:
len(divisors_optimized) took 0.000179290771484375 seconds and found 12 divisors.
number_of_divisors_optimized took 0.00019049644470214844 seconds and found 12 divisors.

iteration 3:
len(divisors_optimized) took 0.00019025802612304688 seconds and found 12 divisors.
number_of_divisors_optimized took 0.00020170211791992188 seconds and found 12 divisors.

iteration 4:
len(divisors_optimized) took 0.0001785755157470703 seconds and found 12 divisors.
number_of_divisors_optimized took 0.00017905235290527344 seconds and found 12 divisors.

iteration 5:
len(divisors_optimized) took 0.00022721290588378906 seconds and found 12 divisors.
number_of_divisors_optimized took 0.00020170211791992188 seconds and found 12 divisors.

iteration 6:
len(divisors_optimized) took 0.0001919269561767578 seconds and found 12 divisors.
number_of_divisors_optimized took 0.00018930435180664062 seconds and found 12 divisors.

iteration 7:
len(divisors_optimized) took 0.00017881393432617188 seconds and found 12 divisors.
number_of_divisors_optimized took 0.00017905235290527344 seconds and found 12 divisors.

iteration 8:
len(divisors_optimized) took 0.00017976760864257812 seconds and found 12 divisors.
number_of_divisors_optimized took 0.0001785755157470703 seconds and found 12 divisors.

iteration 9:
len(divisors_optimized) took 0.00024819374084472656 seconds and found 12 divisors.
number_of_divisors_optimized took 0.00020766258239746094 seconds and found 12 divisors.

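As you can see, the execution times are very close, with either function coming out ahead on any given iteration.

Can someone explain to me why creating a list from a generator and retrieving its length is as fast as simply counting while iterating? I mean, shouldn't memory allocation (list()) be much more expensive than assignments?

I am using Python 3.6.3.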

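【Comments】:

  • Creating a list isn't necessarily slow. In fact, it could even be faster. But lists eat memory, which is the main reason people avoid them when possible.
  • Coming from a statically typed, mostly compiled language background, you may be inclined to think that memory allocation is expensive. Relative to all the other overhead of JIT-less, dynamic CPython, memory allocation is a drop in the bucket.

Tags: python performance-testing python-3.6 execution-time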


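【Solution 1】:

You're testing vastly more things than you produce. The cost of the int versus list/generator operations in the "found a factor" case is dwarfed by the total work done. You're performing over 3000 trial divisions; twelve yields versus twelve additions is chump change against that sort of work. Replace the additions/yields with pass (doing nothing), and you'll find it still runs in (roughly) the same amount of time: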

import math

def ignore_divisors_optimized(number):
    square_root = int(math.sqrt(number))

    # Same loop and modulo test as before, but a hit does no work at all.
    for divisor in range(1, square_root):
        if number % divisor == 0:
            pass

    if square_root ** 2 == number:
        pass

And microbenchmarking with ipython's %timeit magic:

>>> %timeit -r5 number_of_divisors_optimized(9999999)
266 µs ± 1.85 µs per loop (mean ± std. dev. of 5 runs, 1000 loops each)
>>> %timeit -r5 list(divisors_optimized(9999999))
267 µs ± 1.29 µs per loop (mean ± std. dev. of 5 runs, 1000 loops each)
>>> %timeit -r5 ignore_divisors_optimized(9999999)
267 µs ± 1.43 µs per loop (mean ± std. dev. of 5 runs, 1000 loops each)

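The fact that number_of_divisors_optimized was one microsecond faster is irrelevant (the jitter across repeated tests is higher than a microsecond); all three run at basically the same speed, because >99% of the work is the loop and the test, not what is done when the test passes.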
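
To make that proportion concrete, here is a minimal sketch that tallies how many modulo tests run versus how many of them find a divisor. The helper name tally_trial_division is made up for this illustration, and unlike the functions above it also tests the integer square root itself inside the loop, so the tally may be one higher than the loop count in the question's code.

```python
import math


def tally_trial_division(number):
    """Return (tests, hits): modulo tests run vs. tests that found a divisor.

    Hypothetical helper for illustration; assumes number >= 1.
    """
    tests = hits = 0
    square_root = int(math.sqrt(number))
    # Unlike the question's loop, the root itself is included here.
    for divisor in range(1, square_root + 1):
        tests += 1
        if number % divisor == 0:
            hits += 1
    return tests, hits


if __name__ == "__main__":
    tests, hits = tally_trial_division(9999999)
    print(f"{tests} modulo tests, {hits} hits ({hits / tests:.2%} of tests)")
```

For 9999999 this reports 3162 modulo tests and 6 hits, so the "hit" branch (the yields or the additions) runs on well under 1% of the iterations.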

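This is an example of the 90/10 rule of optimization: roughly 90% of your time is spent in 10% of your code (in this case, the trial division itself); the remaining 10% is spent in the other 90% of the code. You're optimizing a small part of the 90% of the code that accounts for 10% of the runtime, and it doesn't help, because the overwhelming majority of the time is spent on the if number % divisor == 0: line. If you remove that test in favor of just looping over the range and doing nothing, the runtime drops to ~78 µs in my local microbenchmarks, which means the test accounts for nearly 200 µs of the runtime, over twice as long as all the rest of the code put together.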
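
A rough way to reproduce that comparison outside ipython is the standard library's timeit module. The sketch below is only an approximation of the measurement described above: the names loop_only, loop_with_test, and best_time_us are invented for this example, the lambda wrapper adds a small constant overhead to both measurements, and the absolute numbers will differ from machine to machine.

```python
import math
import timeit


def loop_only(number):
    # Iterate over the candidates without testing anything.
    for divisor in range(1, int(math.sqrt(number))):
        pass


def loop_with_test(number):
    # Same loop, plus the modulo test; a hit still does no work.
    for divisor in range(1, int(math.sqrt(number))):
        if number % divisor == 0:
            pass


def best_time_us(func, number, loops=200, repeats=5):
    """Best per-call time in microseconds across several timeit repeats."""
    timings = timeit.repeat(lambda: func(number), number=loops, repeat=repeats)
    return min(timings) / loops * 1e6


if __name__ == "__main__":
    n = 9999999
    bare = best_time_us(loop_only, n)
    tested = best_time_us(loop_with_test, n)
    print(f"loop only:      {bare:.1f} µs per call")
    print(f"loop with test: {tested:.1f} µs per call")
    print(f"modulo test:    {tested - bare:.1f} µs per call (approx.)")
```

Taking the minimum over the repeats rather than the mean reduces the effect of background noise, which matters when the differences being measured are this small.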

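If you want to optimize this, you need to look at ways to either speed up the trial division line itself (which basically means a different Python interpreter, or using Cython to compile it down to C), or run that line fewer times (e.g. precompute the possible prime factors up to some bound, so that for any given input you can avoid testing non-prime divisors, then produce/compute the number of non-prime factors from the known prime factors and their multiplicities).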
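
As one possible reading of the second suggestion, the sketch below precomputes primes with a sieve of Eratosthenes and then derives the divisor count from the prime factorization: if number = p1^e1 * p2^e2 * ..., it has (e1 + 1) * (e2 + 1) * ... divisors. The names prime_sieve and number_of_divisors_from_primes are hypothetical, and the code assumes number >= 1 and that the prime list covers every prime up to the square root of the input; it is not tuned for speed.

```python
def prime_sieve(limit):
    """Return all primes <= limit using the sieve of Eratosthenes."""
    if limit < 2:
        return []
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0] = is_prime[1] = 0
    for p in range(2, int(limit ** 0.5) + 1):
        if is_prime[p]:
            # Clear every multiple of p starting at p * p.
            is_prime[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return [i for i, flag in enumerate(is_prime) if flag]


def number_of_divisors_from_primes(number, primes):
    """Count the divisors of number (>= 1) from its prime factorization.

    Assumes primes holds every prime up to sqrt(number).
    """
    count = 1
    remaining = number
    for p in primes:
        if p * p > remaining:
            break
        exponent = 0
        while remaining % p == 0:
            remaining //= p
            exponent += 1
        count *= exponent + 1
    if remaining > 1:
        # Whatever is left is a single prime factor with exponent 1.
        count *= 2
    return count


if __name__ == "__main__":
    primes = prime_sieve(3162)  # covers sqrt(9999999)
    print(number_of_divisors_from_primes(9999999, primes))
```

For 9999999 = 3^2 * 239 * 4649 this gives 3 * 2 * 2 = 12, matching the benchmark output, while running only a few dozen modulo tests instead of more than 3000. The sieve itself has a cost, so this only pays off when the prime list is reused across many inputs.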

【Comments】:
