Techniques for improving Python program performance (Part 1)

We begin with four questions, which are discussed in four parts.

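1. What are the basic ways to measure a program's performance while it runs?

2. How do you identify performance bottlenecks by profiling the code?

3. How do you do basic memory profiling with the memory_profiler package?

4. How do you express computational complexity with big O notation?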

Part 1

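The first, general idea for improving a program's performance is to use the CPU and memory more efficiently and to reduce network latency and overhead, so that the program runs faster.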

Let's first write a new program and use it as the test subject for the explanations.

"The Treasure Hunter program"

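Suppose you are a treasure hunter crossing a circular field of radius 10 (diameter 20) that is full of gold coins. You may only walk along a diameter of the field while collecting coins. At each stop you collect the coins inside a circle of radius 1 around you, then step forward by one search-circle diameter (2 units) to the next stop, as shown in the figure: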

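Put the center of the big circle at (0, 0) and divide the diameter into ten equal segments; your search radius is 1. The center of each small circle in the figure is a position where you stop and search: every coin within that circle is collected, and at the end the coins are totaled.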

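The program logic is:

a. For each small search circle, get its center coordinates (your position).
b. Compute the distance between each coin and your position (the center of the small circle).
c. Collect the coins whose distance is less than or equal to your search radius, i.e. the coins inside the small circle.
d. Walk to the center of the next search circle and repeat the steps above.
e. Count the total number of coins you collected.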

import math
import random


class GoldHunt:
    def __init__(self, field_coins=5000, field_radius=10, search_radius=1):
        self.field_coins = field_coins      # total number of coins in the field
        self.field_radius = field_radius    # radius of the whole field
        self.search_radius = search_radius  # radius of each search circle

        self.your_x = -(self.field_radius - self.search_radius)  # starting x: center of the leftmost search circle
        self.your_y = 0

        self.movedistance = 2 * search_radius  # each move covers one search-circle diameter (2 units)

    def generate_random_points(self, tmp_radius, total_points):
        """Scatter total_points coins uniformly at random inside a circle of radius tmp_radius."""
        coins_x = []
        coins_y = []
        for i in range(total_points):
            theta = random.uniform(0, 2 * math.pi)  # random angle between 0 and 360 degrees
            # sqrt keeps the coin density uniform over the disc's area;
            # r = random.uniform(0, tmp_radius) would crowd the coins toward the center
            r = tmp_radius * math.sqrt(random.uniform(0, 1))
            coins_x.append(r * math.cos(theta))  # coin x coordinate
            coins_y.append(r * math.sin(theta))  # coin y coordinate
        return coins_x, coins_y

    def find_coins(self, x_list, y_list):
        collected_coins = []
        for x, y in zip(x_list, y_list):
            tmp_x = self.your_x - x
            tmp_y = self.your_y - y
            dist = math.sqrt(tmp_x * tmp_x + tmp_y * tmp_y)  # distance from your position to the coin
            if dist <= self.search_radius:  # inside the search circle: collect it
                collected_coins.append((x, y))
        return collected_coins

    def play(self):  # game logic
        total_collected_coins = []  # every coin collected so far
        x_list, y_list = self.generate_random_points(self.field_radius, self.field_coins)

        # walk up to the center of the rightmost search circle (x = 9 with the defaults)
        while self.your_x <= self.field_radius - self.search_radius:
            coins = self.find_coins(x_list, y_list)  # coins collected at this stop
            print("x:", self.your_x, "coins collected:", len(coins))
            total_collected_coins.extend(coins)
            self.your_x += self.movedistance  # move one step to the right
        print("total coins collected:", len(total_collected_coins))


if __name__ == '__main__':
    game = GoldHunt()
    game.play()

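Testing the finished program shows that increasing the number of coins in the field, or decreasing the search radius, noticeably increases the running time.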

How can the time be measured accurately? Python's built-in time module helps here.

Change the code to:

import time

if __name__ == '__main__':
    start = time.perf_counter()  # start time
    game = GoldHunt()
    game.play()
    end = time.perf_counter()  # end time
    print("total run time:", end - start)  # elapsed seconds
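When several separate blocks need timing, repeating start/end pairs quickly gets messy. A minimal sketch, assuming a hypothetical helper named timer (it is not part of the time module), wraps time.perf_counter() in a context manager so each block is timed with one with statement:

```python
import time
from contextlib import contextmanager


@contextmanager
def timer(label, results=None):
    """Hypothetical helper: time the enclosed block with time.perf_counter()."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        if results is not None:
            results[label] = elapsed  # keep the measurement for later inspection
        print(f"{label}: {elapsed:.6f} s")


if __name__ == '__main__':
    timings = {}
    with timer("sum of squares", timings):
        total = sum(i * i for i in range(100_000))
    print(total)
```

The finally clause makes sure the elapsed time is still reported if the timed block raises.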

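The timeit module can also measure time. Usage: python -m timeit [-n N | --number=N] [-s "setup statement"] "statement", where -n sets how many times the statement runs per round and -s runs a setup statement once beforehand.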

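Example: python -m timeit -n 1 -s "from goldhunt import GoldHunt" "GoldHunt().play()" (the import uses only the module name, without the .py suffix; note that play() prints its output on every run).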
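timeit can also be driven from Python code. The sketch below times a hypothetical workload function, build_squares; timeit.repeat accepts a callable, runs it `number` times per round for `repeat` rounds, and taking the minimum round is the usual way to reduce noise from other processes:

```python
import timeit


def build_squares(n=10_000):
    # Hypothetical stand-in workload: a list of squared floats
    return [(i / 1000) ** 2 for i in range(n)]


if __name__ == '__main__':
    number = 20
    # One total time per round; each round calls build_squares `number` times
    rounds = timeit.repeat(build_squares, number=number, repeat=5)
    best = min(rounds) / number  # per-call time of the least disturbed round
    print(f"best of 5: {best * 1e6:.1f} microseconds per call")
```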

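These timers are fine for measuring a whole program, but placing many timers across the different parts of a program is clearly tedious. That is where code profiling tools come in (cProfile and pstats from the standard library, and the line_profiler package): they record how often each function is called and how long it takes, which lets you identify the program's performance bottlenecks.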

Write a test program, ex.py:

def test1():
    return 100 * 100


def test2():
    x = []
    for i in range(10000):
        temp = i / 1000
        x.append(temp * temp)
    return x


def test3(condition=False):
    if condition:
        test3()  # one recursive call, reported by cProfile as 2/1 calls


if __name__ == '__main__':
    test1()
    test2()
    test3(True)


Run python -m cProfile ex.py from the command line; the output is:

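10007 function calls (10006 primitive calls) in 0.002 seconds

Ordered by: standard name

ncalls tottime percall cumtime percall filename:lineno(function)
1 0.002 0.002 0.002 0.002 ex.py:13(test2)
2/1 0.000 0.000 0.000 0.000 ex.py:21(test3)
1 0.000 0.000 0.002 0.002 ex.py:9(<module>)
1 0.000 0.000 0.000 0.000 ex.py:9(test1)
1 0.000 0.000 0.002 0.002 {built-in method builtins.exec}
10000 0.001 0.000 0.001 0.000 {method 'append' of 'list' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}

The first line gives the total number of function calls; "primitive" calls are calls not induced by recursion, so the recursive call inside test3 accounts for the difference of one. The columns mean: ncalls is the number of calls (2/1 means 2 calls in total, 1 of them primitive); tottime is the total time spent in the function itself, excluding the functions it calls; the first percall is tottime divided by ncalls; cumtime is the cumulative time including all sub-functions; the second percall is cumtime divided by the number of primitive calls.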

Looking at the tottime column, the function test2 takes the longest, 0.002 s.
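The same report can be produced from inside a script and sorted by tottime, the column read above (on the command line, python -m cProfile -s tottime ex.py sorts the same way). A sketch, where profile_to_text is a hypothetical helper and test1/test2 are copied from ex.py:

```python
import cProfile
import io
import pstats


def test1():
    return 100 * 100


def test2():
    x = []
    for i in range(10000):
        temp = i / 1000
        x.append(temp * temp)
    return x


def profile_to_text(func, sort_key="tottime", limit=5):
    """Hypothetical helper: profile one call of func, return the pstats report as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()

    stream = io.StringIO()  # capture the report instead of printing to stdout
    stats = pstats.Stats(profiler, stream=stream)
    stats.strip_dirs().sort_stats(sort_key).print_stats(limit)  # top `limit` rows
    return stream.getvalue()


if __name__ == '__main__':
    print(profile_to_text(lambda: (test1(), test2())))
```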

Now try profiling the goldhunt program with cProfile and redirect the output to a file: python -m cProfile goldhunt.py > 1.txt

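With python -m cProfile -o profile_output goldhunt.py, cProfile writes its statistics to the file profile_output, which the pstats module can then analyze in a cleaner and more direct way. This file is not human-readable; it is meant for pstats.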

To do this from inside the Python program, import the two modules and make the following changes:

import cProfile
import pstats

# ... GoldHunt source code ...


def play_game():
    game = GoldHunt()
    game.play()


def view_stats(file, text_restriction):  # file: the profile output to analyze
    stats = pstats.Stats(file)
    stats.strip_dirs()  # strip leading path information from file names to simplify the output
    sorted_stats = stats.sort_stats("tottime")  # sort by time spent inside each function
    sorted_stats.print_stats(text_restriction)  # print only the rows matching the restriction, e.g. "goldhunt"


if __name__ == '__main__':
    filename = "profile_output"
    cProfile.run('play_game()', filename)  # profile the statement and save the stats to filename
    view_stats(filename, "goldhunt")


The result:

Thu Oct 22 16:48:12 2020 profile_output

95588 function calls in 0.033 seconds

Ordered by: internal time
List reduced from 17 to 5 due to restriction <'goldhunt'>

ncalls tottime percall cumtime percall filename:lineno(function)
10 0.016 0.002 0.021 0.002 goldhunt.py:38(find_coins)
1 0.006 0.006 0.012 0.012 goldhunt.py:27(generate_random_points)
1 0.000 0.000 0.033 0.033 goldhunt.py:60(play_game)
1 0.000 0.000 0.033 0.033 goldhunt.py:48(play)
1 0.000 0.000 0.000 0.000 goldhunt.py:17(__init__)

The results show that the two most time-consuming functions are find_coins and generate_random_points.

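Now that we have found the most expensive functions, can we analyze them further and find the problem inside them? The answer is the line_profiler package, which measures performance line by line within a function. Install it with pip: pip install line_profiler.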

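If you download the package manually, there is no need to unpack the .whl file into Python/Lib/site-packages; just run pip install followed by the path to the .whl file in a cmd window. The wheel must match your Python version: for Python 3.8, download the cp38 build (line_profiler-3.0.2-cp38).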

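After installation, the function under test needs one change: add the @profile decorator in front of it. Then run kernprof -l -v goldhunt.py (-l uses the line-by-line profiler from the package, -v prints the results in the terminal).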

The result (after adding @profile on the line above find_coins and running kernprof in the terminal):

Wrote profile results to goldhunt.py.lprof
Timer unit: 1e-07 s

Total time: 0.0967073 s
File: goldhunt.py
Function: find_coins at line 37

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    37                                               @profile
    38                                               def find_coins(self, x_list, y_list):
    39        10        149.0     14.9      0.0          collected_coins = []
    40     50010     163313.0      3.3     16.9          for x, y in zip(x_list, y_list):
    41     50000     183230.0      3.7     18.9              tmp_x = self.your_x - x
    42     50000     178229.0      3.6     18.4              tmp_y = self.your_y - y
    43     50000     264055.0      5.3     27.3              dist = math.sqrt(tmp_x * tmp_x + tmp_y * tmp_y)  # distance from your position to the coin
    44     50000     175174.0      3.5     18.1              if dist <= self.search_radius:  # inside the search circle: collect it
    45       481       2833.0      5.9      0.3                  collected_coins.append((x, y))
    46        10         90.0      9.0      0.0          return collected_coins

Line 43, which computes the distance, takes the most time (27.3% of the function's total).

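Note that once you stop using line_profiler you must remove the @profile decorator; otherwise running the script with plain python fails with a NameError, because profile is only defined when the script is run through kernprof.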
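One way to avoid adding and removing the decorator by hand is a fallback shim. This sketch relies on kernprof injecting a name called profile into builtins before running the script; memory_profiler's command-line runner is assumed to do the same. Under plain python the decorator becomes a no-op. distance_squared is a hypothetical example function:

```python
import builtins

try:
    # Provided by kernprof (and, we assume, by `python -m memory_profiler`)
    profile = builtins.profile
except AttributeError:
    def profile(func):
        """No-op stand-in used when no profiler injected `profile`."""
        return func


@profile
def distance_squared(x1, y1, x2, y2):
    # Hypothetical example function to be profiled
    dx, dy = x1 - x2, y1 - y2
    return dx * dx + dy * dy


if __name__ == '__main__':
    print(distance_squared(0, 0, 3, 4))  # prints 25
```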

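So far we have looked at running time; how do we look at memory usage? Install two packages, memory_profiler and psutil. Usage is similar to line_profiler: add the @profile decorator in front of the function,

then run python -m memory_profiler goldhunt.py from the command line, which produces the memory profiler's output.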

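If you get a GBK decoding error (UnicodeDecodeError) at this point, one workaround is to open the memory_profiler.py file, go to line 1131 (in the version used here), and change with open(filename) as f: to with open(filename, encoding='utf-8') as f:. Alternatively, running Python in UTF-8 mode (python -X utf8 -m memory_profiler goldhunt.py, Python 3.7+) makes open() default to UTF-8 without editing the installed package.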

Profiling the generate_random_points function gives:

Filename: goldhunt.py

Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
    28   40.637 MiB   40.637 MiB           1       @profile
    29                                             def generate_random_points(self, tmp_radius, total_points):  # create random points (coin positions) in the big circle; args: field radius, total number of coins
    30   40.637 MiB    0.000 MiB           1           coins_x = []
    31   40.637 MiB    0.000 MiB           1           coins_y = []
    32   41.148 MiB    0.273 MiB        5001           for i in range(total_points):
    33   41.148 MiB    0.004 MiB        5000               theta = random.uniform(0, 2 * math.pi)  # random angle between 0 and 360 degrees
    34
    35   41.148 MiB    0.000 MiB        5000               r = tmp_radius * math.sqrt(random.uniform(0, 1))  # radius of the random point; why not generate it directly with r=random.uniform(1,10)?
    36   41.148 MiB    0.172 MiB        5000               coins_x.append(r * math.cos(theta))  # compute the coin's x coordinate and append it to the x list
    37   41.148 MiB    0.062 MiB        5000               coins_y.append(r * math.sin(theta))
    38   41.148 MiB    0.000 MiB           1           return coins_x, coins_y

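The lines with a visible memory increment (32, 36 and 37) show that memory is used mainly inside the for loop, where the two coordinate lists grow.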
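memory_profiler reports the process's memory usage line by line. As a swapped-in alternative from the standard library, tracemalloc tracks only memory allocated by Python objects, so its numbers will not match memory_profiler's MiB figures, but it needs no third-party install. A sketch, with peak_kib as a hypothetical helper and generate_random_points copied out of the class as a plain function:

```python
import math
import random
import tracemalloc


def generate_random_points(radius, total_points):
    # Same sampling as GoldHunt.generate_random_points, as a standalone function
    coins_x, coins_y = [], []
    for _ in range(total_points):
        theta = random.uniform(0, 2 * math.pi)
        r = radius * math.sqrt(random.uniform(0, 1))
        coins_x.append(r * math.cos(theta))
        coins_y.append(r * math.sin(theta))
    return coins_x, coins_y


def peak_kib(func, *args):
    """Hypothetical helper: return (result, peak traced Python memory in KiB)."""
    tracemalloc.start()
    try:
        result = func(*args)
        _, peak = tracemalloc.get_traced_memory()  # (current, peak) in bytes
    finally:
        tracemalloc.stop()
    return result, peak / 1024


if __name__ == '__main__':
    (xs, ys), peak = peak_kib(generate_random_points, 10, 5000)
    print(f"{len(xs)} coins, peak {peak:.1f} KiB")
```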

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------

The sections above covered how to measure a program's running time and memory use. Now let's look at algorithms and complexity.

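An algorithm is a set of instructions for solving a specific problem. The less resources it consumes, the more efficient it is.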

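Suppose an algorithm can process some data in five minutes. When the amount of data grows, the running time grows in different ways depending on the algorithm, that is, on its complexity.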
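A simple way to see how different complexities react to more data is to count basic steps instead of measuring time. The two functions below are hypothetical illustrations of linear work (one pass) and quadratic work (a nested pass):

```python
def linear_steps(n):
    # One pass over n items: work grows like n
    steps = 0
    for _ in range(n):
        steps += 1
    return steps


def quadratic_steps(n):
    # Pair every item with every item: work grows like n * n
    steps = 0
    for _ in range(n):
        for _ in range(n):
            steps += 1
    return steps


if __name__ == '__main__':
    for n in (100, 200):
        print(n, linear_steps(n), quadratic_steps(n))
    # Doubling n doubles the linear count (100 -> 200)
    # but quadruples the quadratic count (10000 -> 40000).
```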

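Note that even two algorithms with the same big-O time complexity do not necessarily perform the same. Other factors matter, such as constant factors, which are dropped when computing complexity (for example, 2n steps and 100n steps are both O(n)).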

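Big O gives an asymptotic upper bound on growth; when an algorithm's big-O complexity is quoted without qualification, it usually refers to the worst case.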

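The common big-O classes, ordered from slowest to fastest growth (worth memorizing): O(1) < O(log n) < O(n) < O(n log n) < O(n^2) < O(n^3), i.e. constant < logarithmic < linear < linearithmic < quadratic < cubic.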

An example of logarithmic O(log n) is binary search; of linearithmic O(n log n), quicksort (in the average case); of quadratic O(n^2), bubble sort.
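To make two of these examples concrete, this sketch counts the work each one does: binary_search counts loop iterations on a sorted list, and bubble_sort (a plain version without the early-exit optimization) counts comparisons. The function names and the counting scheme are illustrative assumptions:

```python
def binary_search(sorted_items, target):
    """Return (index or -1, loop iterations). Halves the range each time: O(log n)."""
    lo, hi = 0, len(sorted_items) - 1
    steps = 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            return mid, steps
        if sorted_items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1, steps


def bubble_sort(items):
    """Return (sorted copy, comparisons). Always n * (n - 1) / 2 comparisons: O(n^2)."""
    a = list(items)
    comparisons = 0
    for end in range(len(a) - 1, 0, -1):
        for i in range(end):
            comparisons += 1
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]  # swap adjacent out-of-order pair
    return a, comparisons


if __name__ == '__main__':
    data = list(range(1024))
    print(binary_search(data, 1023))  # (1023, 11): at most 11 iterations for 1024 items
    _, comps = bubble_sort(list(range(200, 0, -1)))
    print(comps)  # 19900 = 200 * 199 / 2
```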

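Why is quicksort's worst-case time complexity O(n^2)? If every pivot turns out to be the smallest or largest element (for example, always picking the first element of an already sorted list), each partition step removes only one element, so the work is (n-1) + (n-2) + ... + 1 = n(n-1)/2 comparisons: the same quadratic order as bubble sort.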
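The quadratic worst case is easy to reproduce with a naive pivot choice. This sketch assumes the first element is always the pivot (a deliberately weak choice; practical implementations choose pivots more carefully) and counts one comparison per element partitioned at each level:

```python
import random


def quicksort(items):
    """Return (sorted list, comparisons), using the first element as the pivot."""
    if len(items) <= 1:
        return list(items), 0
    pivot, rest = items[0], items[1:]
    left = [x for x in rest if x < pivot]
    right = [x for x in rest if x >= pivot]
    sorted_left, c_left = quicksort(left)
    sorted_right, c_right = quicksort(right)
    # Count one pivot comparison per remaining element at this level
    return sorted_left + [pivot] + sorted_right, len(rest) + c_left + c_right


if __name__ == '__main__':
    n = 400
    sorted_input = list(range(n))
    shuffled_input = sorted_input[:]
    random.shuffle(shuffled_input)

    _, worst = quicksort(sorted_input)      # every pivot is the minimum
    _, typical = quicksort(shuffled_input)  # pivots split roughly in half on average
    print("already sorted:", worst)  # 79800 = 400 * 399 / 2
    print("shuffled:", typical)
```

Because the sorted input recurses n levels deep, keep n well below Python's default recursion limit (1000) when trying this.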

Posted on 2020-10-22 12:55 by footmark89