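I tested the different variants with the timeit module. Your variant did very well when I generated test data that does not repeat often, but for short strings my stringcompress_using_string was the fastest method. As the strings get longer, everything flips: your approach becomes the fastest and stringcompress_using_string the slowest.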
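This just goes to show how important it is to test under different conditions. My initial conclusion was incomplete, and more test data tells the real story about how effective these three methods are.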
import string
import timeit
import random
def stringcompress_original(str1):
    # note: only handles ASCII letters (any other character raises KeyError),
    # and a run of length 1 is written without a count ('b' rather than 'b1')
    res = []
    d = dict.fromkeys(string.ascii_letters, 0)
    main = str1[0]
    for char in range(len(str1)):
        if str1[char] == main:
            d[main] += 1
        else:
            if d[main] == 1:
                res.append(main)
                d[main] = 0
                main = str1[char]
                d[main] += 1
            else:
                res.append(main + str(d[main]))
                d[main] = 0
                main = str1[char]
                d[main] += 1
    res.append(main + str(d[main]))
    return min(''.join(res), str1, key=len)
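def stringcompress_using_list(str1):
    # an empty string has nothing to compress (and str1[-1] below would fail)
    if not str1:
        return str1
    res = []
    count = 0
    for i in range(1, len(str1)):
        count += 1
        # compare by value (==); `is` tests identity and only happens to work for some strings
        if str1[i] == str1[i-1]:
            continue
        res.append(str1[i-1])
        res.append(str(count))
        count = 0
    # add the final character + count; str1[-1] also works when the loop never ran
    res.append(str1[-1] + str(count+1))
    return min(''.join(res), str1, key=len)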
def stringcompress_using_string(str1):
    # an empty string has nothing to compress (and str1[-1] below would fail)
    if not str1:
        return str1
    res = ''
    count = 0
    # we can start at 1 because we already know the first letter is not a repetition of any previous letters
    for i in range(1, len(str1)):
        count += 1
        # we keep going through the for loop, until a character does not repeat with the previous one
        # (compare by value with ==; `is` tests identity and only happens to work for some strings)
        if str1[i] == str1[i-1]:
            continue
        # add the character along with the number of times it repeated to the final string
        # reset the count
        # and we start all over with the next character
        res += str1[i-1] + str(count)
        count = 0
    # add the final character + count; str1[-1] also works when the loop never ran
    res += str1[-1] + str(count+1)
    return min(res, str1, key=len)
def generate_test_data(min_length=3, max_length=300, iterations=3000, repeat_chance=.66):
    assert 0 < repeat_chance < 1
    data = []
    # named current_char so that it does not shadow the built-in chr()
    current_char = 'a'
    for i in range(iterations):
        the_str = ''
        # create a random string with a random length between min_length and max_length
        for j in range(random.randrange(min_length, max_length+1)):
            # if we've decided to not repeat by randomization, then grab a new character,
            # otherwise we will continue to use (repeat) the character that was chosen last time
            if random.random() > repeat_chance:
                current_char = random.choice(string.ascii_letters)
            the_str += current_char
        data.append(the_str)
    return data
# generate test data beforehand to make sure all of our tests use the same test data
test_data = generate_test_data()

# make sure all of our test functions are doing the algorithm correctly
# note: stringcompress_original prints 'a2bc5a3' here (no count for a single character),
# while the other two print 'a2b1c5a3'
print('showing that the algorithms all produce the correct output')
print('stringcompress_original: ', stringcompress_original('aabcccccaaa'))
print('stringcompress_using_list: ', stringcompress_using_list('aabcccccaaa'))
print('stringcompress_using_string: ', stringcompress_using_string('aabcccccaaa'))
print()

# time each function over the same data; globals=globals() lets timeit see the functions and test_data
print('stringcompress_original took', timeit.timeit("[stringcompress_original(x) for x in test_data]", number=10, globals=globals()), 'seconds')
print('stringcompress_using_list took', timeit.timeit("[stringcompress_using_list(x) for x in test_data]", number=10, globals=globals()), 'seconds')
print('stringcompress_using_string took', timeit.timeit("[stringcompress_using_string(x) for x in test_data]", number=10, globals=globals()), 'seconds')
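The sanity check in the script relies on reading the printed output by eye. As a rough sketch of how that could be automated, the block below compares a candidate function against a reference built on itertools.groupby. The names compress_reference and find_mismatches are hypothetical helpers of mine, not part of the script, and the reference assumes that every run gets a count, so 'b' becomes 'b1'.

```python
from itertools import groupby


def compress_reference(text):
    """Run-length encode text, always writing the count (assumption: 'b' -> 'b1')."""
    encoded = ''.join(char + str(len(list(run))) for char, run in groupby(text))
    # Same final rule as the functions being tested: return the shorter of the two,
    # preferring the encoded form when the lengths are equal.
    return min(encoded, text, key=len)


def find_mismatches(candidate, samples):
    """Return (sample, expected, actual) for every sample where candidate disagrees."""
    mismatches = []
    for sample in samples:
        expected = compress_reference(sample)
        actual = candidate(sample)
        if actual != expected:
            mismatches.append((sample, expected, actual))
    return mismatches
```

Calling something like find_mismatches(stringcompress_using_list, test_data) should return an empty list if the function agrees with the reference on every string. A function that writes single characters without a count would be reported as a mismatch, which may or may not be what you want.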
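All of the results below were measured on a quad-core Intel i7-5700HQ CPU @ 2.70GHz. Compare the different functions within each group of results, but do not try to cross-compare the results of one group with another, because the size of the test data differs.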
Using long strings
With test data generated using generate_test_data(10000, 50000, 100, .66)
stringcompress_original took 7.346990528497378 seconds
stringcompress_using_list took 7.589927956366313 seconds
stringcompress_using_string took 7.713812443264496 seconds
Using short strings
With test data generated using generate_test_data(2, 5, 10000, .66)
stringcompress_original took 0.40272931026355685 seconds
stringcompress_using_list took 0.1525574881739265 seconds
stringcompress_using_string took 0.13842854253813164 seconds
With a 10% chance of repeating a character
With test data generated using generate_test_data(10, 300, 10000, .10)
stringcompress_original took 4.675965586924492 seconds
stringcompress_using_list took 6.081609410376534 seconds
stringcompress_using_string took 5.887430301813865 seconds
With a 90% chance of repeating a character
With test data generated using generate_test_data(10, 300, 10000, .90)
stringcompress_original took 2.6049783549783547 seconds
stringcompress_using_list took 1.9739111725413099 seconds
stringcompress_using_string took 1.9460854974553605 seconds
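The totals are not comparable between groups because each group processes a different amount of data. One rough way to put timings on a common scale is to divide the elapsed time by the number of characters processed. This is only a sketch: seconds_per_character is a hypothetical helper of mine, nothing here was measured for this answer, and per-call overhead still dominates for very short strings, so the normalized numbers remain approximate.

```python
import timeit


def seconds_per_character(func, data, number=3):
    """Time func over every string in data and divide by the characters processed."""
    total_chars = sum(len(s) for s in data) * number
    if total_chars == 0:
        raise ValueError('data must contain at least one non-empty string')
    # timeit accepts a zero-argument callable as the statement to time
    elapsed = timeit.timeit(lambda: [func(s) for s in data], number=number)
    return elapsed / total_chars
```

For example, seconds_per_character(stringcompress_using_string, test_data) would give a per-character figure for the string-based version on whatever test_data currently holds.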
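It is important to build a small framework like this one that you can use to test changes to your algorithm. Seemingly pointless changes often make your code run faster, so when you are optimizing for performance the name of the game is to try different things and time the results. I am sure you could make more discoveries by trying other changes, but it really matters what kind of data you want to optimize for: short strings versus long strings, and strings that rarely repeat versus strings that repeat often.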
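To make "try different things and time the results" a little more concrete, here is a minimal sketch of a harness that runs every candidate function against several data profiles in one go. Everything in it is an assumption of mine rather than part of the measurements in this answer: the helper names make_strings, compress_groupby and benchmark, the fixed seed, and the profile sizes, which are far smaller than the measured ones so that the sketch runs quickly. compress_groupby is only a stand-in candidate based on itertools.groupby.

```python
import random
import string
import timeit
from itertools import groupby


def make_strings(min_length, max_length, how_many, repeat_chance, seed=0):
    """Random strings; each character repeats the previous one with probability repeat_chance."""
    rng = random.Random(seed)  # fixed seed so every run sees the same data
    strings = []
    for _ in range(how_many):
        chars = []
        current = rng.choice(string.ascii_letters)
        for _ in range(rng.randint(min_length, max_length)):
            if rng.random() > repeat_chance:
                current = rng.choice(string.ascii_letters)
            chars.append(current)
        strings.append(''.join(chars))
    return strings


def compress_groupby(text):
    """Stand-in candidate: run-length encoding via itertools.groupby."""
    encoded = ''.join(c + str(len(list(run))) for c, run in groupby(text))
    return min(encoded, text, key=len)


# Hypothetical profiles: (min_length, max_length, how_many, repeat_chance)
PROFILES = {
    'short strings': (2, 5, 2000, .66),
    'long strings': (1000, 5000, 10, .66),
    '10% repeat chance': (10, 300, 200, .10),
    '90% repeat chance': (10, 300, 200, .90),
}


def benchmark(functions, profiles=PROFILES, number=3, repeat=3):
    """Return {profile name: {function name: best time in seconds}}."""
    results = {}
    for name, args in profiles.items():
        data = make_strings(*args)
        timings = {}
        for func in functions:
            # take the minimum over several repeats to reduce noise from other processes
            runs = timeit.repeat(lambda: [func(s) for s in data],
                                 number=number, repeat=repeat)
            timings[func.__name__] = min(runs)
        results[name] = timings
    return results


if __name__ == '__main__':
    for profile, timings in benchmark([compress_groupby]).items():
        print(profile, timings)
```

Passing the three functions from the script to benchmark instead of compress_groupby would show, per profile, which one comes out ahead on your machine.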