您可以使用re。
In [2]: GENERIC_TEXT_STORE = ['this is good', 'this is a test', 'that\'s not a test']
In [3]: def count_occurrences(string):
...: count = 0
...: for text in GENERIC_TEXT_STORE:
...: count += text.count(string)
...: return count
In [6]: import re
In [7]: def count(_str):
...: return len(re.findall(_str,''.join(GENERIC_TEXT_STORE)))
...:
In [28]: def count1(_str):
...: return ' '.join(GENERIC_TEXT_STORE).count(_str)
...:
现在使用timeit来分析执行时间。
当GENERIC_TEXT_STORE 的大小为3 时。
In [9]: timeit count('this')
1.27 µs ± 57.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [10]: timeit count_occurrences('this')
697 ns ± 25.5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [33]: timeit count1('this')
385 ns ± 22.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
当GENERIC_TEXT_STORE 的大小为15000 时。
In [17]: timeit count('this')
1.07 ms ± 118 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [18]: timeit count_occurrences('this')
3.35 ms ± 279 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [37]: timeit count1('this')
275 µs ± 18.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
当GENERIC_TEXT_STORE 的大小为150000时
In [20]: timeit count('this')
5.7 ms ± 2.39 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [21]: timeit count_occurrences('this')
33 ms ± 3.26 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [40]: timeit count1('this')
3.98 ms ± 211 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
当GENERIC_TEXT_STORE 的大小超过一百万时 (1500000)
In [23]: timeit count('this')
50.3 ms ± 7.21 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [24]: timeit count_occurrences('this')
283 ms ± 12.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [43]: timeit count1('this')
40.7 ms ± 1.09 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
count1
当GENERIC_TEXT_STORE 的大小很大时,count 和count1 几乎比count_occurrences 快4 到5 倍。