一种有效的方法是您的第一种方法,只需稍微更改使用with 语句打开文件,示例 -
with open("foo.txt", "r") as f:
for line in f:
for i in line.split():
if i.isdigit():
my_list.append(int(i))
通过与其他方法比较完成的计时测试 -
函数-
def func1():
my_list = []
for line in open("foo.txt", "r"):
for i in line.strip().split(' '):
if i.isdigit():
my_list.append(int(i))
return my_list
def func1_1():
return [int(i) for line in open("foo.txt", "r") for i in line.strip().split(' ') if i.isdigit()]
def func1_3():
my_list = []
with open("foo.txt", "r") as f:
for line in f:
for i in line.split():
if i.isdigit():
my_list.append(int(i))
return my_list
def func2():
my_list = []
for line in open("foo.txt", "r"):
for i in line.split():
try:
my_list.append(int(i))
except ValueError:
pass
return my_list
def func3():
my_list = []
with open("foo.txt","r") as f:
cf = csv.reader(f, delimiter=' ')
for row in cf:
my_list.extend([int(i) for i in row if i.isdigit()])
return my_list
计时测试结果 -
In [25]: timeit func1()
The slowest run took 4.70 times longer than the fastest. This could mean that an intermediate result is being cached
1000 loops, best of 3: 204 µs per loop
In [26]: timeit func1_1()
The slowest run took 4.39 times longer than the fastest. This could mean that an intermediate result is being cached
1000 loops, best of 3: 207 µs per loop
In [27]: timeit func1_3()
The slowest run took 5.46 times longer than the fastest. This could mean that an intermediate result is being cached
10000 loops, best of 3: 191 µs per loop
In [28]: timeit func2()
The slowest run took 4.09 times longer than the fastest. This could mean that an intermediate result is being cached
1000 loops, best of 3: 212 µs per loop
In [34]: timeit func3()
The slowest run took 4.38 times longer than the fastest. This could mean that an intermediate result is being cached
10000 loops, best of 3: 202 µs per loop
鉴于将数据存储到列表中的方法,我相信上面的func1_3() 是最快的(如timeit所示)。
但是考虑到这一点,如果您真的要处理非常大的文件,那么最好使用生成器而不是将完整列表存储在内存中。
更新:正如 cmets 中所说,func2() 比 func1_3() 快(尽管在我的系统上它从来没有比 func1_3() 快,即使是整数),已更新foo.txt 包含数字以外的内容并进行计时测试 -
foo.txt
1 2 10 11
asd dd
dds asda
22 44 32 11 23
dd dsa dds
21 12
12
33
45
dds
asdas
dasdasd dasd das d asda sda
测试-
In [13]: %timeit func1_3()
The slowest run took 6.17 times longer than the fastest. This could mean that an intermediate result is being cached
1000 loops, best of 3: 210 µs per loop
In [14]: %timeit func2()
1000 loops, best of 3: 279 µs per loop
In [15]: %timeit func1_3()
1000 loops, best of 3: 213 µs per loop
In [16]: %timeit func2()
1000 loops, best of 3: 273 µs per loop