【Posted】:2013-08-12 15:03:34
【Question】:
I have a huge text file (1 GB) in which every "line" follows the syntax:
[number] [number]_[number]
For example:
123 123_1234
45 456_45 12 12_12
I get the following error:
line 46, in open_delimited
pieces = re.findall(r"(\d+)\s+(\d+_\d+)", remainder + chunk, re.IGNORECASE)
TypeError: can only concatenate tuple (not "str") to tuple
for this code:
def open_delimited(filename, args):
    with open(filename, args, encoding="UTF-16") as infile:
        chunksize = 10000
        remainder = ''
        for chunk in iter(lambda: infile.read(chunksize), ''):
            pieces = re.findall(r"(\d+)\s+(\d+_\d+)", remainder + chunk, re.IGNORECASE)
            for piece in pieces[:-1]:
                yield piece
            remainder = pieces[-1]
        if remainder:
            yield remainder

filename = 'data/AllData_2000001_3000000.txt'
for chunk in open_delimited(filename, 'r'):
    print(chunk)
【Comments】:
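- On the second iteration of the loop, remainder is a tuple rather than a string. With two capture groups in the pattern, re.findall returns a list of tuples, so remainder = pieces[-1] assigns a tuple, and remainder + chunk then fails.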
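- Your code will also fail if chunk is a partial record. In that case there are no matches, so pieces[-1] has nothing to return. It is better to append the chunk to remainder as a string and then try to split the combined text at the appropriate place.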
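
The two comments above can be combined into one possible fix, sketched below. This is only an illustration under stated assumptions, not the asker's final code: the function name iter_records and the compiled pattern RECORD are hypothetical, and the function takes an already opened text stream instead of a filename so that it can be tried on an in-memory sample. The idea is to keep remainder as a string at all times and to hold back the last match of each buffer, because that match may have been cut off at the chunk boundary.

```python
import io
import re

# Same pattern as in the question; re.IGNORECASE is dropped because it
# has no effect on digits, whitespace and underscores.
RECORD = re.compile(r"(\d+)\s+(\d+_\d+)")


def iter_records(infile, chunksize=10000):
    """Yield (number, number_number) tuples from a text stream read in chunks."""
    remainder = ""  # always a str: the unprocessed tail of the previous buffer
    for chunk in iter(lambda: infile.read(chunksize), ""):
        buffer = remainder + chunk
        matches = list(RECORD.finditer(buffer))
        if not matches:
            # Only a partial record so far: keep the text and read more.
            remainder = buffer
            continue
        # Every match except the last is followed by more text, so it is complete.
        for match in matches[:-1]:
            yield match.groups()
        # The last match may be truncated; re-scan it together with the next chunk.
        remainder = buffer[matches[-1].start():]
    # End of input: nothing more can arrive, so the leftover text is final.
    for match in RECORD.finditer(remainder):
        yield match.groups()


# Small in-memory demo with a deliberately tiny chunk size.
sample = io.StringIO("123 123_1234\n45 456_45 12 12_12\n")
print(list(iter_records(sample, chunksize=5)))
# [('123', '123_1234'), ('45', '456_45'), ('12', '12_12')]
```

For the file in the question, the stream would be opened as before, for example with open(filename, 'r', encoding="UTF-16") as infile, and then passed to iter_records(infile). One caveat of this sketch: if the input contains a long stretch with no matching record, remainder keeps growing until a match appears.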