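【Posted】: 2017-10-10 08:36:23
【Question】:
I have a CSV file with 5 million rows. I want to split it into files of a user-specified number of rows each.
I wrote the following code, but it takes too long to run. Can anyone help me optimize it?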
import csv
print "Please delete the previous created files. If any."
filepath = raw_input("Enter the File path: ")
line_count = 0
filenum = 1
try:
    in_file = raw_input("Enter Input File name: ")
    if in_file[-4:] == ".csv":
        split_size = int(raw_input("Enter size: "))
        print "Split Size ---", split_size
        print in_file, " will split into", split_size, "rows per file named as OutPut-file_*.csv (* = 1,2,3 and so on)"
        with open (in_file,'r') as file1:
            row_count = 0
            reader = csv.reader(file1)
            for line in file1:
                #print line
                with open(filepath + "\\OutPut-file_" +str(filenum) + ".csv", "a") as out_file:
                    if row_count < split_size:
                        out_file.write(line)
                        row_count = row_count +1
                    else:
                        filenum = filenum + 1
                        row_count = 0
                line_count = line_count+1
        print "Total Files Written --", filenum
    else:
        print "Please enter the Name of the file correctly."
except IOError as e:
    print "Oops..! Please Enter correct file path values", e
except ValueError:
    print "Oops..! Please Enter correct values"
I also tried it without "with open".
【Comments】:
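One likely cause of the slowness is that the loop above reopens the output file in append mode for every input line, so 5 million lines mean 5 million open/close calls. As written, the line that triggers the rollover to the next file also appears not to be written anywhere. Below is a minimal sketch of one possible alternative, assuming Python 3 and assuming that each physical line is one row (as in the code above, which iterates over the file rather than over `reader`). The function name `split_csv` and its parameters are hypothetical, chosen for illustration only.

```python
import os


def split_csv(in_path, out_dir, split_size, prefix="OutPut-file_"):
    """Split in_path into files of at most split_size lines each.

    Hypothetical helper: returns the number of output files written.
    Treats every physical line as one row, so quoted fields that
    contain embedded newlines are not handled.
    """
    if split_size <= 0:
        raise ValueError("split_size must be a positive integer")

    filenum = 0
    out_file = None
    try:
        # Binary mode copies bytes unchanged and avoids decoding overhead.
        with open(in_path, "rb") as src:
            for row_count, line in enumerate(src):
                # Open a new output file only once every split_size lines,
                # instead of reopening the file for every line.
                if row_count % split_size == 0:
                    if out_file is not None:
                        out_file.close()
                    filenum += 1
                    out_name = os.path.join(out_dir, "%s%d.csv" % (prefix, filenum))
                    out_file = open(out_name, "wb")
                out_file.write(line)
    finally:
        # Close the last output file even if an error interrupts the loop.
        if out_file is not None:
            out_file.close()
    return filenum
```

A call such as `split_csv("input.csv", "out", 100000)` (placeholder paths) would write `OutPut-file_1.csv`, `OutPut-file_2.csv`, and so on into an existing `out` directory. Like the original code, this sketch does not repeat a header row in each output file.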
-
How about a more conventional unit than lakhs? ;)
-
What about seeking to different points with separate file pointers and using them in parallel via coroutines/gevent?
-
I haven't tried that yet. Can you help? Multithreading or multitasking would help here.
-
Is there some reason you can't remove your Indian words?
-
@JamesZ Indian words, such as??