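【Posted】: 2016-12-21 16:32:14
【Question】:
I have separate spreadsheets containing data for each month of the year, 12 spreadsheets in total. Each workbook contains 200k-500k rows.
For example:
January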
| name | course | grade |
|-------|---------|-------|
| dave | math | 90 |
| chris | math | 80 |
| dave | english | 75 |
February
| name | course | grade |
|-------|---------|-------|
| dave | science | 72 |
| chris | art | 58 |
| dave | music | 62 |
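I am using openpyxl to open each monthly workbook, iterate over every row and every cell, and write the relevant data to a per-person workbook. That is, all rows belonging to Chris go into "Chris.xlsx" and all rows belonging to Dave go into "Dave.xlsx".
The problem I am running into is that openpyxl is very slow. I am sure this is because my code is very procedural and does not optimize the iteration and writing.
Any ideas would be greatly appreciated.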
import os

from openpyxl import load_workbook
from openpyxl.utils import column_index_from_string


def appendToWorkbooks():
    print("Appending workbooks")
    je_dump_path = "C:/test/"
    # define list of files in path
    je_dump_files = os.listdir(je_dump_path)
    # define path for resultant files
    results_path = "C:/test/output/"
    max_row = 0
    input_row = 1
    for file in je_dump_files:
        current_row = 1
        # load each workbook in the directory
        load_file = je_dump_path + file
        print("Loading workbook: " + file)
        wb = load_workbook(filename=load_file, read_only=True)
        print("Loaded workbook: " + file)
        # select the worksheet with the name Sheet in each workbook
        ws = wb['Sheet']
        print("Loaded worksheet")
        # iterate through the rows in the currently open workbook
        for row in ws.iter_rows():
            # determine the person this row of data relates to
            person = ws.cell(row=current_row, column=1).value
            # set output workbook to that person
            output_wb_file = results_path + person + ".xlsx"
            output_wb = load_workbook(output_wb_file)
            output_ws = output_wb["Sheet"]
            # increment the current row
            current_row = current_row + 1
            print("Currently on row: " + str(current_row))
            # determine the last row in the current output workbook
            max_row = output_ws.max_row
            # set the output row to the row after the last row in the current output workbook
            output_row = max_row + 1
            # copy the row cell by cell; cell.column is a letter in openpyxl < 2.6
            for cell in row:
                output_ws.cell(row=output_row, column=column_index_from_string(cell.column)).value = cell.value
            # the output workbook is reloaded and saved once per input row
            output_wb.save(output_wb_file)
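The loop above reloads and re-saves a whole output workbook for every input row, so one direction worth trying is to read each monthly file once, bucket the rows by person in memory, and then write each person's workbook a single time. The following is only a sketch under assumptions, not a tested replacement: the function names, the `header` default, and the `.xlsx` filter are hypothetical, `values_only=True` needs openpyxl 2.6 or later, the first row of each sheet is assumed to be a header, all rows are assumed to fit in memory, and the output files are created fresh instead of being appended to.

```python
import os
from collections import defaultdict


def group_rows_by_person(rows, name_index=0):
    """Bucket row tuples by the value found in the name column."""
    grouped = defaultdict(list)
    for row in rows:
        # Rows that are too short or have no name are treated as blank.
        person = row[name_index] if len(row) > name_index else None
        if person is not None:
            grouped[person].append(row)
    return grouped


def iter_data_rows(path, sheet_name="Sheet"):
    """Stream data rows (header skipped) from one monthly workbook."""
    # Imported lazily so the grouping helper works without openpyxl.
    from openpyxl import load_workbook

    wb = load_workbook(filename=path, read_only=True)
    try:
        # values_only=True yields plain tuples of cell values.
        for row in wb[sheet_name].iter_rows(min_row=2, values_only=True):
            yield row
    finally:
        wb.close()


def write_person_workbooks(grouped, results_path,
                           header=("name", "course", "grade")):
    """Create each person's workbook once, using write-only mode."""
    from openpyxl import Workbook

    for person, rows in grouped.items():
        wb = Workbook(write_only=True)
        ws = wb.create_sheet("Sheet")
        ws.append(header)
        for row in rows:
            ws.append(row)
        # One save per person instead of one save per input row.
        wb.save(os.path.join(results_path, str(person) + ".xlsx"))


def split_by_person(je_dump_path, results_path):
    """Read every monthly .xlsx in je_dump_path and split it by person."""
    all_rows = (
        row
        for name in sorted(os.listdir(je_dump_path))
        if name.endswith(".xlsx")
        for row in iter_data_rows(os.path.join(je_dump_path, name))
    )
    write_person_workbooks(group_rows_by_person(all_rows), results_path)
```

With the paths from the question this would be called as `split_by_person("C:/test/", "C:/test/output/")`. If the full year of rows does not fit in memory, the same grouping could be done one monthly file at a time, at the cost of reopening each person's workbook once per month.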
【Discussion】:
- @mike-müller I saw that you posted on a similar question at stackoverflow.com/questions/35823835/…