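【Posted】: 2021-03-08 12:38:51
【Question】:
I found some code that uses tqdm and Python multiprocessing to create a progress bar; it used integers to update the bar. I changed it to loop over files, but the lambda callback creates a Cartesian product with the file paths, which makes my machine run out of memory when there are many files. I looked through other questions for a solution but found no answer. What can I do to avoid the Cartesian product in async_result (and the out-of-memory condition) while still getting a progress bar?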
import glob
import time
import multiprocessing as mp

import jpylyzer
from tqdm import tqdm

cores = 2
src = "/path/to/jp2/files"

def f_process_file(filename):
    now = time.strftime("%Y-%m-%d %H:%M:%S")
    status = None  # stays None if jpylyzer raises, so the return below cannot fail
    try:
        result = jpylyzer.checkOneFile(filename)
        status = result.findtext('isValid')
    except Exception as ex:
        print("oopsie")
    return filename, status, now

# Find JP2 files in the source directory case insensitively
files = [f for f in glob.iglob(src + '/**/*.[jJ][pP]2', recursive=True)]
filecount = len(files)
# Start a multiprocessing pool
pool = mp.Pool(processes=cores)
# Define a progress bar
pbar = tqdm(total=filecount)
# process all files asynchronously and do callback for the progress bar
# (this is the line that produces the Cartesian product)
async_result = [pool.map_async(f_process_file, files, callback=lambda _: pbar.update(1)) for file in files]
# magic for the progress bar
results = [p.get() for p in async_result]
pool.close()
pool.join()
for i in range(len(results)):
    if results[i][i][1] != 'True':
        print(results[i][i])
【Discussion】:
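The product most likely comes from the list comprehension rather than from the lambda: `pool.map_async(f_process_file, files, ...)` already submits every file, and the comprehension repeats that call once per file, so N files yield N lists of N results. The `callback` of `map_async` also fires once per whole map, not once per item. Below is a minimal sketch of one possible alternative, which submits each file exactly once through `imap_unordered` and reports progress per finished item. `check_file` is a hypothetical stand-in for `f_process_file` (it does not call jpylyzer), and `process_all` and the placeholder file names are likewise assumptions made for illustration.

```python
import multiprocessing as mp
import time


def check_file(filename):
    """Hypothetical stand-in for the question's f_process_file (no jpylyzer call)."""
    now = time.strftime("%Y-%m-%d %H:%M:%S")
    status = "True" if filename.lower().endswith(".jp2") else "False"
    return filename, status, now


def process_all(pool, worker, files, on_result=None):
    """Submit every file exactly once and report progress per finished file."""
    results = []
    # imap_unordered yields each result as soon as its task completes,
    # so the list grows to len(files) entries, not len(files) ** 2.
    for result in pool.imap_unordered(worker, files):
        results.append(result)
        if on_result is not None:
            on_result()  # e.g. lambda: pbar.update(1) for a tqdm bar
    return results


if __name__ == "__main__":
    files = ["a.jp2", "b.JP2", "c.txt"]  # placeholder names, not real files
    done = []
    with mp.Pool(processes=2) as pool:
        results = process_all(pool, check_file, files,
                              on_result=lambda: done.append(1))
    # results is now a flat list of (filename, status, now) tuples
    for filename, status, now in results:
        if status != "True":
            print(filename, status, now)
```

With tqdm, the same call could be wired up by creating `pbar = tqdm(total=len(files))` and passing `on_result=lambda: pbar.update(1)`. Results arrive in completion order; if input order matters, `pool.imap` can be used instead at the cost of a less smooth bar.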
Tags: python-3.x multiprocessing progress-bar cartesian-product tqdm