【问题标题】:cartesian product in python multiprocessing with tqdm progress bar带有tqdm进度条的python多处理中的笛卡尔积
【发布时间】:2021-03-08 12:38:51
【问题描述】:

我找到了一些使用 tqdm 和 Python 多处理创建进度条的代码,它使用整数来更新进度条。我将其更改为使用文件循环,但 lambda 回调创建了一个带有文件路径的笛卡尔积,这让我的机器因大量文件而内存不足。我试图在其他问题中找到解决方案,但没有找到答案。 我该怎么做才能避免 async_result 中的笛卡尔积(以及内存不足),但仍会创建进度条?

import glob
import jpylyzer
import multiprocessing as mp
from tqdm import tqdm
cores=2
src="/path/to/jp2/files"

def f_process_file(filename):
  now=time.strftime("%Y-%m-%d %H:%M:%S")
  try:
    result = jpylyzer.checkOneFile(filename)
    status=result.findtext('isValid')
  except Exception as ex:
    print("oopsie")
  return filename, status, now

# Find JP2 files in the source directory case insensitively
files = [f for f in glob.iglob(src + '/**/*.[jJ][pP]2', recursive=True)]
filecount=len(files)

# Start a multiprocessing pool
pool =  mp.Pool(processes = cores)

# Define a progress bar
pbar = tqdm(total=filecount)

# process all files asynchronously and do callback for the progress bar
async_result = [pool.map_async(f_process_file, files, callback=lambda _: pbar.update(1)) for file in files]

# magic for the progress barr
results = [p.get() for p in async_result]

pool.close()
pool.join()

for i in range(len(results)):
  if results[i][i][1] != 'True':
    print(results[i][i])
    

【问题讨论】:

    标签: python-3.x multiprocessing progress-bar cartesian-product tqdm


    【解决方案1】:

    我通过从 async_result 中删除 []、删除 callback=lambda 并声明一个全局变量 找到了答案pbar 用于进度条,在动态启动之前

    #!/usr/bin/env python3
    import glob
    from tqdm import tqdm
    import time, sys
    
    
    def f_process_file(filename):
      fnctn=sys._getframe().f_code.co_name
      now=time.strftime("%Y-%m-%d %H:%M:%S")
      try:
        # Do some stuff here
        result = 'isValid' # for testing purpose declare a value
        status = 'True' # for testing purpose declare a value
      except Exception as ex:
        print("failure in {}".format(fnctn))
    
      #update the progress bar
      time.sleep(0.005)
      pbar.update(1)
      return filename, status, now
    
    def f_doall(src):
      files = [f for f in glob.iglob(src + '/**/*.[jJ][pP]2', recursive=True)]
      filecount=len(files)
      print(filecount)
      #Declare a global variable for the progress bar
      global pbar
      # Initiate the progree bar
      pbar  = tqdm(total=filecount)
    
      for f in files:
        f_process_file(f)
    
    def main():
      src="/path/to/images"
      
      f_doall(src)
    
    if __name__ == "__main__":
      main()
    

    现在我可以扩展我的代码以使用多处理池

    【讨论】:

      猜你喜欢
      • 1970-01-01
      • 1970-01-01
      • 2019-04-20
      • 1970-01-01
      • 2017-08-30
      • 1970-01-01
      • 2018-10-13
      • 1970-01-01
      • 2021-05-18
      相关资源
      最近更新 更多