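【Question title】: python3.8 multiprocessing Pool Can't pickle function: attribute lookup getExcelData on __main__ failed when using pandas DataFrame
【Posted】: 2022-01-20 16:50:36
【Question description】:

I am trying to read CSV files into a pandas DataFrame using multiprocessing, but I get a pickle error. Python 3.8.8, pandas 1.2.4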

import os
import pandas as pd
import time
from multiprocessing import Pool

def getExcelData(fn):
    data = pd.DataFrame()
    return data.append(pd.read_csv(fn), sort=False)

if __name__ == "__main__":
    dir = '.'
    fn_ls = [ f'{fn}' for fn in os.listdir(dir) if fn.endswith('test.csv') ]
    startTime = time.time()

    pool = Pool(2)
    pool_data_list = []
    data = pd.DataFrame()
    for file_name in fn_ls:
        pool_data_list.append(pool.apply_async(getExcelData, (os.path.join(dir, file_name),)))

    pool.close()
    pool.join()

    for pool_data in pool_data_list:
        data = data.append(pool_data.get())
    res_ls = []
    for pool_data in pool_data_list:
        res_ls = pool_data.get()
    endTime = time.time()
    print(endTime - startTime)
    print(len(data))

Traceback (most recent call last):
  File "/Users/cxx/opt/anaconda3/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3437, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "", line 1, in <module>
    runfile('/Users/cxx/xiaoxi/18_Mercury/raw_data/raw/5000bp/test/test.py', wdir='/Users/cxx/xiaoxi/18_Mercury/raw_data/raw/5000bp/test')
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydev_bundle/pydev_umd.py", line 198, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/Users/cxx/xiaoxi/18_Mercury/raw_data/raw/5000bp/test/test.py", line 33, in <module>
    data = data.append(pool_data.get())
  File "/Users/cxx/opt/anaconda3/lib/python3.8/multiprocessing/pool.py", line 771, in get
    raise self._value
  File "/Users/cxx/opt/anaconda3/lib/python3.8/multiprocessing/pool.py", line 537, in _handle_tasks
    put(task)
  File "/Users/cxx/opt/anaconda3/lib/python3.8/multiprocessing/connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/Users/cxx/opt/anaconda3/lib/python3.8/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <function getExcelData at 0x...>: attribute lookup getExcelData on __main__ failed

【Comments】:

    Tags: python-3.x pandas multiprocessing


    【Solution 1】:

    Replace everything between startTime and endTime with a simple map call inside a context manager:

    with Pool(2) as pool: 
        data = [df for df in pool.imap(getExcelData, fn_ls)]
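
For reference, here is a minimal end-to-end sketch of the same idea, not the asker's exact script. The helper names `read_one` and `load_all` are hypothetical, and it assumes the CSV files sit in the current directory and end in `test.csv`, as in the question. The worker is defined at module top level because the pool pickles it by name and looks it up on `__main__`. The traceback above goes through PyCharm's `runfile` inside an IPython console, where that lookup can fail, so running the file as an ordinary script (for example `python test.py` from a terminal) is probably worth trying as well.

```python
import glob
import os
from multiprocessing import Pool

import pandas as pd


def read_one(path):
    """Worker function: kept at module top level so the pool can pickle it by name."""
    return pd.read_csv(path)


def load_all(paths, workers=2):
    """Read each CSV in a worker process and concatenate the results into one DataFrame."""
    if not paths:
        # Nothing to read: return an empty frame without starting a pool.
        return pd.DataFrame()
    with Pool(workers) as pool:
        # map() blocks until every file is read and preserves input order.
        frames = pool.map(read_one, paths)
    return pd.concat(frames, ignore_index=True, sort=False)


if __name__ == "__main__":
    # Assumption: same file pattern as in the question.
    paths = sorted(glob.glob(os.path.join(".", "*test.csv")))
    data = load_all(paths)
    print(len(data))
```

Collecting the frames in a list and calling `pd.concat` once avoids copying the growing DataFrame on every `append` call.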
    

    【Comments】:
