【Posted】: 2022-01-20 16:50:36
【Problem description】:
I am trying to use multiprocessing to read CSV files into a pandas DataFrame, but I get a pickle error. Python 3.8.8, pandas 1.2.4.
import os
import pandas as pd
import time
from multiprocessing import Pool


def getExcelData(fn):
    # Read one CSV file and return its contents as a DataFrame.
    data = pd.DataFrame()
    return data.append(pd.read_csv(fn), sort=False)


if __name__ == "__main__":
    dir = '.'
    fn_ls = [f'{fn}' for fn in os.listdir(dir) if fn.endswith('test.csv')]
    startTime = time.time()
    pool = Pool(2)
    pool_data_list = []
    data = pd.DataFrame()
    # Submit one asynchronous task per matching file.
    for file_name in fn_ls:
        pool_data_list.append(pool.apply_async(getExcelData, (os.path.join(dir, file_name),)))
    pool.close()
    pool.join()
    # Collect the results; the error is raised from this .get() call.
    for pool_data in pool_data_list:
        data = data.append(pool_data.get())
    res_ls = []
    for pool_data in pool_data_list:
        res_ls = pool_data.get()
    endTime = time.time()
    print(endTime - startTime)
    print(len(data))
Traceback (most recent call last):
  File "/Users/cxx/opt/anaconda3/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3437, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "", line 1, in
    runfile('/Users/cxx/xiaoxi/18_Mercury/raw_data/raw/5000bp/test/test.py', wdir='/Users/cxx/xiaoxi/18_Mercury/raw_data/raw/5000bp/test')
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydev_bundle/pydev_umd.py", line 198, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/Users/cxx/xiaoxi/18_Mercury/raw_data/raw/5000bp/test/test.py", line 33, in
    data = data.append(pool_data.get())
  File "/Users/cxx/opt/anaconda3/lib/python3.8/multiprocessing/pool.py", line 771, in get
    raise self._value
  File "/Users/cxx/opt/anaconda3/lib/python3.8/multiprocessing/pool.py", line 537, in _handle_tasks
    put(task)
  File "/Users/cxx/opt/anaconda3/lib/python3.8/multiprocessing/connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/Users/cxx/opt/anaconda3/lib/python3.8/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
_pickle.PicklingError: Can't pickle
【Discussion】:
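The traceback fails inside `_handle_tasks` at `put(task)`, that is, while the parent process is pickling the task to send it to a worker. That suggests the object that cannot be pickled is the function reference or its arguments rather than the returned DataFrame, although the error message is cut off, so this is only a guess. Python pickles functions by reference (module name plus qualified name), and that fails when the name cannot be looked up again. Below is a minimal, stdlib-only sketch for checking this; `can_pickle`, `module_level_reader`, and `make_local_reader` are hypothetical helper names and are not part of the original script.

```python
import pickle


def can_pickle(obj):
    """Return True if pickle can serialize obj, as multiprocessing requires for tasks."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


def module_level_reader(path):
    """Hypothetical stand-in for getExcelData, defined at module top level."""
    return path


def make_local_reader():
    """Build a nested function; it has no importable name, so pickle rejects it."""
    def local_reader(path):
        return path
    return local_reader


if __name__ == "__main__":
    # Plain arguments such as a tuple holding a path are fine.
    print("args:", can_pickle(("test.csv",)))
    # Nested functions and lambdas cannot be pickled by reference.
    print("nested function:", can_pickle(make_local_reader()))
    print("lambda:", can_pickle(lambda path: path))
    # This result depends on how the script is launched: it is True only if
    # the function can be found again under its module name.
    print("module-level function:", can_pickle(module_level_reader))
```

If the last line prints `False` in the environment where the error occurs, it may be worth running the script from a terminal with `python test.py`, or moving `getExcelData` into its own importable module, and trying again.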
Tags: python-3.x pandas multiprocessing