【Posted】: 2020-11-20 07:02:40
【Question】:
Say I have a function that reads and processes several DataFrames from a list of files, like this:
import os
import pandas as pd

# `path` is the directory that holds the .csv files
listdF = [os.path.join(os.sep, path, x) for x in os.listdir(path) if x.endswith('.csv')]

def corre_arrys(listdF):
    data = []
    for files in listdF:
        df = pd.read_csv(files, sep='\t', header=0, engine='python')
        # do something
    return df
When I run the function above as is, there is no error and it prints the output I need. However, when I try to run it with multiprocessing, as shown here:
from multiprocessing import Pool
NUM_PROCS = 8
pool = Pool(processes=NUM_PROCS)
allDfs = pool.map(corre_arrys,listdF)
it throws the following error:
RemoteTraceback Traceback (most recent call last)
RemoteTraceback:
"""
Traceback (most recent call last):
File "/home/alva/anaconda3/lib/python3.7/multiprocessing/pool.py", line 121, in worker
result = (True, func(*args, **kwds))
File "/home/alva/anaconda3/lib/python3.7/multiprocessing/pool.py", line 44, in mapstar
return list(map(*args))
File "<ipython-input-42-e4b97b52ffff>", line 4, in corre_arrys
df = pd.read_csv(files,sep='\t',header=0,engine='python')
File "/home/alva/anaconda3/lib/python3.7/site-packages/pandas/io/parsers.py", line 676, in parser_f
return _read(filepath_or_buffer, kwds)
File "/home/alva/anaconda3/lib/python3.7/site-packages/pandas/io/parsers.py", line 448, in _read
parser = TextFileReader(fp_or_buf, **kwds)
File "/home/alva/anaconda3/lib/python3.7/site-packages/pandas/io/parsers.py", line 880, in __init__
self._make_engine(self.engine)
File "/home/alva/anaconda3/lib/python3.7/site-packages/pandas/io/parsers.py", line 1126, in _make_engine
self._engine = klass(self.f, **self.options)
File "/home/alva/anaconda3/lib/python3.7/site-packages/pandas/io/parsers.py", line 2269, in __init__
memory_map=self.memory_map,
File "/home/alva/anaconda3/lib/python3.7/site-packages/pandas/io/common.py", line 431, in get_handle
f = open(path_or_buf, mode, errors="replace", newline="")
IsADirectoryError: [Errno 21] Is a directory: '/'
"""
The above exception was the direct cause of the following exception:
IsADirectoryError Traceback (most recent call last)
<ipython-input-46-4971753cdf30> in <module>
4 NUM_PROCS = 8
5 pool = Pool(processes=NUM_PROCS)
----> 6 allDfs = pool.map(corre_arrys,listdF)
~/anaconda3/lib/python3.7/multiprocessing/pool.py in map(self, func, iterable, chunksize)
266 in a list that is returned.
267 '''
--> 268 return self._map_async(func, iterable, mapstar, chunksize).get()
269
270 def starmap(self, func, iterable, chunksize=None):
~/anaconda3/lib/python3.7/multiprocessing/pool.py in get(self, timeout)
655 return self._value
656 else:
--> 657 raise self._value
658
659 def _set(self, i, obj):
IsADirectoryError: [Errno 21] Is a directory: '/'
listdF looks like this; each entry is a full path that includes the file name:
['/path/scripts/pc_2_lc_1_T.csv',
'/path/scripts/pc_2_lc_2_T.csv',
'/path/scripts/pc_1_lc_1_T.csv',
'/path/scripts/pc_1_lc_2_T.csv']
I don't understand where the problem is.
Any help is much appreciated. Thanks!
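A possible reading of the traceback, offered as a sketch rather than a confirmed diagnosis: `pool.map(corre_arrys, listdF)` calls `corre_arrys` once per element of `listdF`, so inside the function the argument is a single path string, and `for files in listdF` walks over its characters. The first character of an absolute path is `/`, which matches `Is a directory: '/'`. The helper `first_item_seen` below is hypothetical and opens no files; it uses the built-in `map`, which is what `mapstar` in the traceback applies inside each worker.

```python
# Paths copied from the question; nothing is read from disk here.
listdF = ['/path/scripts/pc_2_lc_1_T.csv', '/path/scripts/pc_2_lc_2_T.csv']


def first_item_seen(files_arg):
    """Return the first value that `for files in files_arg` would yield."""
    for files in files_arg:
        return files


# Serial call: the function receives the whole list,
# so the loop yields complete paths.
serial_first = first_item_seen(listdF)

# map-style call: the function receives ONE path string per call,
# so the loop yields single characters, starting with '/'.
mapped_first = list(map(first_item_seen, listdF))

print(serial_first)
print(mapped_first)
```

If this is what happens, `pd.read_csv` is being asked to open `'/'`, the root directory, rather than a CSV file.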
【Comments】:
-
The first path in "listDF" is relative. Try to avoid that.
-
@MichaelButscher, that was a typo. All the paths are actually absolute.
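-
Since `Pool.map` hands the function one element of the iterable per call, one way to restructure the code is to let the worker take a single file path and drop the inner loop. The following is a minimal sketch under that assumption, not the asker's final code: `read_one`, `load_all`, `make_demo_files` and the demo file contents are hypothetical names and data introduced here so the example can run on its own.

```python
import os
import tempfile
from multiprocessing import Pool

import pandas as pd

NUM_PROCS = 2  # assumption: a small worker count is enough for the demo


def read_one(file_path):
    """Read a single tab-separated file; Pool.map calls this once per path."""
    df = pd.read_csv(file_path, sep='\t', header=0, engine='python')
    # do something with df here
    return df


def load_all(paths, num_procs=NUM_PROCS):
    """Run read_one over all paths in worker processes, one DataFrame per path."""
    with Pool(processes=num_procs) as pool:
        return pool.map(read_one, paths)


def make_demo_files(directory, count=2):
    """Write small hypothetical tab-separated files so the sketch is runnable."""
    paths = []
    for i in range(count):
        file_path = os.path.join(directory, f'demo_{i}.csv')
        with open(file_path, 'w') as handle:
            handle.write('a\tb\n1\t2\n3\t4\n')
        paths.append(file_path)
    return paths


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp_dir:
        all_dfs = load_all(make_demo_files(tmp_dir))
        print([df.shape for df in all_dfs])
```

The result of `load_all` is a list with one DataFrame per input path, in the same order as the paths. The `if __name__ == '__main__':` guard matters on platforms where worker processes are started by re-importing the main module.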
Tags: python pandas multithreading list numpy