【Posted】: 2021-10-10 19:32:16
【Question】:
My code is as follows:
```python
import pandas as pd
import numpy as np
df = pd.read_csv("path/to/my/infile.csv")
df = df.sort_values(['distance', 'time'])
df.to_csv("path/to/my/outfile.csv")
```
This code successfully reads infile.csv (a 3 GB CSV file) and sorts it. It then fails when it tries to write outfile.csv, with the following error:
```
OSError Traceback (most recent call last)
<ipython-input-10-3a5c8279658d> in <module>
----> 1 df.to_csv('/Users/joaomatos/Desktop/cluster22_sorted_training.csv',index=False)
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/frame.py in to_csv(self, path_or_buf, sep, na_rep, float_format, columns, header, index, index_label, mode, encoding, compression, quoting, quotechar, line_terminator, chunksize, tupleize_cols, date_format, doublequote, escapechar, decimal)
1743 doublequote=doublequote,
1744 escapechar=escapechar, decimal=decimal)
-> 1745 formatter.save()
1746
1747 if path_or_buf is None:
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/io/formats/csvs.py in save(self)
164 encoding=encoding,
165 compression=self.compression)
--> 166 f.write(buf)
167 f.close()
168 for _fh in handles:
OSError: [Errno 22] Invalid argument
```
My question is: why does this happen?
Thanks for your help.
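The traceback shows the failure inside a single `f.write(buf)` call, and writing only a few rows works. That suggests the whole CSV text is handed to one write. A plausible explanation, stated here as an assumption rather than a confirmed diagnosis, is that on macOS a single `write()` of more than about 2 GiB has been reported to fail with `EINVAL` (Errno 22). The sketch below uses a hypothetical helper, `estimate_csv_bytes`, to extrapolate the output size from a sample of rows. This lets you check whether the file would land near that limit without building the full string.

```python
import pandas as pd


def estimate_csv_bytes(df: pd.DataFrame, sample_rows: int = 10_000, **to_csv_kwargs) -> int:
    """Hypothetical helper: rough UTF-8 size of df.to_csv(**to_csv_kwargs),
    extrapolated from the first `sample_rows` rows."""
    if len(df) == 0:
        # Nothing to extrapolate; just measure the (header-only) output.
        return len(df.to_csv(**to_csv_kwargs).encode("utf-8"))
    sample = df.head(sample_rows)
    # The header is written once, so measure it separately and do not scale it.
    header_bytes = len(df.head(0).to_csv(**to_csv_kwargs).encode("utf-8"))
    sample_bytes = len(sample.to_csv(**to_csv_kwargs).encode("utf-8")) - header_bytes
    per_row = sample_bytes / len(sample)
    return round(header_bytes + per_row * len(df))
```

For example, `estimate_csv_bytes(df, index=False) / 2**30` gives an approximate size in GiB. The estimate is only as good as the sample: if early rows are shorter or longer than later ones, it will be off.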
【Discussion】:
-
I think a `by` is missing: df.sort_values(by=['distance', 'time'])
-
The sorting works fine.
-
Have you tried writing just a few rows?
df.head().to_csv("path/to/my/outfile.csv") -
Yes, that works, which suggests the problem may be related to the file size.
-
Maybe chunksize will help:
df.to_csv("path/to/my/outfile.csv", chunksize=10000)
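If the single large write is the cause, `chunksize` on its own may not help in this pandas version. The traceback shows the output still reaching the file through one `f.write(buf)` call. A workaround under that assumption is to open the file yourself and write the DataFrame in row slices, so each `to_csv` call only produces a bounded amount of text. The names `to_csv_in_chunks` and `rows_per_chunk` are hypothetical, and this is a sketch rather than a confirmed fix.

```python
import pandas as pd


def to_csv_in_chunks(df: pd.DataFrame, path: str, rows_per_chunk: int = 100_000,
                     **to_csv_kwargs) -> None:
    """Hypothetical helper: write `df` to `path` one row slice at a time,
    so no single to_csv call has to produce the whole multi-GB text.
    The helper controls `header`, so do not pass it in to_csv_kwargs."""
    # pandas recommends newline="" for text-mode file objects passed to to_csv.
    with open(path, "w", newline="", encoding="utf-8") as fh:
        # max(len(df), 1) makes an empty DataFrame still write its header row.
        for start in range(0, max(len(df), 1), rows_per_chunk):
            chunk = df.iloc[start:start + rows_per_chunk]
            # Only the first slice writes the header; later slices add rows only.
            chunk.to_csv(fh, header=(start == 0), **to_csv_kwargs)


# Usage with the question's placeholder path:
# to_csv_in_chunks(df, "path/to/my/outfile.csv", index=False)
```

Writing through an already-open handle also keeps the file open across slices, which avoids reopening it in append mode for every chunk.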