【Posted】: 2021-04-07 01:44:38
【Question】:
I am new to Dask and I am trying to append to a parquet file, but my code always overwrites the file's contents.
Any idea what I am doing wrong here?
import pandas as pd
import dask.dataframe as dd

# Write the first dataframe to a new parquet dataset
print("Write dataframe 1...")
df = pd.DataFrame({'DeptId': [1, 2, 3], 'DName': ['Accounting', 'Sales', 'Finance'], 'DeptNo': [100, 200, 300]})
df.set_index(['DeptId'], inplace=True)
ddf = dd.from_pandas(df, chunksize=1000)
print(ddf.head(3))
file_name = 'C:/Temp/xxx'
ddf.to_parquet(path=file_name, engine="pyarrow")

# Append the second dataframe to the same dataset
print("\nAppend dataframe 2...")
df2 = pd.DataFrame({'DeptId': [4, 5, 6], 'DName': ['Engineering', 'Support', 'Consulting'],
                    'DeptNo': [400, 500, 600]})
df2.set_index(['DeptId'], inplace=True)
ddf2 = dd.from_pandas(df2, chunksize=1000)
print(ddf2.head(3))
ddf2.to_parquet(path=file_name, engine="pyarrow", ignore_divisions=True, append=True, overwrite=False)

# Read the dataset back and show what it contains
print("\nResulting parquet file...")
ddf3 = dd.read_parquet(path=file_name, engine="pyarrow")
print(ddf3.head())
The output is as follows...
- Write dataframe 1...
             DName  DeptNo
DeptId
1       Accounting     100
2            Sales     200
3          Finance     300
- Append dataframe 2...
              DName  DeptNo
DeptId
4       Engineering     400
5           Support     500
6        Consulting     600
- Resulting parquet file...
              DName  DeptNo
DeptId
4       Engineering     400
5           Support     500
6        Consulting     600
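One way to narrow this down is to look at what is actually on disk after each write. Dask's `to_parquet` writes a directory of part files (named `part.0.parquet`, `part.1.parquet`, ... by default) rather than a single file, so listing that directory shows whether the second call added a new part or replaced an existing one. The helper below is only a sketch: `list_dataset_files` is a hypothetical name, not part of Dask, and it uses nothing beyond the standard library.

```python
from pathlib import Path


def list_dataset_files(dataset_dir):
    """Return sorted (file name, size in bytes) pairs for a dataset directory.

    Hypothetical helper for inspection only; returns an empty list if the
    directory does not exist.
    """
    root = Path(dataset_dir)
    if not root.is_dir():
        return []
    # Only regular files are listed; sorting keeps the output stable.
    return sorted((p.name, p.stat().st_size) for p in root.iterdir() if p.is_file())
```

Calling `list_dataset_files('C:/Temp/xxx')` once after the first `to_parquet` and again after the append makes the difference visible. It is also worth keeping in mind that `head()` reads only the first partition by default, so `len(ddf3)` or `ddf3.compute()` gives a more complete picture of what was read back.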
- I am using these versions:
python 3.8.8
dask 2020.3.1
pandas 1.2.3
pyarrow 3.0.0
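For anyone trying to reproduce this, the version list can be collected straight from the running environment rather than typed by hand. This is a minimal sketch using the standard library's `importlib.metadata` (available from Python 3.8); `installed_versions` is a hypothetical helper name.

```python
import platform
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("dask", "pandas", "pyarrow")):
    """Map each package name to its installed version, or None if not installed."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Print the interpreter version followed by one line per package.
print("python", platform.python_version())
for name, ver in installed_versions().items():
    print(name, ver)
```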
Regards
Mark R
【Comments】:
Tags: python dask dask-dataframe