【Posted】:2014-12-23 11:56:36
【Problem description】:
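I have several hundred large CSV files that I want to merge into one. However, not all of the CSV files contain all of the columns, so I need to merge them by column name rather than by column position.
In the merged CSV, a cell should be empty whenever its row comes from a file that does not have that column.
I can't use the pandas module for the merge because it makes me run out of memory.
Is there a module that can do this, or some simple code?
Below is code that generates two CSV files. What I want is to merge tempdf1.csv and tempdf2.csv so that I get tempdf3.csv.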
import pandas as pd
# Two sample inputs with overlapping but not identical columns:
# df1 has Latitude but no Longitude, df2 has Longitude but no Latitude.
df1=pd.DataFrame([{"Location":"A","Temperature":20,"Weather":"Fair", "Wind":"", "Latitude":44},{"Location":"B","Temperature":"","Weather":"Bad","Wind":"","Latitude":42}])
df2=pd.DataFrame([{"Location":"C","Temperature":14,"Weather":"","Longitude":12, "Wind":44},{"Location":"D","Temperature":"","Weather":"","Wind":0,"Longitude":11}])
df1.to_csv("C:/tempdf1.csv")
df2.to_csv("C:/tempdf2.csv")
# Desired merge result: the union of all columns, with empty cells
# where the source file lacks that column.
df3=pd.DataFrame([{"Location":"A","Longitude":"","Temperature":20,"Weather":"Fair", "Wind":"", "Latitude":44},{"Location":"B","Longitude":"","Temperature":"","Weather":"Bad","Wind":"","Latitude":42},{"Location":"C","Temperature":14,"Weather":"","Longitude":12, "Wind":44,"Latitude":""},{"Location":"D","Temperature":"","Weather":"","Wind":0,"Longitude":11, "Latitude":""}])
df3.to_csv("C:/tempdf3.csv")
【Discussion】:
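One possible approach, sketched here with only the standard-library csv module, is to make two passes over the files: first read just the header row of each file to build the union of column names, then stream every row into the output one at a time so memory use stays small. The function name merge_csvs_by_column is hypothetical and not taken from any library. The sketch assumes every file has a header row and that all files share the platform's default text encoding.

```python
import csv


def merge_csvs_by_column(input_paths, output_path):
    """Merge CSV files by column name, streaming one row at a time.

    Hypothetical helper for illustration. Returns the merged column list.
    """
    # Pass 1: read only the header row of each file and collect the union
    # of column names, keeping the order in which they are first seen.
    fieldnames = []
    seen = set()
    for path in input_paths:
        with open(path, newline="") as f:
            header = next(csv.reader(f), [])  # [] if the file is empty
        for name in header:
            if name not in seen:
                seen.add(name)
                fieldnames.append(name)

    # Pass 2: stream the rows. restval="" writes an empty cell for any
    # column the current file lacks; extrasaction="ignore" drops surplus
    # cells from rows that are longer than their own header.
    with open(output_path, "w", newline="") as out:
        writer = csv.DictWriter(
            out, fieldnames=fieldnames, restval="", extrasaction="ignore"
        )
        writer.writeheader()
        for path in input_paths:
            with open(path, newline="") as f:
                for row in csv.DictReader(f):
                    writer.writerow(row)
    return fieldnames


# Example call with the paths from the question:
# merge_csvs_by_column(["C:/tempdf1.csv", "C:/tempdf2.csv"], "C:/tempdf3.csv")
```

Columns appear in first-seen order rather than sorted, so the output column order may differ from the tempdf3.csv that pandas produces. Also note that to_csv as called in the question writes the index as a first column with an empty header, which this sketch would treat as an ordinary column named "".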