【Question title】: Python: merge csv files with different column subsets
【Posted】: 2014-12-23 11:56:36
【Question description】:

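I have hundreds of large CSV files that I would like to merge into a single one. However, not all of the CSV files contain all of the columns, so I need to merge them by column name rather than by column position.

To be clear: in the merged CSV, a cell should be empty whenever its row comes from a file that did not have that cell's column.

I cannot use the pandas module, because it makes me run out of memory.

Is there a module that can do this, or some simple code?

Below is code that generates 2 CSV files. What I want is to merge tempdf1.csv and tempdf2.csv so that I get tempdf3.csv.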

import pandas as pd

df1=pd.DataFrame([{"Location":"A","Temperature":20,"Weather":"Fair", "Wind":"", "Latitude":44},{"Location":"B","Temperature":"","Weather":"Bad","Wind":"","Latitude":42}])
df2=pd.DataFrame([{"Location":"C","Temperature":14,"Weather":"","Longitude":12, "Wind":44},{"Location":"D","Temperature":"","Weather":"","Wind":0,"Longitude":11}])
df1.to_csv("C:/tempdf1.csv")
df2.to_csv("C:/tempdf2.csv")

df3=pd.DataFrame([{"Location":"A","Longitude":"","Temperature":20,"Weather":"Fair", "Wind":"", "Latitude":44},{"Location":"B","Longitude":"","Temperature":"","Weather":"Bad","Wind":"","Latitude":42},{"Location":"C","Temperature":14,"Weather":"","Longitude":12, "Wind":44,"Latitude":""},{"Location":"D","Temperature":"","Weather":"","Wind":0,"Longitude":11, "Latitude":""}])
df3.to_csv("C:/tempdf3.csv")

【Question comments】:

    Tags: python csv merge


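    【Solution 1】:

    Better late than never :) Take a look at the convtools library, which provides lots of data processing primitives, is pure Python, and relies on code generation. See its Table processing docs.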

    from convtools.contrib.tables import Table
    
    # into_* methods can only be called once per table, because the table
    # processes a stream and cannot assume it can be read twice
    Table.from_csv("tempdf1.csv", header=True).chain(
        # header=True belongs to from_csv, not to chain
        Table.from_csv("tempdf2.csv", header=True)
    ).into_csv("tempdf3.csv")
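If installing a third-party library is not an option, the same merge by column name can be sketched with only the standard-library `csv` module. This is a minimal sketch and not part of the answer above: the function name `merge_csv_by_header` and its `fill_value` parameter are hypothetical, and it assumes every file has a header row, unique column names within a file, and no rows longer than their header. It reads each file twice (headers first, then rows) and never holds more than one row in memory.

```python
import csv


def merge_csv_by_header(input_paths, output_path, fill_value=""):
    """Merge CSV files by column name, streaming rows to keep memory use low.

    Hypothetical helper: assumes each input has a header row and that no
    data row has more fields than its header.
    """
    # Pass 1: read only the header row of each file and build the union of
    # column names, in order of first appearance.
    fieldnames = []
    seen = set()
    for path in input_paths:
        with open(path, newline="") as f:
            header = next(csv.reader(f), [])
        for name in header:
            if name not in seen:
                seen.add(name)
                fieldnames.append(name)

    # Pass 2: stream every row into the output file. DictWriter's restval
    # supplies fill_value for columns that a given input file lacks.
    with open(output_path, "w", newline="") as out:
        writer = csv.DictWriter(out, fieldnames=fieldnames, restval=fill_value)
        writer.writeheader()
        for path in input_paths:
            with open(path, newline="") as f:
                for row in csv.DictReader(f):
                    writer.writerow(row)
    return fieldnames
```

For the files in the question, a call such as `merge_csv_by_header(["C:/tempdf1.csv", "C:/tempdf2.csv"], "C:/tempdf3.csv")` would write one header containing every column seen, followed by the rows of each file in turn. Columns appear in order of first appearance, so the column order may differ from the tempdf3.csv that pandas produces.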
    

    【Discussion】:
