【Question title】: Merging two csv files with a common column but uneven lengths
【Posted】: 2016-06-29 08:41:58
【Question】:

I have two csv files. csv file 1 contains:

California,C1,G1,K1,Dine-In,B,25
California,C2,G2,K1,Dine-In,A,8
Hawaii,H1,J1,L1,Dine-In,A,22
Hawaii,H2,J2,L2,Dine-In,A,20

csv file 2 contains:

Hawaii,10
California,20

I want my output to be:

California,C1,G1,K1,Dine-In,B,25,20
California,C2,G2,K1,Dine-In,A,8,20
Hawaii,H1,J1,L1,Dine-In,A,22,10
Hawaii,H2,J2,L2,Dine-In,A,20,10

This is the code I have so far:

import csv
from collections import OrderedDict

with open(r'file 1.csv', 'r') as f:
    r = csv.reader(f)
    dict2 = {row[0]: row[1:] for row in r}

with open(r'file 2.csv','r') as f:
    r = csv.reader(f)
    dict1 = OrderedDict((row[0], row[1:]) for row in r)

result = OrderedDict()
for d in (dict1, dict2):
    for key, value in d.iteritems():
        result.setdefault(key, []).extend(value)

with open('combined data.csv', 'wb') as f:
    w = csv.writer(f)
    for key, value in result.iteritems():
        w.writerow([key] + value)

But it gives me this output:

California,C1,G1,K1,Dine-In,B,25
California,C2,G2,K1,Dine-In,A,8
Hawaii,H1,J1,L1,Dine-In,A,22
Hawaii,H2,J2,L2,Dine-In,A,20
Hawaii,10
California,20

Any ideas?

【Comments】:

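  • Reading file 1.csv into a dictionary like this discards rows with duplicate keys: each later row overwrites the earlier one that has the same first column.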
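
To make the comment above concrete, here is a minimal sketch of the overwrite. It assumes Python 3 and reads the question's sample rows from an in-memory string rather than from file 1.csv; the name `by_state` is only illustrative.

```python
import csv
import io

# Sample rows from the question, held in memory instead of 'file 1.csv'.
file1_text = (
    "California,C1,G1,K1,Dine-In,B,25\n"
    "California,C2,G2,K1,Dine-In,A,8\n"
    "Hawaii,H1,J1,L1,Dine-In,A,22\n"
    "Hawaii,H2,J2,L2,Dine-In,A,20\n"
)

rows = list(csv.reader(io.StringIO(file1_text)))

# Keying on the first column keeps only the LAST row seen for each state.
by_state = {row[0]: row[1:] for row in rows}

print(len(rows))               # 4
print(len(by_state))           # 2
print(by_state["California"])  # ['C2', 'G2', 'K1', 'Dine-In', 'A', '8']
```

So the dictionary is fine for the small lookup file (one row per state), but not for the file whose rows repeat the key.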

Tags: python csv merge


【Solution 1】:

You only need to load file 2.csv as a dictionary, then append the matching value to each row as you read file 1.csv, like this:

import csv

with open(r'file 2.csv','rb') as f_file2:
    dict2 = {row[0]: row[1:] for row in csv.reader(f_file2)}

with open(r'file 1.csv', 'rb') as f_file1, open('combined data.csv', 'wb') as f_output:
    csv_output = csv.writer(f_output)

    for row in csv.reader(f_file1):
        csv_output.writerow(row + dict2[row[0]])

Giving you:

California,C1,G1,K1,Dine-In,B,25,20
California,C2,G2,K1,Dine-In,A,8,20
Hawaii,H1,J1,L1,Dine-In,A,22,10
Hawaii,H2,J2,L2,Dine-In,A,20,10
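
The code above opens the files in 'rb'/'wb' mode, which is the Python 2 convention for the csv module. Here is a rough sketch of the same idea for Python 3, assuming you also want to tolerate keys in file 1.csv that have no match in file 2.csv. The function name `merge_csv` and the `missing` parameter are made up for illustration, not part of the original answer.

```python
import csv


def merge_csv(file1_path, file2_path, output_path, missing=""):
    """Append the file-2 columns to every file-1 row that shares column 0."""
    # In Python 3 the csv module expects text mode with newline="".
    with open(file2_path, newline="") as f_file2:
        lookup = {row[0]: row[1:] for row in csv.reader(f_file2) if row}

    with open(file1_path, newline="") as f_file1, \
            open(output_path, "w", newline="") as f_output:
        csv_output = csv.writer(f_output)
        for row in csv.reader(f_file1):
            if not row:
                continue  # skip blank lines
            # .get() avoids a KeyError when file 2 has no entry for this key;
            # such rows get a single placeholder cell instead.
            csv_output.writerow(row + lookup.get(row[0], [missing]))
```

With the question's files this would be called as `merge_csv('file 1.csv', 'file 2.csv', 'combined data.csv')`. Rows whose first column is not in file 2.csv are kept with an empty last cell instead of raising.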

【Comments】:

    【Solution 2】:

    A pandas solution

    import pandas as pd
    
    df1 = pd.read_csv('file1.csv', header=None)
    df2 = pd.read_csv('file2.csv', header=None)
    res = pd.merge(df1, df2, on=0)
    res.to_csv('combined.csv', header=False, index=False)
    

    combined.csv

    California,C1,G1,K1,Dine-In,B,25,20
    California,C2,G2,K1,Dine-In,A,8,20
    Hawaii,H1,J1,L1,Dine-In,A,22,10
    Hawaii,H2,J2,L2,Dine-In,A,20,10
    

    Step by step

    Read the first file into a DataFrame:

    df1 = pd.read_csv('file1.csv', header=None)
    

    It looks like this:

                0   1   2   3        4  5   6
    0  California  C1  G1  K1  Dine-In  B  25
    1  California  C2  G2  K1  Dine-In  A   8
    2      Hawaii  H1  J1  L1  Dine-In  A  22
    3      Hawaii  H2  J2  L2  Dine-In  A  20
    

    Do the same for the second file:

    df2 = pd.read_csv('file2.csv', header=None)
    

    Result:

                0   1
    0      Hawaii  10
    1  California  20
    

    Merge on column 0:

    res = pd.merge(df1, df2, on=0)
    

    Now res looks like this:

                0 1_x   2   3        4  5   6  1_y
    0  California  C1  G1  K1  Dine-In  B  25   20
    1  California  C2  G2  K1  Dine-In  A   8   20
    2      Hawaii  H1  J1  L1  Dine-In  A  22   10
    3      Hawaii  H2  J2  L2  Dine-In  A  20   10
    

    Finally, write it to a csv file without a header or index:

    res.to_csv('combined.csv', header=False, index=False)
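
One thing worth knowing for files of uneven length: pd.merge defaults to an inner join, so any row of the first file whose key is missing from the second file is silently dropped. Below is a minimal sketch of keeping those rows with how="left". The Texas row, the in-memory sample text, and the column name "extra" are assumptions added for illustration; they are not in the question's data.

```python
import io

import pandas as pd

# Sample data in memory; the Texas row is hypothetical and has no match below.
file1_text = (
    "California,C1,G1,K1,Dine-In,B,25\n"
    "Hawaii,H1,J1,L1,Dine-In,A,22\n"
    "Texas,T1,U1,V1,Dine-In,C,5\n"
)
file2_text = "Hawaii,10\nCalifornia,20\n"

df1 = pd.read_csv(io.StringIO(file1_text), header=None)
# Naming the second column avoids the automatic 1_x / 1_y suffixes.
df2 = pd.read_csv(io.StringIO(file2_text), header=None, names=[0, "extra"])

# how="left" keeps every row of df1; unmatched keys get NaN in "extra".
res = pd.merge(df1, df2, on=0, how="left")
print(res)
```

Because of the NaN, the "extra" column is upcast to float, so matched values would be written out as 20.0 and 10.0 unless you convert or fill the column before calling to_csv.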
    

    【Comments】:
