【Question title】: How to compare two CSV files and get the difference?
【Posted】: 2021-02-16 07:58:37
【Question】:

I have two CSV files:

a1.csv

city,state,link
Aguila,Arizona,https://www.glendaleaz.com/planning/documents/AppendixAZONING.pdf
AkChin,Arizona,http://www.maricopa-az.gov/zoningcode/wp-content/uploads/2014/05/Zoning-Code-Rewrite-Public-Review-Draft-3-Tracked-Edits-lowres1.pdf
Aguila,Arizona,http://www.co.apache.az.us/planning-and-zoning-division/zoning-ordinances/

a2.csv

city,state,link
Aguila,Arizona,http://www.co.apache.az.us

I want to get the difference between them.

Here is my attempt:

import pandas as pd

a = pd.read_csv('a1.csv')
b = pd.read_csv('a2.csv')

mask = a.isin(b.to_dict(orient='list'))
# Reverse the mask and remove null rows.
# Upside is that index of original rows that
# are now gone are preserved (see result).
c = a[~mask].dropna()
print(c)

Expected output:

city,state,link
Aguila,Arizona,https://www.glendaleaz.com/planning/documents/AppendixAZONING.pdf
AkChin,Arizona,http://www.maricopa-az.gov/zoningcode/wp-content/uploads/2014/05/Zoning-Code-Rewrite-Public-Review-Draft-3-Tracked-Edits-lowres1.pdf

But instead I get an empty DataFrame:

Empty DataFrame
Columns: [city, state, link]
Index: []

I want to check based on the first two rows; if they are the same, remove them.


    Tags: python pandas csv


    【Solution 1】:

    You can use pandas to read both files, concatenate them, and drop every row that appears more than once:

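    import pandas as pd

    a = pd.read_csv('a1.csv')
    b = pd.read_csv('a2.csv')
    ab = pd.concat([a, b], axis=0)
    # keep=False drops every copy of a duplicated row; the result is a new
    # DataFrame, so assign it instead of discarding it
    diff = ab.drop_duplicates(keep=False)
    print(diff)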
    

    Reference: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.drop_duplicates.html
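
To make the behaviour of this approach concrete, here is a minimal sketch run against the rows from the question. It loads the data from inline strings with `io.StringIO` rather than from `a1.csv` and `a2.csv` (an assumption, so the snippet runs without files); the names `A1`, `A2`, `full_row_diff` and `key_diff` are illustrative. It contrasts the default whole-row comparison with a comparison restricted to `city` and `state` through the `subset` parameter.

```python
import io
import pandas as pd

# Rows copied from the question; loaded from strings so no files are needed.
A1 = """city,state,link
Aguila,Arizona,https://www.glendaleaz.com/planning/documents/AppendixAZONING.pdf
AkChin,Arizona,http://www.maricopa-az.gov/zoningcode/wp-content/uploads/2014/05/Zoning-Code-Rewrite-Public-Review-Draft-3-Tracked-Edits-lowres1.pdf
Aguila,Arizona,http://www.co.apache.az.us/planning-and-zoning-division/zoning-ordinances/
"""
A2 = """city,state,link
Aguila,Arizona,http://www.co.apache.az.us
"""

a = pd.read_csv(io.StringIO(A1))
b = pd.read_csv(io.StringIO(A2))
ab = pd.concat([a, b], axis=0)

# Whole-row comparison: a row counts as a duplicate only if city, state
# and link are all identical.
full_row_diff = ab.drop_duplicates(keep=False)

# Key-based comparison: every row whose (city, state) pair occurs more
# than once is dropped, whatever its link is.
key_diff = ab.drop_duplicates(subset=["city", "state"], keep=False)

print(full_row_diff)
print(key_diff)
```

On this sample the link in `a2.csv` is not identical to any link in `a1.csv`, so the whole-row comparison finds no duplicates and returns all four rows, while the key-based variant removes every `Aguila` row and leaves only `AkChin`. Which columns should decide equality therefore has to be chosen explicitly.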


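      【Solution 2】:

      First, concatenate the DataFrames, then drop the duplicates while still keeping the first occurrence. Then reset the index so it stays consistent.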

      import pandas as pd
      
      a = pd.read_csv('a1.csv')
      b = pd.read_csv('a2.csv')
      c = pd.concat([a,b], axis=0)
      
      c.drop_duplicates(keep='first', inplace=True) # Set keep to False if you don't want any
                                                    # of the duplicates at all
      c.reset_index(drop=True, inplace=True)
      print(c)
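
Note that concatenating is symmetric: with `keep='first'` the result is the union of both files with duplicates collapsed, and with `keep=False` rows that exist only in `a2.csv` are kept as well. If only the rows of the first file that have no match in the second are wanted, one alternative (not part of the answer above) is a left anti-join with `DataFrame.merge` and `indicator=True`. The sketch below is a hedged illustration: the inline data and the `example.com` links are hypothetical, and it assumes that rows must match on all shared columns.

```python
import io
import pandas as pd

# Hypothetical sample data (not the files from the question).
a = pd.read_csv(io.StringIO(
    "city,state,link\n"
    "Aguila,Arizona,https://example.com/a\n"
    "AkChin,Arizona,https://example.com/b\n"
))
b = pd.read_csv(io.StringIO(
    "city,state,link\n"
    "Aguila,Arizona,https://example.com/a\n"
    "Ajo,Arizona,https://example.com/c\n"
))

# Left join on all shared columns; the _merge column records whether each
# row of a was also found in b ("both") or not ("left_only").
merged = a.merge(b, how="left", indicator=True)

# Keep the rows found only in a, then drop the helper column.
only_in_a = merged[merged["_merge"] == "left_only"].drop(columns="_merge")
only_in_a = only_in_a.reset_index(drop=True)
print(only_in_a)
```

To match on fewer columns, pass them explicitly, for example `on=["city", "state"]`; the non-key columns of the two frames then receive `_x` and `_y` suffixes in the merged result.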
      
