【Question title】:How to compare two files and print the mismatched line numbers in Python?
【Posted】:2014-01-08 07:34:05
【Question】:

I have two files that contain the same number of lines.

"file1.txt" contains the following lines:

 Attitude is a little thing that makes a big difference
 The only disability in life is a bad attitude
 Abundance is, in large part, an attitude
 Smile when it hurts most

"file2.txt" contains:

 Attitude is a little thing that makes a big difference
 Everyone has his burden. What counts is how you carry it
 Abundance is, in large part, an attitude
 A positive attitude may not solve all your problems  

I want to compare the two files line by line, and if any line differs between them, I want to print:

 print "mismatch in line no: 2"
 print "mismatch in line no: 4"   #in this case lineno: 2 and lineno: 4 varies from second file

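I tried the code below, but it only prints the lines of file1 that differ from the lines in file2. I cannot get it to print the line numbers of the mismatched lines.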

 My code:
 with open("file1.txt") as f1:
    lineset = set(f1)
 with open("file2.txt") as f2:
    lineset.difference_update(f2)
    for line in lineset:
        print line

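【Comments】:

  • Why are you turning it into a set? Do you want to remove duplicates?
  • No, I don't want to remove any lines. I want to print the line numbers of the lines in file1 that don't match file2. In my case, lines 2 and 4 differ from file2, so I want to print a mismatch for line 2 and line 4.
  • Have you heard of diff? This is a bit like reinventing the wheel.

Tags: python python-2.7 file-comparison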


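【Solution 1】:
with open('file1.txt') as f1, open('file2.txt') as f2:
    # zip pairs up corresponding lines; enumerate(..., 1) numbers them from 1.
    for lineno, (line1, line2) in enumerate(zip(f1, f2), 1):
        if line1 != line2:
            # A single formatted string prints the same way on Python 2 and 3.
            print('mismatch in line no: {}'.format(lineno))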

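To expand on the code above: a file object yields its lines in order, zip pairs up the lines at the same position in both files, and enumerate(..., 1) attaches a 1-based counter to each pair. A set, as used in the question, keeps neither order nor position, which is why the line numbers were lost there. The following is a minimal sketch of the same idea wrapped in a function, assuming Python 3. The name mismatched_line_numbers is made up for illustration, and in-memory lists stand in for the two files.

```python
def mismatched_line_numbers(lines1, lines2):
    """Return the 1-based positions at which the two line sequences differ."""
    return [
        lineno
        for lineno, (line1, line2) in enumerate(zip(lines1, lines2), 1)
        if line1 != line2
    ]


# Sample data copied from the question (stand-ins for file1.txt and file2.txt).
file1_lines = [
    "Attitude is a little thing that makes a big difference\n",
    "The only disability in life is a bad attitude\n",
    "Abundance is, in large part, an attitude\n",
    "Smile when it hurts most\n",
]
file2_lines = [
    "Attitude is a little thing that makes a big difference\n",
    "Everyone has his burden. What counts is how you carry it\n",
    "Abundance is, in large part, an attitude\n",
    "A positive attitude may not solve all your problems\n",
]

for lineno in mismatched_line_numbers(file1_lines, file2_lines):
    print("mismatch in line no: {}".format(lineno))
```

For the sample data this reports lines 2 and 4. Open file objects are also iterables of lines, so the same function should accept two files opened with open().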
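【Comments】:

  • Please format your code when answering; I fixed it this time.
  • For this to be a good answer, you should explain your code to the asker so that everyone can understand and learn from it, rather than just dumping code. Thanks.
【Solution 2】:

You may be able to use the difflib module. Here is a simple example that uses its difflib.Differ class: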

import difflib
import sys

with open('file1.txt') as file1, open('file2.txt') as file2:
    line_formatter = '{:3d}  {}'.format
    file1_lines = [line_formatter(i, line) for i, line in enumerate(file1, 1)]
    file2_lines = [line_formatter(i, line) for i, line in enumerate(file2, 1)]
    results = difflib.Differ().compare(file1_lines, file2_lines)
    sys.stdout.writelines(results)

Output:

    1  Attitude is a little thing that makes a big difference
-   2  The only disability in life is a bad attitude
+   2  Everyone has his burden. What counts is how you carry it
    3  Abundance is, in large part, an attitude
-   4  Smile when it hurts most
+   4  A positive attitude may not solve all your problems

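The minus and plus signs in the first column mark replaced lines, in the style of the typical diff utility. The absence of either marker means the line is the same in both files. You could suppress printing those lines if you like, but to keep the example simple, everything the compare() method produces is printed.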

For reference, here are the contents of the two files side by side:

1  Attitude is a little thing that makes a big difference    Attitude is a little thing that makes a big difference
2  The only disability in life is a bad attitude             Everyone has his burden. What counts is how you carry it
3  Abundance is, in large part, an attitude                  Abundance is, in large part, an attitude
4  Smile when it hurts most                                  A positive attitude may not solve all your problems
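
The answer notes that the unchanged lines could be suppressed. One possible way to do that, sketched here under the assumption of Python 3, is to keep only the Differ output lines whose two-character prefix is '- ' or '+ '. The helper name changed_lines is hypothetical, and in-memory lists stand in for the two files.

```python
import difflib
import sys


def changed_lines(lines1, lines2):
    """Return only the Differ output lines marked as removed ('- ') or added ('+ ')."""
    line_formatter = '{:3d}  {}'.format
    numbered1 = [line_formatter(i, line) for i, line in enumerate(lines1, 1)]
    numbered2 = [line_formatter(i, line) for i, line in enumerate(lines2, 1)]
    diff = difflib.Differ().compare(numbered1, numbered2)
    # Differ prefixes unchanged lines with '  ' and hint lines with '? '; skip both.
    return [line for line in diff if line.startswith(('- ', '+ '))]


# Sample data copied from the question.
file1_lines = [
    "Attitude is a little thing that makes a big difference\n",
    "The only disability in life is a bad attitude\n",
    "Abundance is, in large part, an attitude\n",
    "Smile when it hurts most\n",
]
file2_lines = [
    "Attitude is a little thing that makes a big difference\n",
    "Everyone has his burden. What counts is how you carry it\n",
    "Abundance is, in large part, an attitude\n",
    "A positive attitude may not solve all your problems\n",
]

sys.stdout.writelines(changed_lines(file1_lines, file2_lines))
```

With the sample data, only the entries for lines 2 and 4 of each file remain. Because the line number is embedded in each string before the comparison, it is still visible in the filtered output.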

【Comments】:

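【Solution 3】:

If both files have the same number of lines:

with open("file1.txt") as f1:
    with open("file2.txt") as f2:
        # Start counting at 1 so that idx matches the line number in the file.
        for idx, (lineA, lineB) in enumerate(zip(f1, f2), 1):
            if lineA != lineB:
                print('mismatch in line no: {0}'.format(idx))


Or, if the files have different numbers of lines, you can try izip_longest:

import itertools

with open("file1.txt") as f1:
    with open("file2.txt") as f2:
        # izip_longest (Python 2) pads the shorter file with None.
        for idx, (lineA, lineB) in enumerate(itertools.izip_longest(f1, f2), 1):
            if lineA != lineB:
                print('mismatch in line no: {0}'.format(idx))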
    

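In Python 3, izip_longest was renamed to itertools.zip_longest. The following minimal sketch assumes Python 3 and shows how padding makes every extra line in the longer input count as a mismatch. The function name mismatches_with_padding is made up for illustration, and short in-memory lists stand in for the files.

```python
from itertools import zip_longest


def mismatches_with_padding(lines1, lines2):
    """Return 1-based positions that differ, counting lines missing from the shorter input."""
    mismatches = []
    # zip_longest fills the shorter input with None, which never equals a real line.
    for lineno, (line1, line2) in enumerate(zip_longest(lines1, lines2), 1):
        if line1 != line2:
            mismatches.append(lineno)
    return mismatches


longer = ["first\n", "second\n", "third\n"]
shorter = ["first\n", "changed\n"]

for lineno in mismatches_with_padding(longer, shorter):
    print("mismatch in line no: {}".format(lineno))
```

Here line 2 differs in content and line 3 exists only in the longer list, so both are reported. With plain zip, the comparison would stop after line 2 and the missing line would go unnoticed.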
【Comments】:

【Solution 4】:

Use itertools.izip and enumerate:

import itertools

with open('file1.txt') as f1, open('file2.txt') as f2:
    for lineno, (line1, line2) in enumerate(itertools.izip(f1, f2), 1):
        if line1 != line2:
            print 'mismatch in line no:', lineno


【Comments】:

  • For Python 3, use "zip" instead of "itertools.izip", and print('mismatch in line no:', lineno)