【Question title】: How to check if a row from a text file exists in another text file using awk
【Posted】: 2014-08-11 11:07:06
【Question】:

This is a follow-up to Unexpected result comparing values of rows and columns in two text files.

I wrote a script that compares two text files row by row and column by column. The files look like this:

file1.txt

Name  Col1  Col2  Col3 
-----------------------
row1  1     4     7         
row2  2     5     8          
row3  3     6     9

file2.txt

Name  Col1  Col2  Col3   
-----------------------         
row1  1     4     7 
row2  2     5     999

Here is my current code:

dos2unix file1.txt 2>/dev/null   # convert CRLF line endings to LF
dos2unix file2.txt 2>/dev/null

awk '     
    FNR <= 2 {next}       # skip the first two lines (header and dashes)
    FNR == NR {           
        for (i = 2; i <= NF; i++) {
            a[i,$1] = $i;               
        }    
        b[$1];               
        next;                       
    }

    ($1 in b) {                   # check if row in file2 existed in file1
        for (i = 2; i <= NF; i++) {
            if (a[i,$1] == $i) 
                printf("%s->col%d: %s vs %s: Are Equal\n", $1, i-1, a[i,$1], $i);
            else 
                printf("%s->col%d: %s vs %s: Not Equal\n", $1, i-1, a[i,$1], $i);
        }
    }

    !($1 in b) {                  # check if row in file2 does not exist in file1
        for (i = 2; i <= NF; i++) 
            printf("%s->col%d: %s vs %s: Are Not Equal\n", $1, i-1, "blank", $i);
    }

    # pattern needed to check if row in file1 does not exist in file2

    ' $PWD/file1.txt $PWD/file2.txt

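Does anyone have tips or suggestions for an awk pattern that checks whether a row in file1 does not exist in file2? See the sample output below for what I mean. (Basically, I want to print the values of row3 from file1, which does not exist in file2.) Thanks! Let me know if anything needs further explanation.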

Desired output:

row1->Col1: 1 vs 1: Equal
row1->Col2: 4 vs 4: Equal
row1->Col3: 7 vs 7: Equal
row2->Col1: 2 vs 2: Equal
row2->Col2: 5 vs 5: Equal
row2->Col3: 8 vs 999: Not Equal
row3->Col1: 3 vs (blank): Not Equal
row3->Col2: 6 vs (blank): Not Equal
row3->Col3: 9 vs (blank): Not Equal

Actual output:

row1->Col1: 1 vs 1: Equal
row1->Col2: 4 vs 4: Equal
row1->Col3: 7 vs 7: Equal
row2->Col1: 2 vs 2: Equal
row2->Col2: 5 vs 5: Equal
row2->Col3: 8 vs 999: Not Equal

【Comments】:

  • You should probably use a small Python script for this, but that's just my two cents.

Tags: linux bash awk scripting suse


【Solution 1】:

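If you know that each row "name" (first column) appears at most once in each file, you can delete b[$1] at the end of the ($1 in b) block, move the !($1 in b) block above it (otherwise it would fire for the row you just deleted), and then add an END block that loops over everything left in b and prints those rows.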

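END {
    for (r in b) {                 # rows still in b never appeared in file2
        for (i = 2; i <= NF; i++) {
            # a[i,r] is file1's value; $i here would be the last line read from file2
            printf("%s->col%d: %s vs %s: Are Not Equal\n", r, i-1, a[i,r], "blank");
        }
    }
}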
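Putting the three steps described above together (delete processed rows, test !($1 in b) first, report the leftovers in END) gives something like the following sketch. The shell function name compare_rows is hypothetical; the sketch assumes both files have the same two header lines and unique row names. It also records each row's column count in n instead of reading NF inside END, because in POSIX awk NF inside END only keeps the field count of the last line read.

```shell
# compare_rows FILE1 FILE2 (hypothetical helper name)
# Assumes a two-line header in both files and unique names in column 1.
compare_rows() {
    awk '
        FNR <= 2 { next }                  # skip the header and the dashed line
        FNR == NR {                        # first file: remember every value
            for (i = 2; i <= NF; i++) a[i, $1] = $i
            n[$1] = NF                     # column count of this row
            b[$1]                          # mark the row name as seen
            next
        }
        !($1 in b) {                       # row only in the second file
            for (i = 2; i <= NF; i++)
                printf("%s->col%d: %s vs %s: Are Not Equal\n", $1, i-1, "blank", $i)
        }
        ($1 in b) {                        # row in both files
            for (i = 2; i <= NF; i++) {
                if (a[i, $1] == $i)
                    printf("%s->col%d: %s vs %s: Are Equal\n", $1, i-1, a[i, $1], $i)
                else
                    printf("%s->col%d: %s vs %s: Not Equal\n", $1, i-1, a[i, $1], $i)
            }
            delete b[$1]                   # processed: remove so END skips it
        }
        END {                              # rows only in the first file
            for (r in b)
                for (i = 2; i <= n[r]; i++)
                    printf("%s->col%d: %s vs %s: Are Not Equal\n", r, i-1, a[i, r], "blank")
        }
    ' "$1" "$2"
}
```

With the sample files from the question, compare_rows file1.txt file2.txt should print the six row1/row2 comparisons followed by three "row3->colN: ... vs blank: Are Not Equal" lines.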

【Comments】:

【Solution 2】:

Extending your answer:

$ cat script.awk
FNR <= 2 { next }      # skip the first two lines (header and dashes)
FNR == NR {
    for (i = 2; i <= NF; i++) { a[i,$1] = $i }
    n[$1] = NF;        # remember how many columns this row has
    b[$1];
    next;
}
($1 in b) {                   # check if row in file2 existed in file1
    for (i = 2; i <= NF; i++) {
        if (a[i,$1] == $i)
            printf("%s->col%d: %s vs %s: Are Equal\n", $1, i-1, a[i,$1], $i);
        else
            printf("%s->col%d: %s vs %s: Not Equal\n", $1, i-1, a[i,$1], $i);
    }
    delete b[$1];   # delete entries which are processed
}

END {
    for (left in b) {   # rows from file1 that never matched
        for (i = 2; i <= n[left]; i++)   # NF here would only reflect the last line read
            printf("%s->col%d: %s vs (blank): Not Equal\n", left, i-1, a[i,left])
    }
}


Run it like this:

$ awk -f script.awk file1 file2
row1->col1: 1 vs 1: Are Equal
row1->col2: 4 vs 4: Are Equal
row1->col3: 7 vs 7: Are Equal
row2->col1: 2 vs 2: Are Equal
row2->col2: 5 vs 5: Are Equal
row2->col3: 8 vs 999: Not Equal
row3->col1: 3 vs (blank): Not Equal
row3->col2: 6 vs (blank): Not Equal
row3->col3: 9 vs (blank): Not Equal
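If only the names of the rows missing from the second file are needed, without the per-column comparison, a different technique (sorted name lists plus the standard comm utility) avoids awk arrays entirely. A sketch, assuming both files start with the same two header lines and the row name is the first column; missing_rows is a hypothetical helper name:

```shell
# missing_rows FILE1 FILE2 (hypothetical helper name)
# Prints row names (column 1) found in FILE1 but not in FILE2.
# Both files are assumed to have two header lines.
missing_rows() {
    names2=$(mktemp) || return 1
    # data lines only (skip the 2 header lines and blank lines), names sorted
    awk 'FNR > 2 && NF { print $1 }' "$2" | sort > "$names2"
    # comm -23 keeps lines that appear only in the first sorted input (stdin here)
    awk 'FNR > 2 && NF { print $1 }' "$1" | sort | comm -23 - "$names2"
    rm -f "$names2"
}
```

With the question's sample files, missing_rows file1.txt file2.txt should print row3.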
    

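【Comments】:

• @jaypal I appreciate you taking the time to write this code (so +1), but it still doesn't print row3 for any column.
• @Nosscire Make sure the files don't contain any control characters. I just tested this and it works with your data.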
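The last comment suggests checking the input for control characters. A common culprit is Windows (CRLF) line endings, which leave a carriage return at the end of every line. A minimal sketch for detecting and stripping them with standard tools; the helper names has_cr and strip_cr and the .unix output suffix are hypothetical, chosen for illustration:

```shell
# has_cr FILE (hypothetical helper): succeeds if any line of FILE
# contains a carriage return.
has_cr() {
    grep -q "$(printf '\r')" "$1"
}

# strip_cr FILE (hypothetical helper): writes a copy of FILE without
# carriage returns to FILE.unix, similar in effect to dos2unix on a copy.
strip_cr() {
    tr -d '\r' < "$1" > "$1.unix"
}
```

For example, has_cr file1.txt && echo "file1.txt has CRLF line endings".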