Delete matched pairs in a file: answers

【Question title】: Delete matched pairs in a file
【Posted】: 2012-12-20 09:46:23
【Question】:

While hunting a particularly persistent memory leak in C++ code, I decided to write every allocation to a log file in the following format:

<alloc|free> <address> <size> <UNIQUE-ID> <file> <line number>

This gives me, for example:

alloc 232108     60   405766 file1.cpp (3572)
free  232128     60   405766
alloc 232108     60   405767 file1.cpp (3572)
free  232128     60   405767
alloc 7a3620  12516   405768 file2.cpp (11435)
free  7a3640  12516   405768
alloc 2306c8    256   405769 file3.cpp (3646)
alloc 746160   6144   405770 file3.cpp (20462)
alloc 6f3528   2048   405771 file4.h (153)
alloc 6aca50    128   405772 file4.h (153)
alloc 632ec8    128   405773 file4.h (153)
alloc 732ff0    128   405774 file4.h (153)
free  746180   6144   405770
free  632ee8    128   405773
alloc 6a7610   2972   405778 this_alloc_has_no_counterpart.cpp (123)
free  6aca70    128   405772
free  733010    128   405774
free  6f3548   2048   405771
alloc 6a7610   2972   405775 file3.cpp (18043)
alloc 7a3620  12316   405776 file5.cpp (474)
alloc 631e00    256   405777 file3.cpp (18059)
free  7a3640  12316   405776
free  6a7630   2972   405775
free  631e20    256   405777
free  2306e8    256   405769

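I am trying to match each alloc with its free and keep only the allocs that have no corresponding free, such as allocation number 405778.

The best I could come up with is the following shell script: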

#!/bin/sh
grep "^alloc" test.txt | while read line
do
    alloc_nr=`echo $line | awk '{ print $4 }'`  # arg4 = allocation number
    echo "Processing $alloc_nr"
    sed -i "/ ${alloc_nr}/{//d}" test.txt
done

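As you may have guessed, this is extremely slow (about 2 loop iterations per second) on a 25 MB file with roughly 144,000 allocs, because I am using sed very inefficiently: every iteration rescans and rewrites the whole file.

If anyone could nudge me in the right direction on how to do this without it taking three hours, I would be grateful.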

【Comments】:

Tags: regex shell sed awk

【Solution 1】:

It looks like you only want the IDs rather than the full lines:

$ awk '{print $4}' file | sort | uniq -u
405778

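awk '{print $4}' prints only the ID column.

sort sorts that column, so identical IDs end up on adjacent lines.

uniq -u prints only the IDs that occur exactly once. An alloc and its free share an ID, so matched pairs drop out.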
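To make the three stages concrete, here is a minimal sketch that runs the same pipeline on a tiny made-up log. The function name unmatched_ids and the sample contents are illustrative assumptions, not part of the original answer.

```shell
#!/bin/sh
# Hypothetical helper: print every ID (column 4) that occurs exactly once.
unmatched_ids() {
    # $1 = path to the allocation log
    awk '{ print $4 }' "$1" | sort | uniq -u
}

# Made-up sample: ID 1 is allocated and freed, ID 2 is only allocated.
sample=$(mktemp)
cat > "$sample" <<'EOF'
alloc 232108     60   1 file1.cpp (3572)
free  232128     60   1
alloc 6a7610   2972   2 file2.cpp (123)
EOF

unmatched_ids "$sample"   # prints: 2
rm -f "$sample"
```

Note that a free whose alloc is missing from the log would also show up here, because the pipeline only counts how often an ID appears.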

Edit:

Pipe the result into grep -f - file to get the full lines, with no loop needed:

$ awk '{print $4}' file | sort | uniq -u | grep -f - file
alloc 6a7610   2972   405778 this_alloc_has_no_counterpart.cpp (123)

grep -f reads its patterns from a file, and - means read them from stdin.
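Because grep -f treats each ID as a pattern that may match anywhere on a line, an ID could in principle also hit an address or a size column. One possible alternative, which is not from the original answer, compares column 4 only by reading the log twice with awk. The function name unmatched_lines is a hypothetical choice.

```shell
#!/bin/sh
# Hypothetical helper: print full lines whose ID (column 4) occurs exactly once.
# The same file is passed twice: pass 1 counts IDs, pass 2 prints the singletons.
unmatched_lines() {
    # $1 = path to the allocation log
    awk 'NR == FNR { count[$4]++; next }   # pass 1: count each ID
         count[$4] == 1                    # pass 2: print lines with an unpaired ID
        ' "$1" "$1"
}

# Usage (assuming the log is called test.txt, as in the question):
#   unmatched_lines test.txt
```

This keeps the lines in their original order and avoids the sort, at the cost of holding one counter per ID in memory.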

【Comments】:

【Solution 2】:

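awk '/^alloc/ { a[$4] = $0 }        # remember each alloc line, keyed by its ID
     /^free/  { delete a[$4] }      # drop the entry once the matching free appears
     END      { for (i in a) print a[i] }' test.txt   # what remains was never freed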
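awk does not guarantee any particular order for a for (i in a) loop, so the surviving allocs may come out shuffled. A small sketch of one way to get them in allocation order is to sort numerically on the ID column afterwards. The wrapper name leaked_allocs is an assumption for illustration, and the approach assumes a free never appears in the log before its alloc.

```shell
#!/bin/sh
# Hypothetical wrapper: list allocs without a matching free, ordered by ID.
leaked_allocs() {
    # $1 = path to the allocation log
    awk '/^alloc/ { a[$4] = $0 }        # remember each alloc line by its ID
         /^free/  { delete a[$4] }      # a matching free cancels it
         END      { for (i in a) print a[i] }' "$1" |
        sort -k4,4n                     # numeric sort on the ID column
}

# Usage (assuming the log is called test.txt, as in the question):
#   leaked_allocs test.txt
```

The whole log is read once and only the still-open allocs are held in memory, which is why this scales far better than rewriting the file per ID.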

【Comments】: