【Question title】: Parsing an ASCII text file using GAWK
【Posted】: 2018-05-13 20:12:28
【Question】:

I have been trying to parse an ASCII text file in the following format:

0 0 0x2de0 [0x98]: PERF_RECORD_MMAP -1/0: [0xffffffffc06ae000(0x5000) @ 0]: x /lib/modules/4.4.0-83-generic/kernel/net/ipv4/netfilter/nf_reject_ipv4.ko

0x2e78 [0x90]: event: 1
.
. ... raw event: size 144 bytes
.  0000:  01 00 00 00 01 00 90 00 ff ff ff ff 00 00 00 00  ................
.  0010:  00 30 6b c0 ff ff ff ff 00 50 00 00 00 00 00 00  .0k......P......
.  0020:  00 00 00 00 00 00 00 00 2f 6c 69 62 2f 6d 6f 64  ......../lib/mod
.  0030:  75 6c 65 73 2f 34 2e 34 2e 30 2d 38 33 2d 67 65  ules/4.4.0-83-ge
.  0040:  6e 65 72 69 63 2f 6b 65 72 6e 65 6c 2f 6e 65 74  neric/kernel/net
.  0050:  2f 69 70 76 34 2f 6e 65 74 66 69 6c 74 65 72 2f  /ipv4/netfilter/
.  0060:  69 70 74 5f 52 45 4a 45 43 54 2e 6b 6f 00 2e 6b  ipt_REJECT.ko..k
.  0070:  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
.  0080:  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................

0 0 0x2e78 [0x90]: PERF_RECORD_MMAP -1/0: [0xffffffffc06b3000(0x5000) @ 0]: x /lib/modules/4.4.0-83-generic/kernel/net/ipv4/netfilter/ipt_REJECT.ko

0x2f08 [0x88]: event: 1
.
. ... raw event: size 136 bytes
.  0000:  01 00 00 00 01 00 88 00 ff ff ff ff 00 00 00 00  ................
.  0010:  00 80 6b c0 ff ff ff ff 00 50 00 00 00 00 00 00  ..k......P......
.  0020:  00 00 00 00 00 00 00 00 2f 6c 69 62 2f 6d 6f 64  ......../lib/mod
.  0030:  75 6c 65 73 2f 34 2e 34 2e 30 2d 38 33 2d 67 65  ules/4.4.0-83-ge
.  0040:  6e 65 72 69 63 2f 6b 65 72 6e 65 6c 2f 6e 65 74  neric/kernel/net
.  0050:  2f 6e 65 74 66 69 6c 74 65 72 2f 78 74 5f 74 63  /netfilter/xt_tc
.  0060:  70 75 64 70 2e 6b 6f 00 00 00 00 00 00 00 00 00  pudp.ko.........
.  0070:  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
.  0080:  00 00 00 00 00 00 00 00                      

    ........[some other data]........
0x11590 [0x30]: PERF_RECORD_AUXTRACE size: 0x2002a0  offset: 0  ref: 0x2d44e6441a3c2  idx: 0  tid: -1  cpu: 0
.
. ... Intel Processor Trace data: size 2097824 bytes
.  00000000:  02 82 02 82 02 82 02 82 02 82 02 82 02 82 02 82 PSB
.  00000010:  00 00 00                                        PAD
.  00000013:  99 20                                           MODE.TSX TXAbort:0 InTX:0
.  00000015:  99 01                                           MODE.Exec 64
.  00000017:  7d 08 45 06 81 ff ff 00                         FUP 0xffff81064508
.  0000001f:  00 00 00 00 00 00 00                            PAD
.  00000026:  02 43 00 76 49 1f 00 00                         PIP 0xfa4bb00 (NR=0)

.  0000002e:  00 00 00 00 00 00 00 00                         PAD
--- continued ---

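The file will have several headers, as you can see in my snippet:

PERF_RECORD_MMAP
PERF_RECORD_AUXTRACE

There will be other headers in the file as well.

What I want is for every header in my text file that contains PERF_RECORD_AUXTRACE to be considered. Only the data that follows PERF_RECORD_AUXTRACE in my file should be collected (that is, all the data beginning with the Intel Processor Trace data line). The PERF_RECORD_AUXTRACE header also has a size field, which I can use to specify how much data to collect under the PERF_RECORD_AUXTRACE header.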
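
A minimal sketch of reading that size field, assuming the header layout shown in the snippet above (the token after `size:` is a `0x`-prefixed hexadecimal number). The variable names `header`, `size_hex`, and `size_dec` are made up for the illustration.

```shell
# One PERF_RECORD_AUXTRACE header line, copied from the snippet above.
header='0x11590 [0x30]: PERF_RECORD_AUXTRACE size: 0x2002a0  offset: 0  ref: 0x2d44e6441a3c2  idx: 0  tid: -1  cpu: 0'

# Pick the token that follows "size:" on a PERF_RECORD_AUXTRACE header line.
size_hex=$(printf '%s\n' "$header" | awk '
  /PERF_RECORD_AUXTRACE/ {
    for (i = 1; i < NF; i++)
      if ($i == "size:") { print $(i + 1); exit }
  }')

# POSIX shell arithmetic accepts 0x-prefixed constants.
size_dec=$(($size_hex))

printf '%s = %d bytes\n' "$size_hex" "$size_dec"
# prints: 0x2002a0 = 2097824 bytes
```

The decimal value matches the `size 2097824 bytes` reported on the `Intel Processor Trace data` line. Note that it counts bytes of trace data, not lines of the text dump.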

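EDIT #1

So basically, given the input file snippet above, I would want the output to be of the following form (all the lines after the record containing PERF_RECORD_AUXTRACE):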

.
. ... Intel Processor Trace data: size 2097824 bytes
.  00000000:  02 82 02 82 02 82 02 82 02 82 02 82 02 82 02 82 PSB
.  00000010:  00 00 00                                        PAD
.  00000013:  99 20                                           MODE.TSX TXAbort:0 InTX:0
.  00000015:  99 01                                           MODE.Exec 64
.  00000017:  7d 08 45 06 81 ff ff 00                         FUP 0xffff81064508
.  0000001f:  00 00 00 00 00 00 00                            PAD
.  00000026:  02 43 00 76 49 1f 00 00                         PIP 0xfa4bb00 (NR=0)

.  0000002e:  00 00 00 00 00 00 00 00                         PAD
--- continued ---

EDIT #2: Here is another requirement of mine.

If I have an input snippet like the following:

0 0 0x230 [0x60]: PERF_RECORD_MMAP -1/0: [0xffffffff81000000(0x3f000000) @ 0xffffffff81000000]: x [kernel.kallsyms]_text

0x290 [0x88]: event: 1
.
. ... raw event: size 136 bytes
.  0000:  01 00 00 00 01 00 88 00 ff ff ff ff 00 00 00 00  ................
.  0010:  00 00 00 c0 ff ff ff ff 00 90 00 00 00 00 00 00  ................
.  0020:  00 00 00 00 00 00 00 00 2f 6c 69 62 2f 6d 6f 64  ......../lib/mod
.  0030:  75 6c 65 73 2f 34 2e 34 2e 30 2d 38 33 2d 67 65  ules/4.4.0-83-ge
.  0040:  6e 65 72 69 63 2f 6b 65 72 6e 65 6c 2f 64 72 69  neric/kernel/dri
.  0050:  76 65 72 73 2f 61 74 61 2f 6c 69 62 61 68 63 69  vers/ata/libahci
.  0060:  2e 6b 6f 00 00 00 00 00 00 00 00 00 00 00 00 00  .ko.............
.  0070:  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
.  0080:  00 00 00 00 00 00 00 00                          ........

0x11590 [0x30]: PERF_RECORD_AUXTRACE size: 0x2002a0  offset: 0  ref: 0x2d44e6441a3c2  idx: 0  tid: -1  cpu: 0
.
. ... Intel Processor Trace data: size 2097824 bytes
.  00000000:  02 82 02 82 02 82 02 82 02 82 02 82 02 82 02 82 PSB
.  00000010:  00 00 00                                        PAD
.  00000013:  99 20                                           MODE.TSX TXAbort:0 InTX:0
.  00000015:  99 01                                           MODE.Exec 64
.  00000017:  7d 08 45 06 81 ff ff 00                         FUP 0xffff81064508
.  0000001f:  00 00 00 00 00 00 00                            PAD
.  00000026:  02 43 00 76 49 1f 00 00                         PIP 0xfa4bb00 (NR=0)
.  0000002e:  00 00 00 00 00 00 00 00                         PAD
.  00000036:  02 c8 c2 3a 7c 00 00 00                         VMCS 0x7c3ac2

0 0 0x290 [0x88]: PERF_RECORD_MMAP -1/0: [0xffffffffc0000000(0x9000) @ 0]: x /lib/modules/4.4.0-83-generic/kernel/drivers/ata/libahci.ko

0x318 [0x98]: event: 1
.
. ... raw event: size 152 bytes
.  0000:  01 00 00 00 01 00 98 00 ff ff ff ff 00 00 00 00  ................
.  0010:  00 90 00 c0 ff ff ff ff 00 50 00 00 00 00 00 00  .........P......
.  0020:  00 00 00 00 00 00 00 00 2f 6c 69 62 2f 6d 6f 64  ......../lib/mod
.  0030:  75 6c 65 73 2f 34 2e 34 2e 30 2d 38 33 2d 67 65  ules/4.4.0-83-ge
.  0040:  6e 65 72 69 63 2f 6b 65 72 6e 65 6c 2f 64 72 69  neric/kernel/dri
.  0050:  76 65 72 73 2f 76 69 64 65 6f 2f 66 62 64 65 76  vers/video/fbdev
.  0060:  2f 63 6f 72 65 2f 66 62 5f 73 79 73 5f 66 6f 70  /core/fb_sys_fop
.  0070:  73 2e 6b 6f 00 00 00 00 00 00 00 00 00 00 00 00  s.ko............
.  0080:  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
.  0090:  00 00 00 00 00 00 00 00                          ........


0x11590 [0x30]: PERF_RECORD_AUXTRACE size: 0x2002a0  offset: 0  ref: 0x2d44e6441a3c2  idx: 0  tid: -1  cpu: 0
.
. ... Intel Processor Trace data: size 2097824 bytes
.  00000000:  02 82 02 82 02 82 02 82 02 82 02 82 02 82 02 82 PSB
.  00000010:  00 00 00                                        PAD
.  00000013:  99 20                                           MODE.TSX TXAbort:0 InTX:0
.  00000015:  99 01                                           MODE.Exec 64
.  00000017:  7d 08 45 06 81 ff ff 00                         FUP 0xffff81064508
.  0000001f:  00 00 00 00 00 00 00                            PAD
.  00000026:  02 43 00 76 49 1f 00 00                         PIP 0xfa4bb00 (NR=0)
.  0000002e:  00 00 00 00 00 00 00 00                         PAD
.  00000036:  02 c8 c2 3a 7c 00 00 00                         VMCS 0x7c3ac2

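I want only the data under the records that contain PERF_RECORD_AUXTRACE. It would be great if the first line of each such record,

. ... Intel Processor Trace data: size 2097824 bytes

could also be left out of my output, like so: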

.  00000000:  02 82 02 82 02 82 02 82 02 82 02 82 02 82 02 82 PSB
.  00000010:  00 00 00                                        PAD
.  00000013:  99 20                                           MODE.TSX TXAbort:0 InTX:0
.  00000015:  99 01                                           MODE.Exec 64
.  00000017:  7d 08 45 06 81 ff ff 00                         FUP 0xffff81064508
.  0000001f:  00 00 00 00 00 00 00                            PAD
.  00000026:  02 43 00 76 49 1f 00 00                         PIP 0xfa4bb00 (NR=0)
.  0000002e:  00 00 00 00 00 00 00 00                         PAD
.  00000000:  02 82 02 82 02 82 02 82 02 82 02 82 02 82 02 82 PSB
.  00000010:  00 00 00                                        PAD
.  00000013:  99 20                                           MODE.TSX TXAbort:0 InTX:0
.  00000015:  99 01                                           MODE.Exec 64
.  00000017:  7d 08 45 06 81 ff ff 00                         FUP 0xffff81064508
.  0000001f:  00 00 00 00 00 00 00                            PAD
.  00000026:  02 43 00 76 49 1f 00 00                         PIP 0xfa4bb00 (NR=0)
.  0000002e:  00 00 00 00 00 00 00 00                         PAD

EDIT #3: This is what I initially tried to do... but it obviously does not work!

cat "$file" | gawk -F' ' -- '
  /PERF_RECORD_AUXTRACE / {
    offset = strtonum($1)
    hsize  = strtonum(substr($2, 2))
    size   = strtonum($5)
    idx    = strtonum($11)
    ext    = ""


    ofile = sprintf("raw-pt.txt")
    begin = offset + hsize

    cmd = sprintf("dd if=%s of=%s conv=notrunc oflag=append ibs=1 " \
                  "count=%d status=none", file, ofile, size)

    #!cmd = sprintf("sed p")
    if (dry_run != 0) {
      print cmd
    }
    else {
     system(cmd)
    }
  }

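I am not quite sure how to parse this file properly to get exactly what I want. I am also not sure whether using Python would help.

How can I solve this?

【Comments】:

  • Still ambiguous: "all the lines after the record containing PERF_RECORD_AUXTRACE"... until the end of the file, or until the start of the next different section (such as PERF_RECORD_MMAP)? Please clarify in the body of your question, not in the comments. Once the question is clear, I will delete this comment. Good luck.
  • Hi @shellter, I have added more details to my question. I hope it is clear now.
  • I have voted to close this question because it appears to be a request for a tool or solution recommendation rather than a request for help with your own code. That makes your question off-topic for StackOverflow. If that assessment is incorrect and you do need help writing your own code, then please add your work so far to your question, and I will retract my close vote. You have tagged your question with bash and awk. I would expect to see bash and awk code in your question.
  • Hi @ghoti, I have added my "not working" awk code to my question.

Tags: bash shell file parsing awk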


【Solution 1】:

To get the output you say you want from the input you posted, all you need is:

awk 'f; /PERF_RECORD_AUXTRACE/{f=1}' file

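If that is not actually what you want, then edit your question to clarify your requirements and provide different sample input/output that, if necessary, demonstrates your problem more realistically.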
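
The one-liner above sets the flag `f` at the first PERF_RECORD_AUXTRACE line and then prints every later line through the end of the file. If the goal is instead the output described in EDIT #2 (only the dump lines under each PERF_RECORD_AUXTRACE record, without the `Intel Processor Trace data` line), a minimal sketch could re-evaluate the flag at every record header. It assumes that every record header starts with an offset such as `0x11590 [0x30]:` or `0 0 0x290 [0x88]:`, and that dump lines start with `.` followed by a hex offset and a colon. The sample input is a trimmed copy of the question's snippet, and the names `file` and `out` are illustrative.

```shell
# Hypothetical sample input, trimmed from the snippet in the question.
file=$(mktemp)
cat > "$file" <<'EOF'
0x290 [0x88]: event: 1
.
. ... raw event: size 136 bytes
.  0000:  01 00 00 00 01 00 88 00 ff ff ff ff 00 00 00 00  ................

0x11590 [0x30]: PERF_RECORD_AUXTRACE size: 0x2002a0  offset: 0  ref: 0x2d44e6441a3c2  idx: 0  tid: -1  cpu: 0
.
. ... Intel Processor Trace data: size 2097824 bytes
.  00000000:  02 82 02 82 02 82 02 82 02 82 02 82 02 82 02 82 PSB
.  00000010:  00 00 00                                        PAD

0 0 0x290 [0x88]: PERF_RECORD_MMAP -1/0: [0xffffffffc0000000(0x9000) @ 0]: x /lib/modules/4.4.0-83-generic/kernel/drivers/ata/libahci.ko
EOF

out=$(awk '
  # Any record header: remember whether it is a PERF_RECORD_AUXTRACE record,
  # and never print the header line itself.
  /^([0-9]+ [0-9]+ )?0x[0-9a-f]+ \[0x[0-9a-f]+\]:/ {
    f = ($0 ~ /PERF_RECORD_AUXTRACE/)
    next
  }
  # Inside such a record, print only hex-dump lines (". <offset>: ...").
  f && /^\. +[0-9a-f]+:/
' "$file")

printf '%s\n' "$out"
rm -f "$file"
```

On this sample only the `PSB` and `PAD` lines are printed. Every dump line of an AUXTRACE record is kept, so on the full input from EDIT #2 the `VMCS` lines would appear as well; if they are unwanted, an extra pattern would be needed to drop them.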

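【Comments】:

  • There is now so much text in your question that I personally cannot read through all of it to try to work out what you want. I do not understand why you cannot simply show one sample input and the expected output for that input, and state your requirements concisely. Maybe someone else will take the time to read through it all. Good luck!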