【Question title】: Bash: How to extract table-like structures from text file
【Posted】: 2017-03-22 03:19:13
【Question】:

I have a log file that contains some data along with important table-like sections, as shown below:

    //Some data

    --------------------------------------------------------------------------------
    -----                 Output Table                             -----
    --------------------------------------------------------------------------------
            NAME                         Attr1    Attr2      Attr3    Attr4    Attr5
    --------------------------------------------------------------------------------
    fooooooooo                               0        0          3        0        0
    boooooooooooooooooooooo                  0        0         30        0        0
    abv                                      0        0         16        0        0
    bhbhbhbh                                 0        0          3        0        0
    foooo                                    0        0        198        0        0

    WARNING: Some message...


    WARNING: Some message...

    aaaaaaaaa                                0        0         60        0        7
    bbbbbbbb                                 0        0         48        0        7
    ccccccc                                  0        0         45        0        7
    rrrrrrr                                  0        0         50        0        7
    abcabca                                  0        0         42        0        6

    // Some data...

    --------------------------------------------------------------------------------
    -----                 Another Output Table                                 -----
    --------------------------------------------------------------------------------
         NAME                            Attr1    Attr2      Attr3    Attr4    Attr5
    --------------------------------------------------------------------------------
    $$foo12                                  0        0          3        0        0
    $$foo12_720_720_14_2                     0        0         30        0        0

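I want to extract all such tables from the given file and save each one in a separate file.

Notes:

  • A table starts at the line containing the words {NAME, Attr1, ..., Attr5}.
  • WARNING messages may appear inside a table and should be ignored.
  • A table ends at an empty line whose following line is not a "WARNING" line.

So I expect the following 2 files as output: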

        NAME                         Attr1    Attr2      Attr3    Attr4    Attr5
--------------------------------------------------------------------------------
fooooooooo                               0        0          3        0        0
boooooooooooooooooooooo                  0        0         30        0        0
abv                                      0        0         16        0        0
bhbhbhbh                                 0        0          3        0        0
foooo                                    0        0        198        0        0
aaaaaaaaa                                0        0         60        0        7
bbbbbbbb                                 0        0         48        0        7
ccccccc                                  0        0         45        0        7
rrrrrrr                                  0        0         50        0        7
abcabca                                  0        0         42        0        6

     NAME                            Attr1    Attr2      Attr3    Attr4    Attr5
--------------------------------------------------------------------------------
$$foo12                                  0        0          3        0        0
$$foo12_720_720_14_2                     0        0         30        0        0

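【Comments on the question】:

  • I have voted to close this question because it appears to be a request for a tool or solution recommendation rather than a request for help with your own code. That makes your question off-topic for Stack Overflow. If that assessment is incorrect and you do need help writing your own code, please add your work so far to your question and I will retract my close vote.

Tags: bash awk sed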


【Solution 1】:

Following your rules, I would write the awk script below.

#! /usr/bin/awk -f

# start a table with a NAME line
/^ +NAME/ {
    titles = $0
    EOT = warned = 0
    print
    next
}

# don't print if not in table
! titles {
    next
}

# warning is not EOT; remember it so the blank lines after it are ignored
/^WARNING/ {
    EOT = 0
    warned = 1
    next
}

# blank line may mean end-of-table, unless it trails a warning
/^$/ {
    if (! warned) EOT = 1
    next
}

# end of table means we're not in a table anymore, Toto
EOT {
    titles = 0
    EOT = 0
    next
}

# print what's in the table
{
    warned = 0
    print
}
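
The script above prints every table to standard output, while the question asks for one file per table. A minimal sketch of that variant follows, wrapped in a shell function so it can be pointed at any log file. The function name `extract_tables`, the default `table_` prefix, and the `.txt` suffix are illustrative assumptions, not part of the original answer. It also assumes that blank lines directly after a WARNING belong to the warning and do not end the table, which is what the expected output in the question implies, and it tolerates leading indentation on every line.

```shell
#!/bin/sh
# extract_tables LOGFILE [PREFIX]
# Writes each table found in LOGFILE to PREFIX1.txt, PREFIX2.txt, ...
# (function name and file naming scheme are hypothetical choices).
extract_tables() {
    log=$1
    prefix=${2:-table_}
    awk -v prefix="$prefix" '
        # A header line (NAME Attr1 ...) starts a new table and a new file.
        /^[ \t]*NAME[ \t]+Attr1/ {
            if (out != "") close(out)
            out = prefix (++n) ".txt"
            in_table = 1; pending_end = 0; after_warning = 0
            print > out
            next
        }
        # Everything outside a table is skipped.
        !in_table { next }
        # A WARNING cancels a pending end-of-table and is itself dropped.
        /^[ \t]*WARNING/ { pending_end = 0; after_warning = 1; next }
        # A blank line may end the table, unless it trails a WARNING.
        /^[ \t]*$/ { if (!after_warning) pending_end = 1; next }
        # First real line after a blank line that was not followed by a WARNING.
        pending_end { in_table = 0; pending_end = 0; close(out); next }
        # Any other line inside a table is part of it.
        { after_warning = 0; print > out }
    ' "$log"
}
```

Called as `extract_tables some.log` (with `some.log` standing in for the real log file), this would create `table_1.txt`, `table_2.txt`, and so on in the current directory. One consequence of the blank-after-WARNING assumption is that unrelated text following a trailing WARNING would still be captured, so the rule may need tightening for other logs.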


    【Solution 2】:

    Try this:

    awk -F'[[:space:]]+' 'NF>6 || ($0 ~ /-/ && $0 !~ "Output") {print $0}' f
        --------------------------------------------------------------------------------
        --------------------------------------------------------------------------------
                NAME                         Attr1    Attr2      Attr3    Attr4    Attr5
        --------------------------------------------------------------------------------
        fooooooooo                               0        0          3        0        0
        boooooooooooooooooooooo                  0        0         30        0        0
        abv                                      0        0         16        0        0
        bhbhbhbh                                 0        0          3        0        0
        foooo                                    0        0        198        0        0
        aaaaaaaaa                                0        0         60        0        7
        bbbbbbbb                                 0        0         48        0        7
        ccccccc                                  0        0         45        0        7
        rrrrrrr                                  0        0         50        0        7
        abcabca                                  0        0         42        0        6
        --------------------------------------------------------------------------------
        --------------------------------------------------------------------------------
             NAME                            Attr1    Attr2      Attr3    Attr4    Attr5
        --------------------------------------------------------------------------------
        $$foo12                                  0        0          3        0        0
        $$foo12_720_720_14_2                     0        0         30        0        0
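
The field-count test above depends on every row being indented: with `-F'[[:space:]]+'` the leading whitespace produces an empty first field, so a six-column row counts as seven fields and passes `NF>6`. It also keeps the dashed rule lines. As a hedged alternative sketch, the function below uses awk's default field splitting, which ignores indentation, and keeps only the header plus rows whose five attribute columns are unsigned integers. The name `filter_rows` is a hypothetical helper, and the assumption that attributes are always non-negative integers comes from the sample data only.

```shell
#!/bin/sh
# filter_rows [FILE...]
# Prints only table header rows and data rows; reads stdin when no file is given.
filter_rows() {
    awk '
        # Header row: NAME followed by five attribute titles.
        NF == 6 && $1 == "NAME" { print; next }
        # Data row: a name followed by five unsigned integers.
        NF == 6 {
            for (i = 2; i <= NF; i++)
                if ($i !~ /^[0-9]+$/) next
            print
        }
    ' "$@"
}
```

For the file `f` used in the command above, the call would be `filter_rows f`. Unlike the one-liner, this drops the dashed separator lines, so it suits cases where only the rows themselves are wanted.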
    
