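【Question title】:Eliminate lines not matching multiline pattern
【Posted】:2013-04-02 20:01:07
【Question description】:

I am searching through log files, trying to determine the total time that users are logged in. I have already stripped out every line that does not pertain to login or logout. However, for some reason we have login lines with no corresponding logout line, and I want to eliminate those. For example: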

2013-04-07 08:44:01 [INFO] User logged in
2013-04-07 08:54:55 [INFO] User logged in
2013-04-07 08:57:12 [INFO] User logged in
2013-04-07 08:59:45 [INFO] User logged in
2013-04-07 09:01:28 [INFO] User logged in
2013-04-07 09:11:00 [INFO] User logged in
2013-04-07 09:12:56 [INFO] User logged in
2013-04-07 09:15:43 [INFO] User lost connection

I only want:

2013-04-07 09:12:56 [INFO] User logged in
2013-04-07 09:15:43 [INFO] User lost connection

【Comments】:

  • Is it possible to have several User lost connection lines in a row?

Tags: regex bash pattern-matching


【Solution 1】:

This awk one-liner does the job (at least for your example; I cannot see your real file):

awk -F\[ '{a[$2]=$0;}END{for(x in a)print a[x]}' file

Tested with your data:

kent$  echo "2013-04-07 08:44:01 [INFO] User logged in
2013-04-07 08:54:55 [INFO] User logged in
2013-04-07 08:57:12 [INFO] User logged in
2013-04-07 08:59:45 [INFO] User logged in
2013-04-07 09:01:28 [INFO] User logged in
2013-04-07 09:11:00 [INFO] User logged in
2013-04-07 09:12:56 [INFO] User logged in
2013-04-07 09:15:43 [INFO] User lost connection"|awk -F\[ '{a[$2]=$0;}END{for(x in a)print a[x]}'
2013-04-07 09:12:56 [INFO] User logged in
2013-04-07 09:15:43 [INFO] User lost connection

For lines carrying the same message, only the last one is printed.
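
One caveat worth noting: awk does not define the order in which `for (x in a)` visits array keys, so the two lines are not guaranteed to come out in chronological order. A minimal sketch of a workaround, assuming every line starts with a sortable timestamp as in the sample, is to pipe the result through `sort`. The function name `last_per_message` is made up for illustration, and the field separator is written as the bracket expression `[[]`, which matches a literal `[` in any awk.

```shell
# Hypothetical helper: keep the last line seen for each distinct message
# (the text after the first "["), then restore chronological order.
last_per_message() {
    awk -F'[[]' '{ a[$2] = $0 } END { for (x in a) print a[x] }' "$@" | sort
}

# Sample run on a shortened copy of the log from the question.
last_per_message <<'EOT'
2013-04-07 09:11:00 [INFO] User logged in
2013-04-07 09:12:56 [INFO] User logged in
2013-04-07 09:15:43 [INFO] User lost connection
EOT
```

Like the original one-liner, this keeps only one line per message for the whole file, so it only suits input with a single login/logout block.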

EDIT

I suspect your real file looks more like this:

You may have several login/lost-connection blocks, for example:

kent$  cat file
2013-04-07 09:11:00 [INFO] User logged in
2013-04-07 09:12:56 [INFO] User logged in
2013-04-07 09:15:43 [INFO] User lost connection
2013-04-08 09:11:00 [INFO] User logged in
2013-04-08 09:12:56 [INFO] User logged in
2013-04-08 09:15:43 [INFO] User lost connection

In that case this one-liner should work for you:

 awk '/lost/{print a;print;next;}{a=$0}' file

The output is:

2013-04-07 09:12:56 [INFO] User logged in
2013-04-07 09:15:43 [INFO] User lost connection
2013-04-08 09:12:56 [INFO] User logged in
2013-04-08 09:15:43 [INFO] User lost connection

【Comments】:

    【Solution 2】:

    Assuming there will never be several User lost connection lines in a row, the following should work:

    sed '/User logged in/{h;d};H;x' file
    

    Or, if you are on a system whose sed does not support ; as a command separator:

    sed -e '/User logged in/{h
    d
    }' -e 'H' -e 'x' file
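
For readers unfamiliar with sed's hold space, the same script can be spelled out one command per line with comments. This is only an annotated sketch of the command above, not a different technique; the wrapper name `pair_logins` is hypothetical and the sample input is a shortened copy of the log from the question.

```shell
# Hypothetical wrapper around the sed script from this answer.
pair_logins() {
    sed '
# A login line: copy it into the hold space (overwriting any earlier
# login) and delete it from the output for now.
/User logged in/{
h
d
}
# Any other line (here: "User lost connection"): append it to the hold
# space, then swap, so the pattern space holds the last login followed
# by this line. sed prints the pattern space at the end of the cycle.
H
x
' "$@"
}

pair_logins <<'EOT'
2013-04-07 09:11:00 [INFO] User logged in
2013-04-07 09:12:56 [INFO] User logged in
2013-04-07 09:15:43 [INFO] User lost connection
EOT
```

After the swap the hold space contains the logout line, which is why a second logout in a row would be paired with the previous logout rather than with a login.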
    

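    【Comments】:

    • Thanks! All of the answers people proposed worked; I picked this one because it is very compact.
    【Solution 3】:

    I can show an awk solution. It stores every line as it goes; when a line does not contain the string "logged in", it prints the previously stored line followed by the current line. This can go wrong if two "lost connection" lines follow each other. awk is also a good choice for filtering out the other lines.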

    #!/bin/bash
    
    awk '!/logged in/ {print x"\n"$0} {x = $0}' <<EOT
    2013-04-07 08:44:01 [INFO] User logged in
    2013-04-07 08:54:55 [INFO] User logged in
    2013-04-07 08:57:12 [INFO] User logged in
    2013-04-07 08:59:45 [INFO] User logged in
    2013-04-07 09:01:28 [INFO] User logged in
    2013-04-07 09:11:00 [INFO] User logged in
    2013-04-07 09:12:56 [INFO] User logged in
    2013-04-07 09:15:43 [INFO] User lost connection
    EOT
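
The caveat about two consecutive "lost connection" lines can be handled by remembering only login lines and clearing the stored line once it has been paired. The following is a hedged sketch, assuming an unmatched "lost connection" line should simply be dropped (the question does not say what should happen in that case); `paired_sessions` is a hypothetical name.

```shell
# Hypothetical helper: print each logout together with the last login
# that precedes it, ignoring logouts that have no login to pair with.
paired_sessions() {
    awk '
        # Remember the most recent login line and move on.
        /User logged in/ { login = $0; next }
        # First logout after a login: print the pair, then forget the
        # login so a second logout in a row finds nothing to pair with.
        /User lost connection/ && login != "" {
            print login
            print
            login = ""
        }
    ' "$@"
}

paired_sessions <<'EOT'
2013-04-07 09:11:00 [INFO] User logged in
2013-04-07 09:12:56 [INFO] User logged in
2013-04-07 09:15:43 [INFO] User lost connection
2013-04-07 09:16:02 [INFO] User lost connection
EOT
```

On this sample only the 09:12:56 login and the 09:15:43 logout are printed; the second logout is skipped.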
    

    【Comments】:

      【Solution 4】:

      This might work for you (GNU sed):

      sed -r '$!N;/(User logged in)\n.*\1/D' file
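
As a rough walk-through of the command above: `N` joins the next input line onto the current one, and `D` drops the first of the two when both are login lines, so only the last login before a non-login line survives. The sketch below is the same script with one command per line and comments. It assumes GNU sed, as the answer does, and `drop_unmatched_logins` is a made-up name.

```shell
# Hypothetical wrapper around the GNU sed script from this answer.
drop_unmatched_logins() {
    sed -r '
# Unless this is the last line, append the next input line to the
# pattern space, separated by a newline.
$!N
# If both lines are logins, delete up to the first newline and restart
# the cycle with what is left, without printing anything.
/(User logged in)\n.*\1/D
' "$@"
}

drop_unmatched_logins <<'EOT'
2013-04-07 09:11:00 [INFO] User logged in
2013-04-07 09:12:56 [INFO] User logged in
2013-04-07 09:15:43 [INFO] User lost connection
EOT
```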
      

      【Comments】:
