【Question title】: Awk: skip a line from a paragraph
【Posted】: 2019-02-19 16:02:41
【Question description】:

Question (solutions follow below)

Suppose the following script operates on multiple files and prints the whole surrounding paragraph whenever the pattern "TODO:" is found:

awk -v RS='' '{
    if(/TODO:/) {
        print
        print "\n"
    }
}' *.txt
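
The script relies on awk's paragraph mode: when RS is set to the empty string, every run of lines separated by blank lines becomes one record. The following is a minimal sketch of that behaviour, using the sample data from this question held in a shell variable (the variable names data, count and firsts are only illustrative):

```shell
# Sample input from the question: three paragraphs separated by blank lines.
data='Apples
Oranges
Bananas

TODO: A
TODO: B
Lorem ipsum

Ad usu oporteat
TODO: C
DONE: D
TODO: E
Ipsum lorem'

# With RS set to the empty string, each paragraph is read as a single record,
# so NR in the END block is the number of paragraphs.
count=$(printf '%s\n' "$data" | awk -v RS='' 'END { print NR }')

# With newline as the field separator, $1 is the first line of each paragraph.
firsts=$(printf '%s\n' "$data" | awk -v RS='' -F'\n' '{ print NR ": " $1 }')

echo "$count"
printf '%s\n' "$firsts"
```

Because the whole paragraph is $0, a test such as /TODO:/ matches if any line of the paragraph contains the pattern.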

Is it possible to print those paragraphs while skipping the lines within them that contain the pattern DONE:?

Given the following data:

Apples
Oranges
Bananas

TODO: A
TODO: B
Lorem ipsum

Ad usu oporteat
TODO: C
DONE: D
TODO: E
Ipsum lorem

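the output should not contain the entry DONE: D, should not contain the paragraph with the fruits (because it has no TODO: items), and should contain everything else: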

TODO: A
TODO: B
Lorem ipsum

Ad usu oporteat
TODO: C
TODO: E
Ipsum lorem

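(Of course, I could pipe the output through | grep -v 'DONE:', but I would like to learn something about awk here...)

Solutions and results:

First, a simple and clear improvement of the script I provided, thanks to @EdMorton: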

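awk -v RS='' -v ORS='' 'FNR==1{td_file=0} {
    if(/TODO:/) {
        if (!td_file) {
            print "\n\n"
            # Strip the .txt extension (dot escaped, anchored at the end)
            f=FILENAME; sub(/\.txt$/, "", f)
            print f "\n"
            td_file=1
        }
        # Use [^\n]* rather than .* here: in awk, . also matches a newline,
        # so .* could swallow the lines preceding the DONE: line
        sub(/\n[^\n]*DONE:[^\n]*\n/,"\n")
        # ORS is empty, so add the blank line between paragraphs explicitly
        print $0 "\n\n"
    }
}' *.txt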

time reports:

real    0m0.048s
user    0m0.029s
sys     0m0.018s

Second, the one by @RavinderSingh13, as I understand it and after some cleanup:

awk '
# Check whether a new file is being processed
# If so, reset the td_file marker to false
FNR==1{td_file=0}{
# Check whether this line contains the TODO: pattern and the file name has not been printed yet
    if(/TODO:/ && !td_file) {
# If so, print out FILENAME
        print "\n" FILENAME
# Set the td_file marker to true
# (marks the file as processed, so that FILENAME is not printed twice)
        td_file=1
    }
}
# Check whether this is a new file OR the current line is empty (number of fields is 0)
FNR==1 || !NF{
# If so, and if the td_entr marker is true, and there is something to print (container cont is not empty)
    if (td_entr && cont) {
# Then print it out
        print cont
    }
# And reset the variables
    cont=td_entr=""
}
# Check whether the current line contains TODO:
/TODO:/ {
# If so, set the td_entr marker to 1
    td_entr=1
}
# Also, check whether the current line does not contain DONE:
!/DONE:/ {
# If so, update the variable cont:
# If it is empty, assign the current line to it
# Otherwise, append the output record separator, ORS, and then the current line
    cont=cont?cont ORS $0:$0
    }
# After the last line, print a paragraph that is still pending
# (needed when the last file does not end with an empty line)
END {
    if (td_entr && cont) {
        print cont
    }
}
' *.txt

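According to my tests, time reports that this version takes more resources (which is not surprising, if I understand the algorithm correctly):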

real    0m0.090s
user    0m0.065s
sys     0m0.022s

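Given this comparison (and because the first solution builds directly on the small script I provided in the question), I have accepted @EdMorton's reply as the answer. Still, I am very grateful to both contributors, thank you (I really learned something today :)!

【Question discussion】:

    Tags: awk


    【Solution 1】:

    EDIT: Since the OP has added more details to the post, I am adding the following solution now.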

    awk 'prev!=FILENAME{if(found && val){print val};val=found="";prev=FILENAME}!NF{if(val && found){print val};val=found=""} /^TODO/{found=1} !/DONE:/{val=val?val ORS $0:$0} END{if(val && found){print val}}'  *.txt
    

    Explanation: a full explanation of the above code follows.

    awk '
    prev!=FILENAME{               ##True when a new input file starts: prev differs from the built-in variable FILENAME, which holds the current file name.
      if(found && val){           ##If the previous file left a collected paragraph (val) that contained a TODO line (found), then:
        print val                 ##Print the collected paragraph.
      }
      val=found=""                ##Reset val and found.
      prev=FILENAME               ##Remember the current file name in prev.
    }
    !NF{                          ##True for a line with no fields, i.e. an empty or whitespace-only line, which ends a paragraph.
      if(val && found){           ##If a paragraph was collected and it contained a TODO line, then:
        print val                 ##Print the collected paragraph.
      }
      val=found=""                ##Reset val and found for the next paragraph.
    }
    /^TODO/{                      ##True for a line that starts with TODO.
      found=1                     ##Mark the current paragraph as containing a TODO line.
    }
    !/DONE:/{                     ##True for a line that does not contain the string DONE:.
      val=(val?val ORS $0:$0)     ##Append the line to val, separated by ORS (or start val with it if val is empty).
    }
    END{                          ##After all input has been read:
      if(val && found){           ##If the last paragraph is still pending and contained a TODO line, then:
        print val                 ##Print it.
      }
    }' *.txt                      ##Process all *.txt files.
    

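    I am assuming above that you only want to print from TODO up to the Ipsum string, and that any line containing DONE: D should be skipped as well.



    A simple awk would be: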

    awk '!/DONE: D/' Input_file
    

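    Explanation: Here we check the condition that a line does not contain the string DONE: D, and print the lines for which it holds. Note that no action is given for the case where the condition is TRUE. awk works on a condition-then-action model, and when a condition has no action, the default action, printing the current line, is performed.

    【Discussion】:

      【Solution 2】: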
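
To illustrate the default action, here is a small sketch comparing the pattern-only form with the same pattern and an explicit print. The input lines are taken from the question; the variable names lines, implicit and explicit are only illustrative.

```shell
lines='TODO: C
DONE: D
TODO: E'

# Pattern only: awk applies its default action and prints each matching line.
implicit=$(printf '%s\n' "$lines" | awk '!/DONE:/')

# The same pattern with the action spelled out.
explicit=$(printf '%s\n' "$lines" | awk '!/DONE:/ { print $0 }')

# Both variables hold the two TODO lines; the DONE line is dropped.
printf '%s\n' "$implicit"
```

Both forms produce the same output, so the shorter one is usually preferred for simple filters.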
      $ awk -v RS= -v ORS='\n\n' '/TODO:/{sub(/\nDONE: D\n/,"\n"); print}' file
      TODO: A
      TODO: B
      Lorem ipsum
      
      Ad usu oporteat
      TODO: C
      TODO: E
      Ipsum lorem
      

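      【Discussion】:

      • Substitution, so clever! And something I can actually understand, thank you :) (only this pattern fits better: sub(/\n.*DONE:.[^\n]*\n/,"\n"))
      • I have added some analysis to the OP and, based on it, accepted your reply as the answer. Thanks again.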
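
A note on the sub() approach: a single sub() call removes at most one DONE: line per paragraph, and a pattern anchored with \n on both sides cannot match a DONE: line that is the first or last line of the record. Also, in awk the . metacharacter matches a newline too, so .* can span several lines of a paragraph. The following is one possible sketch (not the accepted answer's method) that sidesteps these cases by splitting each paragraph into lines with -F'\n' and rebuilding it without the DONE: lines. The last paragraph of the sample data (F, G, H) is a hypothetical addition to exercise those edge cases.

```shell
# Sample data from the question, plus one hypothetical paragraph whose
# first and last lines are both DONE: entries.
data='Apples
Oranges
Bananas

TODO: A
TODO: B
Lorem ipsum

Ad usu oporteat
TODO: C
DONE: D
TODO: E
Ipsum lorem

DONE: F
TODO: G
DONE: H'

out=$(printf '%s\n' "$data" | awk -v RS='' -F'\n' '
/TODO:/ {
    # Rebuild the paragraph line by line, leaving out every line with DONE:
    para = ""
    for (i = 1; i <= NF; i++)
        if ($i !~ /DONE:/)
            para = para $i "\n"
    # para already ends in a newline; the default ORS adds the blank line
    print para
}')

printf '%s\n' "$out"
```

The trade-off is a loop over the lines of every matching paragraph instead of one regex substitution, which is likely somewhat slower but does not depend on where the DONE: lines sit.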