[Posted]: 2019-02-19 16:02:41
[Question]:
Question (solution below)
Suppose the following script operates on multiple files and prints the whole surrounding paragraph whenever it finds the pattern "TODO:":
awk -v RS='' '{
if(/TODO:/) {
print
print "\n"
}
}' *.txt
Is it possible to print those paragraphs while skipping the lines inside them that contain the pattern DONE:?
Given the following data:
Apples
Oranges
Bananas

TODO: A
TODO: B
Lorem ipsum
Ad usu oporteat
TODO: C
DONE: D
TODO: E
Ipsum lorem
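then the output should not contain the entry DONE: D, should not contain the paragraph with the fruits (because there is no TODO: item in it), and should contain everything else: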
TODO: A
TODO: B
Lorem ipsum
Ad usu oporteat
TODO: C
TODO: E
Ipsum lorem
(Of course, I could pipe the output through | grep -v 'DONE:', but I would like to learn something about awk here...)
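For readers new to paragraph mode, here is a minimal sketch of what RS='' does on its own. The file name sample.txt and its contents are made up for this demo and are not part of the question's data.

```shell
# Build a small throwaway sample; the name and contents are assumptions for this demo.
tmpdir=$(mktemp -d)
printf 'Apples\nOranges\n\nTODO: A\nLorem ipsum\n\n\nTODO: C\nDONE: D\n' > "$tmpdir/sample.txt"

# RS='' selects paragraph mode: each run of non-empty lines becomes one record,
# and any number of blank lines counts as a single separator.
# NR is then the paragraph number; the record keeps its inner newlines.
paragraphs=$(awk -v RS='' '{ print "--- paragraph " NR " ---"; print }' "$tmpdir/sample.txt")
printf '%s\n' "$paragraphs"

rm -rf "$tmpdir"
```

The sample is reported as three paragraphs, even though two blank lines separate the second from the third. A pattern such as /TODO:/ is therefore tested against a whole paragraph rather than a single line.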
Solutions and results:
First, from @EdMorton, a simple and clear improvement on the script I provided:
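awk -v RS='' -v ORS='' 'FNR==1{td_file=0} {
    if(/TODO:/) {
        if (!td_file) {
            # First TODO paragraph of this file: print the file name without its .txt suffix
            print "\n\n"
            f=FILENAME; sub(/\.txt$/, "", f)
            print f "\n"
            td_file=1
        }
        # Remove every line containing DONE:, wherever it sits in the paragraph.
        # [^\n]* keeps each match on one line (a plain .* would also swallow newlines).
        gsub(/(^|\n)[^\n]*DONE:[^\n]*/, "")
        sub(/^\n/, "")
        # ORS is empty, so the newline ending the paragraph is added explicitly
        print $0 "\n"
    }
}' *.txt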
time reports:
real 0m0.048s
user 0m0.029s
sys 0m0.018s
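As an alternative to a regex substitution on the whole paragraph, the record can also be split into lines by setting FS to a newline, so that each line becomes a field that can be tested on its own. The following is only a sketch under that assumption, not the code from either answer; the file name notes.txt and the sample data (including DONE: entries on the first and last line of a paragraph) are made up for the demo.

```shell
# Throwaway sample data; the name and contents are assumptions for this demo.
tmpdir=$(mktemp -d)
printf 'Apples\nOranges\nBananas\n\nDONE: Z\nTODO: A\nTODO: B\nLorem ipsum\n\nAd usu oporteat\nTODO: C\nDONE: D\nTODO: E\nDONE: F\n' > "$tmpdir/notes.txt"

# RS='' reads one paragraph per record; FS='\n' makes every line a separate field.
filtered=$(awk -v RS='' -v FS='\n' '
/TODO:/ {
    out = ""
    # Keep only the lines that do not contain DONE:
    for (i = 1; i <= NF; i++)
        if ($i !~ /DONE:/)
            out = out $i "\n"
    # One blank line after each printed paragraph
    printf "%s\n", out
}' "$tmpdir/notes.txt")
printf '%s\n' "$filtered"

rm -rf "$tmpdir"
```

Because every line is checked separately, it does not matter whether a DONE: entry is the first, a middle, or the last line of its paragraph, or how many of them there are.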
The second, by @RavinderSingh13, as I understand it and after some cleanup:
awk '
# At the start of a new file, or on an empty line (NF is 0), the previous paragraph is complete
FNR==1 || !NF {
    # If that paragraph contained TODO: (td_entr is set) and the buffer cont is not empty
    if (td_entr && cont) {
        # then print it out
        print cont
    }
    # And reset the variables
    cont=td_entr=""
}
# If this is the first line of a new file, reset the td_file marker to False
FNR==1 { td_file=0 }
# Check whether the current line contains TODO: and FILENAME has not been printed for this file yet
/TODO:/ && !td_file {
    # If so, print out FILENAME
    print "\n" FILENAME
    # Set the td_file marker to True, so that FILENAME is not printed twice
    td_file=1
}
# Check whether the current line contains TODO:
/TODO:/ {
    # If so, mark the current paragraph as one to print
    td_entr=1
}
# Also, check whether the current line does not contain DONE:
!/DONE:/ {
    # If cont is empty, start it with the current line;
    # otherwise append ORS (the output record separator) and then the current line
    cont=cont?cont ORS $0:$0
}
# Flush the last paragraph of the last file, which no later line would trigger
END {
    if (td_entr && cont) {
        print cont
    }
}
' *.txt
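According to my tests, time reports that this version needs more resources (which is not surprising, if I understand the algorithm correctly):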
real 0m0.090s
user 0m0.065s
sys 0m0.022s
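Given this comparison (and because the first solution builds directly on the small script I provided in the question), I have accepted @EdMorton's reply as the answer. Nevertheless, I am very grateful to both participants, thank you (I really did learn something today :)!
[Comments]:
Tags: awk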