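Without a parser for the language you can't do this 100% robustly. For example, the script below will fail in some cases if you have \" inside a double-quoted string. That case is easy to handle, but it is just one of many possible failures not covered by your use cases. Still, this will handle what you've shown us so far and a little more. It uses GNU awk for gensub() and for the third argument to match().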
Sample input:
$ cat file
call obj.method()
obj.method( )
obj.method( arg1, arg2, kwarg1=kwarg1 )
obj1.var = obj2.var2
a = 1.0
b = 1.d0
if (a.or.b) then
if ( a .and. .not.(obj.l1(1.d0)) ) then
!>I am a commented line.
! > I am.a commented line with..leading blanks and extra periods.1.
b=a1.var( 0.d0 ) !! I contain a commented version of this line: b=a1.var( 0.d0 )
c = "I am a string"
c= 'I am an obnoxious string: b=a1.var( 0.d0 ) ... '
c="I am an exclaimed string!"; b=a1.var()
Expected output:
$ cat out
call obj%method()
obj%method( )
obj%method( arg1, arg2, kwarg1=kwarg1 )
obj1%var = obj2%var2
a = 1.0
b = 1.d0
if (a.or.b) then
if ( a .and. .not.(obj%l1(1.d0)) ) then
!>I am a commented line.
! > I am.a commented line with..leading blanks and extra periods.1.
b=a1%var( 0.d0 ) !! I contain a commented version of this line: b=a1.var( 0.d0 )
c = "I am a string"
c= 'I am an obnoxious string: b=a1.var( 0.d0 ) ... '
c="I am an exclaimed string!"; b=a1%var()
The script:
$ cat tst.awk
{
    # Give us the ability to use @<any other char> strings as
    # replacement/placeholder strings that cannot exist in the input.
    gsub(/@/,"@=")

    # Ignore all !s inside double-quoted strings
    while ( match($0,/("[^"]*)!([^"]*")/,a) ) {
        $0 = substr($0,1,RSTART-1) a[1] "@-" a[2] substr($0,RSTART+RLENGTH)
    }

    # Ignore all !s inside single-quoted strings
    while ( match($0,/('[^']*)!([^']*')/,a) ) {
        $0 = substr($0,1,RSTART-1) a[1] "@-" a[2] substr($0,RSTART+RLENGTH)
    }

    # Now we can separate comments from what comes before them
    comment = gensub(/[^!]*/,"",1)
    $0 = gensub(/!.*/,"",1)

    # Ignore all .s inside double-quoted strings
    while ( match($0,/("[^"]*)\.([^"]*")/,a) ) {
        $0 = substr($0,1,RSTART-1) a[1] "@#" a[2] substr($0,RSTART+RLENGTH)
    }

    # Ignore all .s inside single-quoted strings
    while ( match($0,/('[^']*)\.([^']*')/,a) ) {
        $0 = substr($0,1,RSTART-1) a[1] "@#" a[2] substr($0,RSTART+RLENGTH)
    }

    # Convert all logical operators like a.or.b to a@#or@#b so the .s won't get replaced later
    while ( match($0,/\.([[:alpha:]]+)\./,a) ) {
        $0 = substr($0,1,RSTART-1) "@#" a[1] "@#" substr($0,RSTART+RLENGTH)
    }

    # Convert all obj.var and similar to obj%var, etc.
    while ( match($0,/\<([[:alpha:]]+[[:alnum:]_]*)[.]([[:alpha:]]+[[:alnum:]_]*)\>/,a) ) {
        $0 = substr($0,1,RSTART-1) a[1] "%" a[2] substr($0,RSTART+RLENGTH)
    }

    # Convert all @#s in the pre-comment text back to .s
    gsub(/@#/,".")

    # Add the comment back
    $0 = $0 comment

    # Convert all @-s back to !s
    gsub(/@-/,"!")

    # Convert all @=s back to @s
    gsub(/@=/,"@")

    print
}
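The placeholder trick that opens and closes the script can be tried in isolation. The following is only a minimal sketch, not part of the script above: `protect_bangs` is a hypothetical shell helper invented for illustration, and unlike the script it hides every `!` on the line rather than only those inside quoted strings. It uses only POSIX awk features, so it should not need GNU awk.

```shell
# Hypothetical helper: round-trip one line through the "@=" escaping trick.
protect_bangs() {
    printf '%s\n' "$1" | awk '{
        gsub(/@/, "@=")     # escape literal @ so "@-" cannot occur in the data
        gsub(/!/, "@-")     # hide every ! behind the placeholder @-
        hidden = $0         # at this point the line contains no ! at all
        gsub(/@-/, "!")     # restore the !s
        gsub(/@=/, "@")     # restore the literal @s last
        print hidden " -> " $0
    }'
}

protect_bangs 'mail@-host says hi!'
# prints: mail@=-host says hi@- -> mail@-host says hi!
```

Because every literal `@` is rewritten to `@=` first, the sequence `@-` can only appear where the code put it, even when the input already contains the characters `@-`. Undoing the substitutions in reverse order then recovers the original line exactly.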
Running the script and its output:
$ awk -f tst.awk file
call obj%method()
obj%method( )
obj%method( arg1, arg2, kwarg1=kwarg1 )
obj1%var = obj2%var2
a = 1.0
b = 1.d0
if (a.or.b) then
if ( a .and. .not.(obj%l1(1.d0)) ) then
!>I am a commented line.
! > I am.a commented line with..leading blanks and extra periods.1.
b=a1%var( 0.d0 ) !! I contain a commented version of this line: b=a1.var( 0.d0 )
c = "I am a string"
c= 'I am an obnoxious string: b=a1.var( 0.d0 ) ... '
c="I am an exclaimed string!"; b=a1%var()