【Question title】:Parsing Challenge - Fixing Broken Syntax
【Posted】:2018-12-20 22:43:05
【Question】:

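I have thousands of lines of code that use a particular non-standard syntax. I need to compile this code with a different compiler that does not support that syntax. I tried to automate the required changes, but I am not very good with regular expressions and the like, and I failed.

Here is what I am trying to achieve. Currently, my code calls methods and accesses variables of objects using the following possible syntaxes: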

call obj.method()
obj.method( )
obj.method( arg1, arg2, kwarg1=kwarg1 )
obj1.var = obj2.var2

Instead, I want it to be:

call obj%method()
obj%method( )
obj%method( arg1, arg2, kwarg1=kwarg1 )
obj1%var = obj2%var2

I want to make these changes without affecting the following other possible occurrences of ".":

Decimal numbers:

a = 1.0
b = 1.d0

Logical operators (note the possible whitespace and method calls):

if (a.or.b) then
    if ( a .and. .not.(obj.l1(1.d0)) ) then

Anything that is commented out (the exclamation mark "!" is used for this purpose):

!>I am a commented line.
   ! > I am.a commented line with..leading blanks and extra periods.1.
b=a1.var( 0.d0 ) !! I contain a commented version of this line: b=a1.var( 0.d0 )  

Anything inside quotes (i.e. string literals):

c = "I am a string"
c= 'I am an obnoxious string: b=a1.var( 0.d0 ) ... '

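Does anyone know how to approach this? I suppose regular expressions are the natural tool, but I am open to anything. (In case anyone cares: the code is written in Fortran. ifort is happy with the "." syntax; gfortran is not.)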

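【Comments】:

  • Put all the sample inputs into a single file so that we only need one input file for testing, and provide the expected output for that file. Add a more complicated example that must be handled; for instance, c="I am a string!"; b=a1.var() is tricky because the ! there does not start a comment.
  • Isolating the method calls is fairly easy. The hard part is matching things like obj1.var = obj2.var2 without also matching b = 1.d0. I am not sure you can write a pattern compact enough to change what you want without changing more than you want.
  • Maybe you could try doing it in two steps.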
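
To illustrate the second comment, here is a minimal Python sketch (not taken from the thread; the name member_dots_only is made up for illustration). Requiring a letter or underscore at the start of the name on the left of the dot is enough to tell obj1.var apart from 1.d0. The same pattern also rewrites logical operators, though, and it knows nothing about strings or comments, which is why a single compact pattern is not enough.

```python
import re

# The left-hand name must start with a letter or underscore. The lookahead
# leaves the right-hand name unconsumed so chains like a.b.c are handled.
MEMBER_DOT = re.compile(r"\b([A-Za-z_]\w*)\.(?=[A-Za-z_])")

def member_dots_only(line):
    """Replace identifier.identifier dots with %, ignoring all context."""
    return MEMBER_DOT.sub(r"\1%", line)

print(member_dots_only("obj1.var = obj2.var2"))  # obj1%var = obj2%var2
print(member_dots_only("b = 1.d0"))              # b = 1.d0
print(member_dots_only("if (a.or.b) then"))      # if (a%or%b) then  (unwanted)
```

The last line shows the false positive that the answers avoid by recognising comments, strings and operators before touching any dot.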

Tags: regex parsing sed


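【Answer 1】:

Have you considered using flex for this? It is built on regular expressions but is more capable: at each position it tries every pattern and takes the longest match. The rules could look like this: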

%%                                           /* rule part of the program */
!.*                       ECHO;              /* leave comments untouched */
\"[^\"\n]*\"|'[^'\n]*'    ECHO;              /* leave string literals untouched */
[^A-Za-z_][0-9]+\.        ECHO;              /* leave numbers untouched */
".and."|".or."|".not."    ECHO;              /* leave logical operators untouched */
\.                        printf("%%");      /* now, replace the . by % */
[^\.]                     ECHO;              /* copy everything else; ECHO is safe when the text contains % */

%%                                           /* invoke the program */
int main() {
    yylex();
    return 0;
}

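You may need to adjust the third rule. As written, it protects any . that follows a run of digits, provided the character before those digits is not a letter (A-Z, a-z) or an underscore. If your identifiers may contain other legal characters, add them to that class. Note that the character before the digits may itself be a digit, so an identifier ending in two or more digits (e.g. obj12.var) is also treated as a number and left unchanged.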
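
To see what the third rule does and does not protect, the pattern can be tried on its own. The following is a small Python sketch that only mimics that one pattern with re.search; it is not how flex scans, and the helper name protected_span is hypothetical.

```python
import re

# Python spelling of the flex pattern [^A-Za-z_][0-9]+\.
NUMBER_PREFIX = re.compile(r"[^A-Za-z_][0-9]+\.")

def protected_span(text):
    """Return the first substring the pattern would protect, or None."""
    m = NUMBER_PREFIX.search(text)
    return m.group(0) if m else None

print(repr(protected_span("a = 1.0")))    # ' 1.'  -> the dot is left alone
print(repr(protected_span("f(10.d0)")))   # '(10.' -> the dot is left alone
print(repr(protected_span("b=a1.var")))   # None   -> the dot will become %
print(repr(protected_span("obj12.var")))  # '12.'  -> wrongly left alone
```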

If everything is correct, you can turn this into a program. Copy it into a file named lex.l and run:

$ flex -o lex.yy.c lex.l
$ gcc -o lex.out lex.yy.c -lfl

You then have the compiled program lex.out, which you can use on the command line:

cat unreplaced.txt | ./lex.out > replaced.txt

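This uses the same principle as Ed Morton's suggestion, but it relies on flex, so we can skip the manual bookkeeping. It will still fail in some cases, for example when a string contains \".

Sample input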

call obj.method()
obj.method( )
obj.method( arg1, arg2, kwarg1=kwarg1 )
obj1.var = obj2.var2
a = 1.0
b = 1.d0
if (a.or.b) then
    if ( a .and. .not.(obj.l1(1.d0)) ) then
!>I am a commented line.
   ! > I am.a commented line with..leading blanks and extra periods.1.
b=a1.var( 0.d0 ) !! I contain a commented version of this line: b=a1.var( 0.d0 )
c = "I am a string"
c= 'I am an obnoxious string: b=a1.var( 0.d0 ) ... '
c="I am an exclaimed string!"; b=a1.var()

Output

call obj%method()
obj%method( )
obj%method( arg1, arg2, kwarg1=kwarg1 )
obj1%var = obj2%var2
a = 1.0
b = 1.d0
if (a.or.b) then
    if ( a .and. .not.(obj%l1(1.d0)) ) then
!>I am a commented line.
   ! > I am.a commented line with..leading blanks and extra periods.1.
b=a1%var( 0.d0 ) !! I contain a commented version of this line: b=a1.var( 0.d0 )
c = "I am a string"
c= 'I am an obnoxious string: b=a1.var( 0.d0 ) ... '
c="I am an exclaimed string!"; b=a1%var()
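
The same scanning idea can be sketched in Python with a single alternation and a replacement callback. This is an illustrative sketch, not the answer's code: the name dots_to_percent and the list of dotted operators in OPS are assumptions. Unlike flex, Python's re takes the first alternative that matches rather than the longest, so the order of the alternatives matters. Identifiers are consumed as whole tokens, so a name ending in digits is never mistaken for a number.

```python
import re

# Assumed set of dotted operators and literals to leave untouched.
OPS = r"(?:and|or|not|eqv|neqv|eq|ne|lt|le|gt|ge|true|false)"

TOKEN = re.compile(
    rf"""
      (?P<comment>![^\n]*)                     # comment up to end of line
    | (?P<string>"[^"\n]*"|'[^'\n]*')          # quoted string literal
    | (?P<logical>\.{OPS}\.)                   # .and. .or. .not. and friends
    | (?P<number>(?:\d+\.(?!{OPS}\.)\d*|\.\d+) # 1.  1.0  .5  but not the 1 in 1.and.
                 (?:[de][+-]?\d+)?)            # optional exponent such as d0
    | (?P<ident>[a-z_]\w*)                     # whole identifier
    | (?P<dot>\.)                              # any dot left over is member access
    """,
    re.VERBOSE | re.IGNORECASE,
)

def dots_to_percent(source):
    """Copy every token unchanged except a bare dot, which becomes %."""
    return TOKEN.sub(
        lambda m: "%" if m.lastgroup == "dot" else m.group(0), source
    )

print(dots_to_percent("    if ( a .and. .not.(obj.l1(1.d0)) ) then"))
print(dots_to_percent('c="I am an exclaimed string!"; b=a1.var()'))
```

Text that matches no alternative (spaces, parentheses, plain integers) is simply left in place by re.sub. Doubled or escaped quotes inside strings are not handled specially, so this has the same kind of limits as the flex rules.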

【Comments】:

  • @EdMorton Thanks for the idea. I just added it and it does give the same output.
【Answer 2】:

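You cannot do this 100% robustly without a language parser (for example, the following fails in some cases if a double-quoted string contains \"; that is easy to handle, but it is just one of many possible failures not covered by your use cases). Still, this will handle what you have shown us so far and more. It uses GNU awk for gensub() and the third argument to match().

Sample input: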

$ cat file
call obj.method()
obj.method( )
obj.method( arg1, arg2, kwarg1=kwarg1 )
obj1.var = obj2.var2
a = 1.0
b = 1.d0
if (a.or.b) then
    if ( a .and. .not.(obj.l1(1.d0)) ) then
!>I am a commented line.
   ! > I am.a commented line with..leading blanks and extra periods.1.
b=a1.var( 0.d0 ) !! I contain a commented version of this line: b=a1.var( 0.d0 )
c = "I am a string"
c= 'I am an obnoxious string: b=a1.var( 0.d0 ) ... '
c="I am an exclaimed string!"; b=a1.var()

Expected output:

$ cat out
call obj%method()
obj%method( )
obj%method( arg1, arg2, kwarg1=kwarg1 )
obj1%var = obj2%var2
a = 1.0
b = 1.d0
if (a.or.b) then
    if ( a .and. .not.(obj%l1(1.d0)) ) then
!>I am a commented line.
   ! > I am.a commented line with..leading blanks and extra periods.1.
b=a1%var( 0.d0 ) !! I contain a commented version of this line: b=a1.var( 0.d0 )
c = "I am a string"
c= 'I am an obnoxious string: b=a1.var( 0.d0 ) ... '
c="I am an exclaimed string!"; b=a1%var()

Script:

$ cat tst.awk
{
    # Escape every @ so that @<any other char> sequences can serve as
    # replacement/placeholder strings that cannot exist in the input.
    gsub(/@/,"@=")

    # ignore all !s inside double-quoted strings
    while ( match($0,/("[^"]*)!([^"]*")/,a) ) {
        $0 = substr($0,1,RSTART-1) a[1] "@-" a[2] substr($0,RSTART+RLENGTH)
    }

    # ignore all !s inside single-quoted strings
    while ( match($0,/('[^']*)!([^']*')/,a) ) {
        $0 = substr($0,1,RSTART-1) a[1] "@-" a[2] substr($0,RSTART+RLENGTH)
    }

    # Now we can separate comments from what comes before them
    comment = gensub(/[^!]*/,"",1)
    $0      = gensub(/!.*/,"",1)

    # ignore all .s inside double-quoted strings
    while ( match($0,/("[^"]*)\.([^"]*")/,a) ) {
        $0 = substr($0,1,RSTART-1) a[1] "@#" a[2] substr($0,RSTART+RLENGTH)
    }

    # ignore all .s inside single-quoted strings
    while ( match($0,/('[^']*)\.([^']*')/,a) ) {
        $0 = substr($0,1,RSTART-1) a[1] "@#" a[2] substr($0,RSTART+RLENGTH)
    }

    # convert all logical operators like a.or.b to a@#or@#b so the .s won't get replaced later
    while ( match($0,/\.([[:alpha:]]+)\./,a) ) {
        $0 = substr($0,1,RSTART-1) "@#" a[1] "@#" substr($0,RSTART+RLENGTH)
    }

    # convert all obj.var and similar to obj%var, etc.
    while ( match($0,/\<([[:alpha:]]+[[:alnum:]_]*)[.]([[:alpha:]]+[[:alnum:]_]*)\>/,a) ) {
        $0 = substr($0,1,RSTART-1) a[1] "%" a[2] substr($0,RSTART+RLENGTH)
    }

    # Convert all @#s in the precomment text back to .s
    gsub(/@#/,".")

    # Add the comment back
    $0 = $0 comment

    # Convert all @-s back to !s
    gsub(/@-/,"!")

    # Convert all @=s back to @s
    gsub(/@=/,"@")

    print
}

Running the script and its output:

$ awk -f tst.awk file
call obj%method()
obj%method( )
obj%method( arg1, arg2, kwarg1=kwarg1 )
obj1%var = obj2%var2
a = 1.0
b = 1.d0
if (a.or.b) then
    if ( a .and. .not.(obj%l1(1.d0)) ) then
!>I am a commented line.
   ! > I am.a commented line with..leading blanks and extra periods.1.
b=a1%var( 0.d0 ) !! I contain a commented version of this line: b=a1.var( 0.d0 )
c = "I am a string"
c= 'I am an obnoxious string: b=a1.var( 0.d0 ) ... '
c="I am an exclaimed string!"; b=a1%var()
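
The placeholder trick at the heart of the script can be shown in isolation. The following is a minimal Python sketch, not a port of the awk script: the helper names are invented, it only protects dots inside quoted strings, and it deliberately ignores comments and logical operators.

```python
import re

def protect_string_dots(line):
    """Hide dots inside quoted strings behind the placeholder @#."""
    # Escape every literal @ first, so that "@#" cannot occur in the input.
    line = line.replace("@", "@=")
    def hide(match):
        return match.group(0).replace(".", "@#")
    return re.sub(r'"[^"]*"|\'[^\']*\'', hide, line)

def restore(line):
    """Undo the masking: placeholders first, then the @ escape."""
    return line.replace("@#", ".").replace("@=", "@")

def convert(line):
    masked = protect_string_dots(line)
    # With string contents hidden, a simple identifier.identifier rule is safe
    # for this example.
    masked = re.sub(r"\b([A-Za-z_]\w*)\.(?=[A-Za-z_])", r"\1%", masked)
    return restore(masked)

print(protect_string_dots('c = "mail a@b.c" ; x = obj.v'))
# c = "mail a@=b@#c" ; x = obj.v
print(convert('c = "mail a@b.c" ; x = obj.v'))
# c = "mail a@b.c" ; x = obj%v
```

Escaping @ before introducing any placeholder is what guarantees that the restore step cannot confuse original text with a marker, even when the input already contains @ characters.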
