Extract substring of Fortran string array: answers

【Question title】: Extract substring of Fortran string array
【Posted】: 2018-10-08 17:40:29
【Question】:

How can I extract substrings from a Fortran string array? For example:

program testcharindex
    implicit none
    character(len=10), dimension(5) :: s
    character(len=10), allocatable :: comp(:)
    integer, allocatable  :: i(:), n(:)
    s = (/ '1_E ', '2_S ', '3_E ', '14_E', '25_S' /)
    i = index(s,'_')
    print *,' i = ', i
    n = s(1:i-1) ! or n = s(:i-1)
    comp = s(i+1:)
    print *,' n = ', n
    print *,' comp = ', comp
end program

Compiling with gfortran produces the error:

testcharindex.f90:11:10:

     n = s(1:i-1)
          1
Error: Array index at (1) must be scalar

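Is there any way to avoid a do loop here? If index can return the positions for a whole array of strings, I would hope it is also possible to extract dynamically defined substrings of an array of strings (without looping over the array elements). Am I being too optimistic?

【Comments】:

Tags: arrays fortran substring

【Solution 1】:

If you want to avoid a loop and there is no other (simple) method, it may be useful to define an elemental substring function and apply it to the array of strings. For example,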

module str_mod
    implicit none
contains
    elemental function substr( s, a, b ) result( res )
        character(*), intent(in) :: s
        integer,      intent(in) :: a, b
        character(len(s)) :: res

        res = s( a : b )
    endfunction
endmodule

program main
    use str_mod
    implicit none
    character(10) :: s( 5 )
    integer, allocatable :: ind(:)
    character(len(s)), allocatable :: comp(:)

    s = [ '1_E ', '2_S ', '3_E ', '14_E', '25_S' ]
    ! s = [ character(len(s)) :: '1_E', '2_S', '3_E', '14_E', '25_S' ]

    print *, "test(scalar) : ", substr( s(1), 1, 2 )
    print *, "test(array ) : ", substr( s,    1, 2 )

    ind = index( s, '_' )
    comp = substr( s, 1, ind-1 )

    print *
    print *, "string (all)    : ", s
    print *, "string before _ : ", comp
    print *, "string after _  : ", substr( s, ind+1, len(s) )
endprogram

which gives (with gfortran-7.3)

 test(scalar) : 1_        
 test(array ) : 1_        2_        3_        14        25        

 string (all)    : 1_E       2_S       3_E       14_E      25_S      
 string before _ : 1         2         3         14        25        
 string after _  : E         S         E         E         S

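【Comments】:

This works precisely because the function substr returns a character value whose length is uniform and independent of the individual element being processed.
Yes, I think the constraint that all elements of a string array have the same length helps here. (By the way, my other concern is that this may fail with not-so-recent compilers... so I guess using a simple loop is more "robust" :-)

【Solution 2】:

There are a couple of issues here. One of them is easily addressed (and has come up in other questions: you can find those for more detail).

The line¹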

n = s(1:i-1)

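that the compiler complains about is an attempt to reference a section of the array s, not an array of substrings of the elements of s. To access substrings of the array elements you would need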

n = s(:)(1:i-1)

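This, however, leads to your second problem. Just as the compiler complained for the array section, i has to be a scalar here too when it is used as a substring bound. So the line above still won't work.

Essentially, if you wish to access substrings of an array, every substring has to have exactly the same structure. That is, in s(:)(i:j) both i and j must be scalar integer expressions. This is motivated by the requirement that every element of the resulting array has the same length.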
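As a small illustration of the scalar-bounds rule, here is a minimal sketch (the program name scalar_bounds and the variable head are made up for this example). With constant bounds 1:2, every element yields a substring of length 2, so the result is a well-formed character array:

```fortran
program scalar_bounds
    implicit none
    character(len=4) :: s(5)
    character(len=2) :: head(5)

    s = [ '1_E ', '2_S ', '3_E ', '14_E', '25_S' ]

    ! s(1:2) would be an array section (elements 1 and 2);
    ! s(:)(1:2) is the substring 1:2 of every element.
    head = s(:)(1:2)

    print '(5(a,1x))', head
end program scalar_bounds
```

Replacing the upper bound 2 with an integer array is what the compiler rejects.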

You will need, then, to use a loop.
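A minimal sketch of such a loop, assuming fixed-length result arrays named before and after (names chosen for this example, not taken from the question). The call to index stays outside the loop because index is elemental; only the substring references, whose bounds differ per element, are done one element at a time:

```fortran
program substring_loop
    implicit none
    character(len=10) :: s(5), before(5), after(5)
    integer :: i(5), k

    s = [ character(len=10) :: '1_E', '2_S', '3_E', '14_E', '25_S' ]
    i = index(s, '_')                  ! elemental: one position per element

    do k = 1, size(s)
        before(k) = s(k)(1:i(k)-1)     ! text before the underscore
        after(k)  = s(k)(i(k)+1:)      ! text after the underscore
    end do

    print '(5(a,1x))', (trim(before(k)), k = 1, size(s))
    print '(5(a,1x))', (trim(after(k)),  k = 1, size(s))
end program substring_loop
```

If an element contains no underscore, index returns 0, so before(k) ends up blank and after(k) holds the whole string; whether that is acceptable depends on the data.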

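¹ As High Performance Mark once commented, there is also a problem with the assignment itself. I considered only the expression on the right-hand side. Even when corrected to a valid array substring, that expression is still a character array, which cannot be assigned to the integer n as desired.

If you want a literal answer about selecting substrings, read the above. If you simply care about "converting part of a character array to an integer array", then another answer covers that quite well.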

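【Comments】:

【Solution 3】:

@francescalus has already explained the error; here is my contribution to the problem the OP really seems to be tackling, namely how to read integers from an array of strings such as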

s = (/ '1_E ', '2_S ', '3_E ', '14_E', '25_S' /)

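The OP wants to do this without a loop, and @roygvib points us towards using an elemental function. Here is my version of such a function to read an integer from a string. It ignores any leading blanks, so it should cope with strings such as ' 12_e'. It then stops scanning at the first non-digit character (so it reads 12 from a string such as 12_3).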

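ELEMENTAL INTEGER FUNCTION read_int(str)
  CHARACTER(*), INTENT(in) :: str
  CHARACTER(:), ALLOCATABLE :: instr
  INTEGER :: pos

  ! drop leading blanks, then keep only the leading run of digits
  instr = adjustl(str)
  pos = VERIFY(instr,'0123456789')
  ! VERIFY returns 0 when every character is a digit; keep the whole string then
  IF(pos > 0) instr = instr(1:pos-1)
  ! if the string doesn't have a leading digit instr will be empty, return a guard value
  IF(instr=='') instr = '-999'
  READ(instr,*) read_int
END FUNCTION read_int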

I trust this is clear enough. The OP can then write

n = read_int(s)
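For completeness, a sketch of how this could be wired into a full program. An elemental function needs an explicit interface, so it is placed in a module here; the module name read_int_mod and the program name demo are made up for this example. The function body follows the answer, with a guard for the case where VERIFY returns 0 (a string made up entirely of digits):

```fortran
module read_int_mod                 ! hypothetical module name
    implicit none
contains
    elemental integer function read_int(str)
        character(*), intent(in) :: str
        character(:), allocatable :: instr
        integer :: pos

        instr = adjustl(str)                    ! skip leading blanks
        pos = verify(instr, '0123456789')       ! first non-digit, 0 if none
        if (pos > 0) instr = instr(1:pos-1)     ! keep the leading digits only
        if (instr == '') instr = '-999'         ! guard value: no leading digit
        read(instr, *) read_int
    end function read_int
end module read_int_mod

program demo
    use read_int_mod
    implicit none
    character(len=4) :: s(5)
    integer, allocatable :: n(:)

    s = (/ '1_E ', '2_S ', '3_E ', '14_E', '25_S' /)
    n = read_int(s)                 ! applied element by element, no explicit loop
    print *, n
end program demo
```

The assignment to n relies on automatic allocation on assignment (Fortran 2003), which older compilers may not enable by default.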

【Comments】: