【Question title】: Python loop through string in nested for loops
【Posted】: 2014-01-29 23:10:44
【Question】:

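I'm trying to do some very simple text processing, or rather reduction: I want to replace every run of spaces with a single space, except for the spaces inside double quotes (" "). I also have semantic actions that depend on each character read, which is why I don't want to use a regex. It's a kind of pseudo-FSM model.

So here's the deal: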

s = '''that's my     string, "   keep these spaces     "    but reduce these '''

Desired output:

that's my string, "   keep these spaces     " but reduce these

What I want to do is something like this (I've left out the '"' case to keep the example simple):

out = ""
for i in range(len(s)):

  if s[i].isspace():
    out += ' '
    while s[i].isspace():
      i += 1

  else:
    out += s[i]

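I don't quite understand how to create or share the scope in this case, i.e. how to make the inner while loop advance the index that the outer for loop uses.

Thanks for any advice.

【Comments on the question】:

  • What are the variables line and lineCpy?
  • The problem is that after you skip all the blanks in the while loop, the for loop reassigns i to the next value from range(), regardless of how far the while loop advanced it... so you don't remove the blanks, you just iterate over them again...
  • Ah, sorry, I missed those; they are both the string s. I guess I'm blind.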
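
The point made in the comments can be illustrated with a minimal sketch that drives the index by hand in a while loop instead of range(). The function name collapse_spaces is hypothetical, and quote handling is left out, as in the question. The inner loop also checks the upper bound, because the version in the question would raise IndexError on a string that ends in whitespace.

```python
def collapse_spaces(s):
    """Replace every run of whitespace in s with a single space."""
    out = []
    i = 0
    while i < len(s):
        if s[i].isspace():
            out.append(' ')
            # Advance the shared index past the whole whitespace run.
            while i < len(s) and s[i].isspace():
                i += 1
        else:
            out.append(s[i])
            i += 1
    return ''.join(out)


s = '''that's my     string,    but reduce these '''
print(collapse_spaces(s))
```

Because i is only ever changed by this code, the position reached by the inner loop is the position the outer loop continues from.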

Tags: python string loops python-3.x nested-loops


【Solution 1】:

I'm a little worried about how readable this solution is. I modified the OP's string so that it contains multiple pairs of double quotes.

s = '''that's my     string,   "   keep these spaces     "" as    well    as these    "    reduce these"   keep these spaces too   "   but not these  '''
s_split = s.split('"')

# The substrings in odd positions of list s_split should retain their spaces.
# These elements have however lost their double quotes during .split('"'),
# so add them back for the new string. For the substrings in even positions,
# remove the multiple spaces in between by splitting them again using .split()
# and joining them with a single space. However, this will not conserve
# leading and trailing spaces. In order to conserve them, add a dummy
# character (in this case '-') at the start and end of the substring before
# the split. Remove the dummy characters after the join.
#
# Finally join the elements in new_string_list to create the desired string.

new_string_list = ['"' + x + '"' if i%2 == 1
                   else ' '.join(('-' + x + '-').split())[1:-1]                   
                   for i,x in enumerate(s_split)]
new_string = ''.join(new_string_list)
print(new_string)

The output is

>>>that's my string, "   keep these spaces     "" as    well    as these    " reduce these"   keep these spaces too   " but not these 

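【Discussion】:

    【Solution 2】:

    I also have some semantic action dependent on each character read ... It's some kind of pseudo FSM model.

    You can actually implement an FSM: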

    s = '''that's my     string, "   keep these spaces     "    but reduce these '''
    
    
    normal, quoted, eating = 0,1,2
    state = eating
    result = ''
    for ch in s:
      if (state, ch) == (eating, ' '):
        continue
      elif (state,ch) == (eating, '"'):
        result += ch
        state = quoted
      elif state == eating:
        result += ch
        state = normal
      elif (state, ch) == (quoted, '"'):
        result += ch
        state = normal
      elif state == quoted:
        result += ch
      elif (state,ch) == (normal, '"'):
        result += ch
        state = quoted
      elif (state,ch) == (normal, ' '):
        result += ch
        state = eating
      else: # state == normal
        result += ch
    
    print(result)
    

    Or, a data-driven version:

    actions = {
        'normal' : {
            ' ' : lambda x: ('eating', ' '),
            '"' : lambda x: ('quoted', '"'),
            None: lambda x: ('normal', x)
        },
        'eating' : {
            ' ' : lambda x: ('eating', ''),
            '"' : lambda x: ('quoted', '"'),
            None: lambda x: ('normal', x)
        },
        'quoted' : {
            '"' : lambda x: ('normal', '"'),
            '\\': lambda x: ('escaped', '\\'),
            None: lambda x: ('quoted', x)
        },
        'escaped' : {
            None: lambda x: ('quoted', x)
        }
    }
    
    def reduce(s):
        result = ''
        state = 'eating'
        for ch in s:
            state, ch = actions[state].get(ch, actions[state][None])(ch)
            result += ch
        return result
    
    s = '''that's my     string, "   keep these spaces     "    but reduce these '''
    print(reduce(s))
    

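    【Discussion】:

    • I had already started doing it this way :)
    • Works great. I only added a check for the escaped \" sequence, which should be enough for my purposes.
    • Or see the data-driven version for a more explicit state machine, including \" escapes.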
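
As a rough sketch of the escape check mentioned in these comments, the if/elif version can be extended with a fourth state that is entered after a backslash inside quotes. The state names mirror the answer; the function name reduce_spaces_fsm and the exact handling are assumptions, not the commenter's actual code.

```python
NORMAL, QUOTED, EATING, ESCAPED = range(4)


def reduce_spaces_fsm(s):
    """Collapse runs of spaces outside double quotes; honor \\" inside quotes."""
    state = EATING  # start by eating leading spaces, as in the answer
    result = []
    for ch in s:
        if state == ESCAPED:
            # The character after a backslash is copied verbatim.
            result.append(ch)
            state = QUOTED
        elif state == QUOTED:
            result.append(ch)
            if ch == '\\':
                state = ESCAPED
            elif ch == '"':
                state = NORMAL
        elif ch == '"':
            # Opening quote seen while in NORMAL or EATING.
            result.append(ch)
            state = QUOTED
        elif ch == ' ':
            if state == NORMAL:
                # Keep the first space of a run, then eat the rest.
                result.append(ch)
                state = EATING
        else:
            result.append(ch)
            state = NORMAL
    return ''.join(result)


print(reduce_spaces_fsm('a   "b  \\"  c"   d'))
```

Each branch is also a natural place to hook in a per-character semantic action, which is the reason the question gives for avoiding regular expressions.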
    【Solution 3】:

    As already suggested, I would use the standard shlex module instead, with a few tweaks:

    import shlex
    
    def reduce_spaces(s):
        lex = shlex.shlex(s)
        lex.quotes = '"'             # ignore single quotes
        lex.whitespace_split = True  # use only spaces to separate tokens
        tokens = iter(lex.get_token, lex.eof)  # exhaust the lexer
        return ' '.join(tokens)
    
    >>> s = '''that's my   string, "   keep these spaces     "   but reduce these '''
    >>> reduce_spaces(s)
    'that\'s my string, "   keep these spaces     " but reduce these'
    

    【Discussion】:

      【Solution 4】:

      This is a bit of a hack, but you can collapse runs of spaces to a single space with a one-liner.

      one_space = lambda s: ' '.join([part for part in s.split(' ') if part])
      

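      This joins the non-empty parts, i.e. the parts that contain no space characters, with a single space between them. The harder part, of course, is separating out the exceptional sections in double quotes. In real production code you would also need to be careful with cases such as escaped double quotes. But assuming you only have well-behaved input, you can separate those out too. I assume that in real code you may have more than one double-quoted section.

      You can do this by splitting your string on double quotes into a list, applying one_space only to the even-indexed items, and appending the odd-indexed items unchanged, which I believe is right from working through a few examples.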

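      def fix_spaces(s):
        dbl_parts = s.split('"')
        # Even-indexed parts lie outside the quotes; odd-indexed parts are quoted and kept verbatim.
        normalize = lambda i: one_space(dbl_parts[i]) if not i%2 else dbl_parts[i]
        # Rejoin on '"' to restore the quotes removed by split. Note that one_space
        # also drops the spaces directly next to each quote.
        return '"'.join([normalize(i) for i in range(len(dbl_parts))])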
      

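      【Discussion】:

      • As I said, I also need to attach semantic actions to several characters, so I don't think this approach is transparent enough.
      【Solution 5】:

      Use shlex to parse your string into quoted and unquoted parts, then use a regular expression on the unquoted parts to replace each whitespace sequence with a single space.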

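      【Discussion】:

      • This is actually rather neat, but it doesn't work. It should fail on the single quote in that's in his example, and in other similar cases. I wonder whether the standard library has a suitable parser. Edit: it looks like shlex can probably be configured to do this. I'll leave it to you to work that out :)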
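
A minimal sketch of the split-then-collapse idea from this answer. Note the swapped-in technique: it uses re.split in place of shlex to separate the quoted parts, which sidesteps the single-quote problem raised in the comment. The function name collapse_outside_quotes is hypothetical, and escaped quotes are not handled.

```python
import re


def collapse_outside_quotes(s):
    """Collapse whitespace runs to one space, except inside "..." sections."""
    # The capturing group makes re.split keep the quoted sections;
    # they end up at the odd indices of the resulting list.
    parts = re.split(r'("[^"]*")', s)
    return ''.join(
        part if i % 2 == 1 else re.sub(r'\s+', ' ', part)
        for i, part in enumerate(parts)
    )


s = '''that's my     string, "   keep these spaces     "    but reduce these '''
print(collapse_outside_quotes(s))
```

An unmatched trailing double quote is simply treated as ordinary text here, so everything after it is collapsed as well.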
      【Solution 6】:
      i = iter((i for i,char in enumerate(s) if char=='"'))
      zones = list(zip(*[i]*2))  # a list of all the "zones" where spaces should not be manipulated
      answer = []
      space = False
      for i,char in enumerate(s):
          if not any(zone[0] <= i <= zone[1] for zone in zones):
              if char.isspace():
                  if not space:
                      answer.append(char)
              else:
                  answer.append(char)
          else:
              answer.append(char)
          space = char.isspace()
      
      print(''.join(answer))
      

      And the output:

      >>> s = '''that's my     string, "   keep these spaces     "    but reduce these '''
      >>> i = iter((i for i,char in enumerate(s) if char=='"'))
      >>> zones = list(zip(*[i]*2))
      >>> answer = []
      >>> space = False
      >>> for i,char in enumerate(s):
      ...     if not any(zone[0] <= i <= zone[1] for zone in zones):
      ...         if char.isspace():
      ...             if not space:
      ...                 answer.append(char)
      ...         else:
      ...             answer.append(char)
      ...     else:
      ...         answer.append(char)
      ...     space = char.isspace()
      ... 
      >>> print(''.join(answer))
      that's my string, "   keep these spaces     " but reduce these 
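
The zones line relies on the zip(*[i]*2) idiom, which passes the same iterator to zip twice so that consecutive items are consumed in pairs. A small sketch with made-up quote positions (the list below is hypothetical data, not taken from the string above):

```python
# Hypothetical indices at which '"' characters were found.
quote_positions = [3, 9, 14, 20, 25]

it = iter(quote_positions)
# Both arguments to zip are the SAME iterator, so each tuple takes two
# consecutive items: (opening quote index, closing quote index).
zones = list(zip(*[it] * 2))

print(zones)  # [(3, 9), (14, 20)]
```

An unpaired final quote (25 here) is dropped silently, so text after an unbalanced quote is treated as being outside any protected zone.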
      

      【Discussion】:
