Python: looping through a string in nested for loops (answers)

【Question title】: Python loop through string in nested for loops
【Posted】: 2014-01-29 23:10:44
【Question description】:

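I was just wondering: I'm trying to do some very simple text processing, or reduction. I want to replace every run of spaces (except those inside " ") with a single space. I also have some semantic actions that depend on each character read, which is why I don't want to use any regex. It's a kind of pseudo-FSM model.

So here's the deal: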

s = '''that's my     string, "   keep these spaces     "    but reduce these '''

Desired output:

that's my string, "   keep these spaces     " but reduce these

What I want to do is something like this (I've left out the '"' case to keep the example simple):

out = ""
for i in range(len(s)):

  if s[i].isspace():
    out += ' '
    while s[i].isspace():
      i += 1

  else:
    out += s[i]

I can't quite work out how to create or share the scope in this case.

Thanks for any advice.

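【Question discussion】:

What are the variables line and lineCpy?
The problem is that once the while loop has skipped all the spaces, the for loop simply assigns i its next value from range(len(s)), ignoring the increments made inside the while loop... so you don't remove any spaces, you just iterate over them again...
Ah, sorry, I missed those; they are all the string s. I guess I'm blind.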
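
The point made in the comments above can be illustrated with a short sketch. A for loop over range() rebinds i on every iteration, so increments made inside the inner while loop are lost. One possible fix, assuming quote handling is still left out as in the question, is to drive the index with a while loop instead. The function name collapse_spaces is made up for this illustration.

```python
def collapse_spaces(s):
    """Collapse every run of whitespace into a single space (quotes are ignored)."""
    out = []
    i = 0
    while i < len(s):
        if s[i].isspace():
            out.append(' ')
            # Skip the rest of the run; the bounds check avoids an IndexError
            # when the string ends with whitespace.
            while i < len(s) and s[i].isspace():
                i += 1
        else:
            out.append(s[i])
            i += 1
    return ''.join(out)


s = '''that's my     string, "   keep these spaces     "    but reduce these '''
print(collapse_spaces(s))
```

Because the index is advanced by hand, the skip made by the inner loop survives into the next iteration of the outer loop. Note that this version also collapses the spaces inside the double quotes.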

Tags: python string loops python-3.x nested-loops

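【Solution 1】:

I'm a bit worried about whether this solution is readable. I modified the string the OP suggested so that it contains multiple pairs of double quotes.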

s = '''that's my     string,   "   keep these spaces     "" as    well    as these    "    reduce these"   keep these spaces too   "   but not these  '''
s_split = s.split('"')

# The substrings at odd positions of the list s_split should retain their spaces.
# These elements have, however, lost their double quotes during .split('"'),
# so add them back for the new string. For the substrings at even positions, remove
# the multiple spaces in between by splitting them again using .split()
# and joining them with a single space. However, this will not conserve
# leading and trailing spaces. In order to conserve them, add a dummy
# character (in this case '-') at the start and end of the substring before
# the split. Remove the dummy bits after the split.
#
# Finally, join the elements in new_string_list to create the desired string.

new_string_list = ['"' + x + '"' if i%2 == 1
                   else ' '.join(('-' + x + '-').split())[1:-1]                   
                   for i,x in enumerate(s_split)]
new_string = ''.join(new_string_list)
print(new_string)

The output is:

>>>that's my string, "   keep these spaces     "" as    well    as these    " reduce these"   keep these spaces too   " but not these

【Discussion】:

【Solution 2】:

I also have some semantic actions that depend on each character read... it's a kind of pseudo-FSM model.

You can actually implement an FSM:

s = '''that's my     string, "   keep these spaces     "    but reduce these '''


# States: normal text, inside double quotes, and "eating" a run of spaces.
normal, quoted, eating = 0,1,2
state = eating  # start in eating so that leading spaces are dropped
result = ''
for ch in s:
  if (state, ch) == (eating, ' '):
    continue  # skip the extra spaces of a run
  elif (state,ch) == (eating, '"'):
    result += ch
    state = quoted
  elif state == eating:
    result += ch
    state = normal
  elif (state, ch) == (quoted, '"'):
    result += ch
    state = normal
  elif state == quoted:
    result += ch  # everything inside the quotes is copied unchanged
  elif (state,ch) == (normal, '"'):
    result += ch
    state = quoted
  elif (state,ch) == (normal, ' '):
    result += ch  # keep the first space of a run
    state = eating
  else: # state == normal
    result += ch

print(result)

Or, a data-driven version:

actions = {
    'normal' : {
        ' ' : lambda x: ('eating', ' '),
        '"' : lambda x: ('quoted', '"'),
        None: lambda x: ('normal', x)
    },
    'eating' : {
        ' ' : lambda x: ('eating', ''),
        '"' : lambda x: ('quoted', '"'),
        None: lambda x: ('normal', x)
    },
    'quoted' : {
        '"' : lambda x: ('normal', '"'),
        '\\': lambda x: ('escaped', '\\'),
        None: lambda x: ('quoted', x)
    },
    'escaped' : {
        None: lambda x: ('quoted', x)
    }
}

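def reduce(s):
    result = ''
    state = 'eating'  # start in 'eating' so that leading spaces are dropped
    for ch in s:
        # Look up the handler for this character, falling back to the state's
        # default (None) entry; it returns the next state and the text to emit.
        state, ch = actions[state].get(ch, actions[state][None])(ch)
        result += ch
    return result

s = '''that's my     string, "   keep these spaces     "    but reduce these '''
print(reduce(s))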

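【Discussion】:

I had already started doing it this way :)
It works well; I only added a check for the \" escape sequence, which should be enough for my purposes.
Or see the data-driven version for a more explicit state machine, with \" escaping.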
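
The escape check mentioned in the comments above could be added to the if/elif version roughly as follows. This is only a sketch, not the commenter's actual code: the ESCAPED state mirrors the 'escaped' entry of the data-driven version, and the function name reduce_spaces_fsm is made up for this illustration. It assumes a backslash is only special inside double quotes.

```python
NORMAL, QUOTED, EATING, ESCAPED = range(4)


def reduce_spaces_fsm(s):
    state = EATING  # start in EATING so that leading spaces are dropped
    result = []
    for ch in s:
        if state == ESCAPED:
            # The character after a backslash is copied verbatim, even a '"'.
            result.append(ch)
            state = QUOTED
        elif state == QUOTED:
            result.append(ch)
            if ch == '\\':
                state = ESCAPED
            elif ch == '"':
                state = NORMAL
        elif ch == '"':
            # Opening quote seen in NORMAL or EATING.
            result.append(ch)
            state = QUOTED
        elif ch == ' ':
            if state == NORMAL:
                result.append(ch)  # keep the first space of a run
                state = EATING
            # In EATING, further spaces are skipped.
        else:
            result.append(ch)
            state = NORMAL
    return ''.join(result)


print(reduce_spaces_fsm(r'a   "b  \"  c"   d'))
```

Each branch is also a natural place to hang a per-character semantic action, which is what the question asks for.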

【Solution 3】:

As already suggested, I would use the standard shlex module instead, with a few tweaks:

import shlex

def reduce_spaces(s):
    lex = shlex.shlex(s)
    lex.quotes = '"'             # ignore single quotes
    lex.whitespace_split = True  # use only spaces to separate tokens
    tokens = iter(lex.get_token, lex.eof)  # exhaust the lexer
    return ' '.join(tokens)

>>> s = '''that's my   string, "   keep these spaces     "   but reduce these '''
>>> reduce_spaces(s)
'that\'s my string, "   keep these spaces     " but reduce these'

【Discussion】:

【Solution 4】:

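This is a bit of a hack, but you can use a one-liner to collapse runs of spaces into a single space.

one_space = lambda s : ' '.join([part for part in s.split(' ') if part])

This joins the non-empty parts, i.e. the ones that contain no space characters, separated by a single space. The harder part, of course, is setting aside the exceptional parts inside double quotes. In real production code you would also need to watch out for cases such as escaped double quotes. But assuming you only have well-behaved input, you can split those out too. I assume that in real code you may have more than one double-quoted section.

You can do this by making a list from your string, split on double quotes, applying one_space only to the even-indexed items, and appending the odd-indexed items unchanged, which I believe is right from working through a few examples.

def fix_spaces(s):
  dbl_parts = s.split('"')
  # Even indices lie outside the quotes: collapse spaces there. The '-' sentinels
  # keep a single leading/trailing space next to a quote. Odd indices are quoted.
  normalize = lambda i: one_space('-' + dbl_parts[i] + '-')[1:-1] if not i%2 else dbl_parts[i]
  # Rejoin with '"' to restore the quotes that split('"') removed.
  return '"'.join([normalize(i) for i in range(len(dbl_parts))])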

【Discussion】:

As I said, I also need to attach semantic actions to several characters, so I don't think this approach is transparent enough.

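【Solution 5】:

Use shlex to parse your string into quoted and unquoted parts, then use a regex on the unquoted parts to replace each sequence of whitespace with a single space.

【Discussion】:

This is actually rather neat, but it doesn't work. In his example and other similar cases, it should fail on the single quote in that's. I wonder whether there is a suitable parser in the standard library. Edit: it looks like shlex might be configurable to do this. I'll leave working that out to you :)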
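
A minimal sketch of the "split into quoted and unquoted parts, then collapse whitespace in the unquoted parts" idea is given below. It deliberately swaps shlex for re.split, which sidesteps the single-quote problem because only double quotes are treated as delimiters. The function name reduce_outside_quotes is made up for this illustration, and the sketch assumes balanced double quotes with no escaped quotes inside them.

```python
import re


def reduce_outside_quotes(s):
    # The capturing group makes re.split keep the quoted segments in the result:
    # even indices hold unquoted text, odd indices hold the "..." segments.
    parts = re.split(r'("[^"]*")', s)
    return ''.join(
        part if i % 2 else re.sub(r'\s+', ' ', part)
        for i, part in enumerate(parts)
    )


s = '''that's my     string, "   keep these spaces     "    but reduce these '''
print(reduce_outside_quotes(s))
```

If the last double quote has no partner, the text after it is treated as unquoted and its whitespace is collapsed as well. Since this relies on regular expressions, it does not offer the per-character hooks the question asks for.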

【Solution 6】:

i = iter((i for i,char in enumerate(s) if char=='"'))
zones = list(zip(*[i]*2))  # a list of all the "zones" where spaces should not be manipulated
answer = []
space = False
for i,char in enumerate(s):
    if not any(zone[0] <= i <= zone[1] for zone in zones):
        if char.isspace():
            if not space:
                answer.append(char)
        else:
            answer.append(char)
    else:
        answer.append(char)
    space = char.isspace()

print(''.join(answer))

And the output:

>>> s = '''that's my     string, "   keep these spaces     "    but reduce these '''
>>> i = iter((i for i,char in enumerate(s) if char=='"'))
>>> zones = list(zip(*[i]*2))
>>> answer = []
>>> space = False
>>> for i,char in enumerate(s):
...     if not any(zone[0] <= i <= zone[1] for zone in zones):
...         if char.isspace():
...             if not space:
...                 answer.append(char)
...         else:
...             answer.append(char)
...     else:
...         answer.append(char)
...     space = char.isspace()
... 
>>> print(''.join(answer))
that's my string, "   keep these spaces     " but reduce these

【Discussion】: