【Posted】: 2021-02-21 23:59:09
【Question】:
I am trying to handle unmatched double quotes in a CSV-formatted string.
To be precise,
"It "does "not "make "sense", Well, "Does "it"
should be corrected to
"It" "does" "not" "make" "sense", Well, "Does" "it"
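So basically what I want to do is
replace every '"' that is
- not preceded by the start of a line or a comma, and
- not followed by a comma or the end of a line
with '""'.
To do this, I am using the following regular expression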
(?<!^|,)"(?!,|$)
The problem is that while the Ruby regex engine (http://www.rubular.com/) can parse this regex, the Python regex engines (https://pythex.org/, http://www.pyregex.com/) throw the following error
Invalid regular expression: look-behind requires fixed-width pattern
and with Python 2.7.3 it throws
sre_constants.error: look-behind requires fixed-width pattern
Can anyone tell me what is bothering Python here?
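For reference, the error can be reproduced in Python 3 as well. The sketch below also tries one possible workaround, which is an assumption and not something proposed in the thread: splitting the alternation into two separate lookbehinds, each of which has a fixed width on its own (`^` is zero characters wide, `,` is one). The names `pattern_bad`, `pattern_split`, and `line` are made up for illustration.

```python
import re

# The original pattern: the lookbehind alternates between `^` (width 0)
# and `,` (width 1), so its width is not fixed and Python rejects it.
pattern_bad = r'(?<!^|,)"(?!,|$)'

error_message = None
try:
    re.compile(pattern_bad)
except re.error as exc:
    error_message = str(exc)

# Assumed workaround: two lookbehinds in a row, each with a single width.
pattern_split = r'(?<!^)(?<!,)"(?!,|$)'

line = '"It "does "sense", Well, "Does "it"'

# Indices of the quotes that are neither at the start of the string or
# after a comma, nor followed by a comma or the end of the string.
positions = [m.start() for m in re.finditer(pattern_split, line)]

print(error_message)
print(positions)
```

This only shows which quotes the lookaround conditions select; it does not by itself produce the corrected string shown in the question.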
================================================ ====================================
Edit:
After Tim's reply, I got the following output for a multi-line string
>>> str = """ "It "does "not "make "sense", Well, "Does "it"
... "It "does "not "make "sense", Well, "Does "it"
... "It "does "not "make "sense", Well, "Does "it"
... "It "does "not "make "sense", Well, "Does "it" """
>>> re.sub(r'\b\s*"(?!,|$)', '" "', str)
' "It" "does" "not" "make" "sense", Well, "Does" "it" "\n"It" "does" "not" "make" "sense", Well, "Does" "it" "\n"It" "does" "not" "make" "sense", Well, "Does" "it" "\n"It" "does" "not" "make" "sense", Well, "Does" "it" " '
At the end of each line, two double quotes were added next to 'it'.
So I made a small change to the regex call to handle newlines.
re.sub(r'\b\s*"(?!,|$)', '" "', str,flags=re.MULTILINE)
But this gives the output
>>> re.sub(r'\b\s*"(?!,|$)', '" "', str,flags=re.MULTILINE)
' "It" "does" "not" "make" "sense", Well, "Does" "it"\n... "It" "does" "not" "make" "sense", Well, "Does" "it"\n... "It" "does" "not" "make" "sense", Well, "Does" "it"\n... "It" "does" "not" "make" "sense", Well, "Does" "it" " '
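Only the last 'it' still gets two double quotes.
But I would like to know why the '$' end-of-line anchor does not recognize that the line has ended.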
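A likely explanation, sketched on a tiny made-up string: without `re.MULTILINE`, `$` matches only at the very end of the string (or just before a final newline), and even with `re.MULTILINE` a trailing space between the quote and the end of the line keeps `$` from matching directly after the quote. The string `text` and the variable names below are illustrative assumptions, not taken from the post.

```python
import re

# First line ends right after the quote; the last line has a trailing space.
text = 'a"\nb" '

# Default mode: `$` does not match before the inner newline, and the
# last quote is followed by a space, so no quote counts as "at end".
default_hits = [m.start() for m in re.finditer(r'"(?=$)', text)]

# MULTILINE: `$` now also matches just before each newline,
# but the trailing space still hides the last quote.
multiline_hits = [m.start() for m in re.finditer(r'"(?=$)', text, flags=re.MULTILINE)]

# Allowing optional spaces or tabs before `$` catches both quotes.
tolerant_hits = [m.start() for m in re.finditer(r'"(?=[ \t]*$)', text, flags=re.MULTILINE)]

print(default_hits, multiline_hits, tolerant_hits)
```

In the multi-line string from the edit, the closing `"""` is preceded by a space, which would match the third case here.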
================================================ ====================================
The final answer is
re.sub(r'\b\s*"(?!,|[ \t]*$)', '" "', str,flags=re.MULTILINE)
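As a quick check, the same call can be wrapped in a small helper and run on a shortened two-line sample. The function name `fix_quotes` and the sample string are hypothetical; the pattern, replacement, and flag are the ones from the line above. The sample avoids naming a variable `str`, since that shadows the built-in type.

```python
import re

def fix_quotes(text):
    """Close each quoted word: a word boundary, optional whitespace and a
    quote become '" "', unless the quote is followed by a comma or by
    optional spaces/tabs and the end of a line."""
    return re.sub(r'\b\s*"(?!,|[ \t]*$)', '" "', text, flags=re.MULTILINE)

# Two lines; the first one has a trailing space after its last quote.
sample = '"It "does "sense", Well, "Does "it" \n"It "does "it"'

fixed = fix_quotes(sample)
print(fixed)
```

On this sample the closing quote of each line is left alone, including the one followed by a trailing space.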
【Comments】:
-
Python lookbehind assertions must have a constant length, and (?