【Posted】: 2016-12-20 20:54:53
【Question】:
I am trying to detect certain text characters in a .txt document.
I cannot detect "", but I can detect ".
The dummy text I am searching, "quote_test.txt", is:
"This is a test of the :: failure";"123";"Joe"
"This test should have failed on the above";"456";"Kathy ::"
"This is also a problem : they say";"789 "" test";"Jim"
"So is a , evidently when in a field";"543 :";"Mary"
"Will have to think about \"\" as a search";",475::";"Sue"
"Which is similar to the issue with " I think";"9463";"Toby"
The script is:
import csv
import re
fail_text = re.compile('(:|::|,|""|")') # "" is not detected
with open("quote_test.txt") as fp:
    reader = csv.reader(fp, delimiter=';')
    for numx, line in enumerate(reader):
        for numy, column in enumerate(line):
            m = re.search(fail_text, column)
            if m:
                print('Line {} Column {} has {} in {}'.format(
                    numx, numy, m.group(), column)
                )
The output looks like this; the "" in the text is not detected:
Line 0 Column 0 has : in This is a test of the :: failure
Line 1 Column 2 has : in Kathy ::
Line 2 Column 0 has : in This is also a problem : they say
Line 2 Column 1 has " in 789 " test
Line 3 Column 0 has , in So is a , evidently when in a field
Line 3 Column 1 has : in 543 :
Line 4 Column 0 has " in Will have to think about \\" as a search"
Line 4 Column 1 has , in ,475::
Line 5 Column 0 has " in Which is similar to the issue with I think"
Initially I escaped the text out of habit, like this Stack solution, but that did not work.
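One possible explanation, offered as a sketch rather than a confirmed diagnosis: with its default dialect (quotechar='"', doublequote=True), csv.reader treats a doubled quote inside a quoted field as one escaped quote, so the parsed column may never contain "". The output line showing `789 " test` is consistent with this. The names `raw`, `row`, and `double_quote` below are illustrative only.

```python
import csv
import io
import re

# One row from the sample data; the second field contains a doubled quote.
raw = '"This is also a problem : they say";"789 "" test";"Jim"\n'

# Default dialect: "" inside a quoted field is read as a single literal ".
row = next(csv.reader(io.StringIO(raw), delimiter=';'))
print(row[1])                              # 789 " test

double_quote = re.compile('""')
print(bool(double_quote.search(row[1])))   # False: the parsed field holds one "
print(bool(double_quote.search(raw)))      # True: the raw line still holds ""
```

If this is what is happening, searching the raw line before it reaches the csv parser would be one way to see the doubled quotes.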
【Comments】:
-
I think the problem is in how double quotes are handled in re.compile, because:
fail_text = re.compile('::') finds ::, but fail_text = re.compile('""') does not. -
What happens if you escape the double quotes?
fail_text = re.compile('(:|::|,|\"\"|")') -
re.compile('(:|::|\"\"|,|")') and this re.compile('\"\"') - neither picks up "" -
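A small check of why escaping makes no difference here: in a single-quoted Python literal, \" is just another spelling of ", so both patterns are the same string. The variable names `escaped` and `plain` are illustrative.

```python
import re

# \" inside a single-quoted literal produces a plain " character.
escaped = '(:|::|,|\"\"|")'
plain = '(:|::|,|""|")'
print(escaped == plain)   # True

# The pattern does match a doubled quote when the text really contains one.
print(re.compile('\"\"').search('789 "" test') is not None)   # True
```

This suggests the pattern itself is fine and that the text handed to re.search is what differs from the file contents.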
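If you look at the output, your RE does not detect "::" (the double colon) either. If doubled double quotes or double colons don't matter, then just use fail_text = re.compile('(::?|,|""?)')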
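To illustrate the point about "::", here is a minimal comparison of the two patterns on a made-up string (`text` is not from the sample file). Regex alternation tries alternatives left to right, so `:` wins before `::` is ever attempted, while `::?` greedily takes the second colon when it is present.

```python
import re

original = re.compile('(:|::|,|""|")')
suggested = re.compile('(::?|,|""?)')

text = 'a :: b "" c'
print(original.search(text).group())    # :
print(suggested.search(text).group())   # ::
print(suggested.findall(text))          # ['::', '""']
```

Listing the longer alternative first, as in '(::|:|,|""|")', would be another way to get the same effect.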
-
Why is the doubled double quote in your text escaped, while all the single ones are not?
Tags: regex python-3.x text