【Title】: Python3 Double Quote Parsing and Detection
【Posted】: 2016-12-20 20:54:53
【Question】:

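I am trying to detect certain characters in a .txt document.

I cannot detect "" (two consecutive double quotes), but I can detect a single ".

This is the dummy text I am searching, quote_test.txt: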

"This is a test of the :: failure";"123";"Joe"
"This test should have failed on the above";"456";"Kathy ::"
"This is also a problem : they say";"789 "" test";"Jim"
"So is a , evidently when in a field";"543 :";"Mary"
"Will have to think about \"\" as a search";",475::";"Sue"
"Which is similar to the issue with " I think";"9463";"Toby"

The script is:

import csv
import re

fail_text = re.compile('(:|::|,|""|")') # "" is not detected

with open("quote_test.txt") as fp:
    reader = csv.reader(fp, delimiter=';')

    for numx, line in enumerate(reader):
        for numy, column in enumerate(line):          
            m = re.search(fail_text, column)
            if m:
                print('Line {} Column {} has {} in {}'.format(
                    numx, numy, m.group(), column)
                )

The output looks like this; the "" in the text is never detected:

Line 0 Column 0 has : in This is a test of the :: failure
Line 1 Column 2 has : in Kathy ::
Line 2 Column 0 has : in This is also a problem : they say
Line 2 Column 1 has " in 789 " test
Line 3 Column 0 has , in So is a , evidently when in a field
Line 3 Column 1 has : in 543 :
Line 4 Column 0 has " in Will have to think about \\" as a search"
Line 4 Column 1 has , in ,475::
Line 5 Column 0 has " in Which is similar to the issue with  I think"

Out of habit, I originally escaped the quotes as in this Stack Overflow solution, but that didn't work.
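
A likely cause, visible in the output for line 2 (789 " test): with its default settings, csv.reader treats "" inside a quoted field as one escaped quote (the dialect's doublequote option defaults to True), so the regex only ever sees a single ". The sketch below uses io.StringIO as a stand-in for the file (an assumption for illustration), parses that one row, and compares a search on the parsed field with a search on the raw line.

```python
import csv
import io
import re

# Line 2 of quote_test.txt from the question
raw = '"This is also a problem : they say";"789 "" test";"Jim"\n'

# Default dialect: doublequote=True, so "" inside a quoted field becomes "
row = next(csv.reader(io.StringIO(raw), delimiter=';'))
print(row)  # → ['This is also a problem : they say', '789 " test', 'Jim']

pattern = re.compile(r'""')
# The parsed field no longer contains two consecutive quote characters...
print(pattern.search(row[1]))
# ...but the raw line still does
print(pattern.search(raw))
```

So if the goal is to flag "" exactly as it appears in the file, the search has to happen before the csv module collapses it, for example on the raw line.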

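【Comments】:

  • I think the problem is how double quotes are handled in re.compile, because fail_text = re.compile('::') finds :: but fail_text = re.compile('""') does not.
  • What happens if you escape the double quotes? fail_text = re.compile('(:|::|,|\"\"|")')
  • Neither re.compile('(:|::|\"\"|,|")') nor re.compile('\"\"') picks up "".
  • If you look at the output, your RE doesn't detect "::" (double colon) either. If the doubled forms don't need their own alternatives, just use fail_text = re.compile('(::?|,|""?)')
  • Why are the doubled quotes in your text escaped, while none of the single quotes are?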
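
Two of the comments above can be checked directly in Python. Inside a single-quoted string literal, \" is just ", so '\"\"' and '""' build exactly the same pattern and escaping changes nothing. Separately, regex alternation is tried left to right, so with (:|::) the single : wins at the first colon and :: is never reported; the suggested ::? and ""? try the longer form first. The test strings here are taken from, or modeled on, the sample data.

```python
import re

# Escaping " inside a single-quoted Python string is a no-op
print('\"\"' == '""')  # → True

# Alternation is tried left to right: ':' matches before '::' gets a chance
original = re.compile(r'(:|::|,|""|")')
print(repr(original.search('Kathy ::').group()))   # → ':'

# Greedy optional quantifiers try the doubled form first
suggested = re.compile(r'(::?|,|""?)')
print(repr(suggested.search('Kathy ::').group()))  # → '::'
print(repr(suggested.search('a "" b').group()))    # → '""'
```

This only fixes the pattern itself; a "" that csv.reader has already collapsed to " still cannot be matched as "".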

Tags: regex python-3.x text


【Solution 1】:

I was able to solve this by setting the escape character in csv.reader() with escapechar='\\' and changing the regex to use ""?:

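import csv
import re


def delimiter_check():
    # '::?' and '""?' try the doubled form first, so '::' and '""' are reported
    fail_text = re.compile(r'(::?|,|""?)')

    # newline='' is what the csv documentation recommends for csv.reader
    with open("quote_test.txt", newline='') as fp:
        # escapechar='\\' keeps \" inside a quoted field as a literal quote
        reader = csv.reader(fp, delimiter=';', quotechar='"', escapechar='\\')

        for numx, line in enumerate(reader):
            for numy, column in enumerate(line):
                m = fail_text.search(column)
                if m:
                    print('Line {} Column {} has {} in {}'.format(
                        numx, numy, m.group(), column)
                    )


delimiter_check()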

As long as the double quotes in the text are escaped like this, \"\", the script finds them:

Line 4 Column 0 has "" in Will have to think about "" as a search
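
To see why escapechar fixes line 4 of the sample but not line 2, the sketch below parses just those two rows (fed through io.StringIO instead of the file, an assumption for illustration) with escapechar='\\' and the pattern suggested in the comments. With an escape character, \" inside a quoted field is kept as a literal quote, so \"\" survives as ""; a bare "" is still read as one escaped quote because doublequote stays at its default of True.

```python
import csv
import io
import re

# Lines 2 and 4 of quote_test.txt from the question
SAMPLE = (
    '"This is also a problem : they say";"789 "" test";"Jim"\n'
    '"Will have to think about \\"\\" as a search";",475::";"Sue"\n'
)

# Pattern suggested in the comments: doubled forms are tried first
FAIL_TEXT = re.compile(r'(::?|,|""?)')

# escapechar='\\' keeps \" as a literal quote; doublequote stays True (default)
rows = list(csv.reader(io.StringIO(SAMPLE), delimiter=';',
                       quotechar='"', escapechar='\\'))

for numx, line in enumerate(rows):
    for numy, column in enumerate(line):
        m = FAIL_TEXT.search(column)
        if m:
            print('Row {} Column {} has {} in {}'.format(
                numx, numy, m.group(), column))
```

In other words, the fix covers the backslash-escaped case; the bare "" in 789 "" test still reaches the regex as a single ", so detecting it needs a search on the unparsed text.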
