Parsing: replacing quotes (answers)

【Question title】: Parsing replace quotes
【Posted】: 2017-03-12 17:29:33
【Question】:

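I am trying to parse a text file in Python to compute some statistics on it. To do that, I want to replace certain punctuation with tokens. One example of such a token is any punctuation that ends a sentence (.!? becomes <EndS>). I managed to do this with regular expressions. Now I am trying to parse quotation marks. For that, I think I need a way to tell opening quotes from closing quotes. I am reading the input file line by line, and I have no guarantee that the quotes will be balanced.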

For example:

 "Death to the traitors!" cried the exasperated burghers.
 "Go along with you," growled the officer, "you always cry the same thing over again. It is very tiresome."

should become:

 [Open] Death to the traitors! [Close] cried the exasperated burghers.
 [Open] Go along with you, [Close] growled the officer, [Open] you always cry the same thing over again. It is very tiresome. [Close]

Is it possible to do this with a regular expression? Is there a simpler or better way to do it?

【Comments】:

Tags: python regex parsing nlp quotes

【Solution 1】:

You can use the sub function (from the re module), passing a function as the replacement:

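import re

def replace_dbquote(match):
    # match.group(0) is the whole quoted span, including both quote characters
    return '[OPEN]' + match.group(0).replace('"', '') + '[CLOSE]'

string = '"Death to the traitors!" cried the exasperated burghers. "Go along with you", growled the officer.'
# "[^"]*" matches an opening quote, any run of non-quote characters, and the closing quote
parser = re.sub(r'"[^"]*"', replace_dbquote, string)

print(parser)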

https://docs.python.org/3.5/library/re.html#re.sub
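
The pattern above only matches balanced pairs, so a stray quote on a line is left untouched. Since the question says balance is not guaranteed, here is a minimal sketch of an alternative that tags every quote by alternating between an opening and a closing marker. The function name `tag_quotes` and its parameters are made up for illustration, and the sketch assumes every line starts outside a quotation, so a quotation that spans several lines would be tagged incorrectly.

```python
import re

def tag_quotes(line, open_tag="[Open] ", close_tag=" [Close]"):
    """Replace each double quote in one line, alternating open/close markers.

    Assumption: the line starts outside a quotation, so the first quote
    is treated as an opening quote, the second as a closing quote, and so on.
    """
    inside = False  # True while we are between an opening and a closing quote

    def toggle(match):
        nonlocal inside
        inside = not inside
        # After flipping: inside == True means this quote just opened a quotation
        return open_tag if inside else close_tag

    return re.sub(r'"', toggle, line)


lines = [
    '"Death to the traitors!" cried the exasperated burghers.',
    '"Go along with you," growled the officer, "you always cry the same thing over again. It is very tiresome."',
]
for line in lines:
    print(tag_quotes(line))
```

With the two example lines from the question this should produce the requested output, including the spaces around the markers. An unmatched quote still gets an opening marker instead of being skipped, which may or may not be what you want for your statistics.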

【Discussion】: