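【Posted】: 2015-06-09 08:19:38
【Question】:
I am using NLTK's punkt tokenizer to split paragraphs into sentences, but in some cases, such as the examples below, the tokenizer fails to detect the sentence boundary because the period is immediately followed by a number. I would like to use a regular expression to identify these cases and replace '.1,7,9' with '. 1,7,9', i.e. add a space between the period and the citation.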
Ex1. `This is a random sentence.1,7,9 This is a sentence followed by it.`
Ex2. `I love football.1,7,24 I also like cricket.`
Ex3. `ESD for undifferentiated cancers.[1][7] Cancers can be treatable.`
Expected output:
Ex1. This is a random sentence.
1,7,9 This is a sentence followed by it.
Ex2. I love football.
1,7,24 I also like cricket.
Ex3. ESD for undifferentiated cancers.
[1][7] Cancers can be treatable.
Thanks.
【Comments】:
-
What is the expected output for the third case?
-
Your third example and its expected output don't match.
-
Sorry, my mistake. I have updated it.
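-
The substitution described in the question could be sketched roughly as follows. This is only a minimal sketch, not a tested fix for punkt itself: the function name `space_out_citations` and the pattern are assumptions made for illustration, and the pattern assumes a citation marker is a period that directly follows a letter and is directly followed by a digit or by `[` plus a digit.

```python
import re

# Assumption: a citation marker is a period that directly follows a letter and
# is directly followed by a digit or by "[digit". Decimals such as 3.14 are
# left alone because the character before the period is not a letter.
CITATION_AFTER_PERIOD = re.compile(r"(?<=[A-Za-z])\.(?=\[?\d)")


def space_out_citations(text):
    """Insert a space between a sentence-ending period and a citation marker."""
    return CITATION_AFTER_PERIOD.sub(". ", text)


examples = [
    "This is a random sentence.1,7,9 This is a sentence followed by it.",
    "ESD for undifferentiated cancers.[1][7] Cancers can be treatable.",
]
for example in examples:
    # Each period that is glued to a citation gets a trailing space.
    print(space_out_citations(example))
```

The normalized text can then be passed to the sentence tokenizer (for example `nltk.tokenize.sent_tokenize`). Whether punkt actually splits at the new space depends on the trained model, so it is worth checking on real data. Note that abbreviations such as `Fig.1` would also get a space with this pattern.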