【Question title】:Python single character clean
【Posted】:2014-03-29 12:19:59
【Question】:

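I want to remove all single-character words from a text.

For example: I want to strip all of the bold characters (a, ?, d, *, etc.) from the text below and return the cleaned text.

Lorem Ipsum is simply a dummy ? text | of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it d to make * type specimen book. It has survived not only five centuries, but also the leap into [ electronic typesetting, remaining essentially unchanged.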

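【Comments】:

  • What about characters directly before or after punctuation, as in "End of a sentence.a Start of a new"? And what should happen to the whitespace around the character?
  • Every single-character word has a space before and after it.
  • But when you remove a character, should the whitespace around it be removed as well?

Tags: python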


【Solution 1】:

Use a regular expression:

re.sub(r'((?:^|(?<=\s))\S\s|\s\S(?:$|(?=\s)))', '', inputtext)

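This removes any single non-whitespace character that is either at the start of the text or preceded by whitespace, and that is followed by one whitespace character (which is removed too), or one whitespace character followed by a single non-whitespace character that is at the end of the text or followed by whitespace.

This ensures that the whitespace next to each removed character is removed along with it, so no doubled spaces are left behind.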
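The two alternatives in that pattern can be easier to follow when written out with `re.VERBOSE`. The following is a minimal sketch of the same expression (without the outer capturing group, which `re.sub` does not need here); the names `SINGLE_CHAR` and `remove_single_chars` are made up for illustration and are not part of the answer.

```python
import re

# Same pattern as above, spread over several lines with comments.
SINGLE_CHAR = re.compile(r"""
    (?:^|(?<=\s))   # at the start of the text, or just after whitespace
    \S\s            # one non-whitespace character plus the whitespace after it
  |
    \s\S            # one whitespace character plus one non-whitespace character
    (?:$|(?=\s))    # at the end of the text, or just before whitespace
""", re.VERBOSE)

def remove_single_chars(text):
    """Hypothetical wrapper: drop whitespace-delimited single characters."""
    return SINGLE_CHAR.sub("", text)

print(remove_single_chars("scrambled it d to make * type"))
```

Compiling the pattern once is only a convenience for repeated calls; the behaviour should be the same as passing the raw string to `re.sub`.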

Demo:

>>> import re
>>> inputtext = '''\
... Lorem Ipsum is simply a dummy ? text | of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it d to make * type specimen book. It has survived not only five centuries, but also the leap into [ electronic typesetting, remaining essentially unchanged.
... '''
>>> re.sub(r'((?:^|(?<=\s))\S\s|\s\S(?:$|(?=\s)))', '', inputtext)
"Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took galley of type and scrambled it to make type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged.\n"
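One case the demo does not exercise is a text made up only of single-character words: each match consumes one neighbouring whitespace character, so the last character has none left to pair with and stays. If collapsing all whitespace is acceptable, a split-and-join approach is a possible alternative. This is a different technique from the regular expression in the answer, and `drop_short_words` is a hypothetical helper name.

```python
import re

PATTERN = r'((?:^|(?<=\s))\S\s|\s\S(?:$|(?=\s)))'

def drop_short_words(text):
    # Hypothetical helper: split on any whitespace and keep only
    # tokens longer than one character, joined by single spaces.
    return " ".join(word for word in text.split() if len(word) > 1)

sample = "a b c"
print(repr(re.sub(PATTERN, '', sample)))  # 'c'
print(repr(drop_short_words(sample)))     # ''
```

The trade-off is that `split()` and `join()` also turn newlines and runs of spaces into single spaces and drop leading and trailing whitespace, whereas the regular expression leaves the rest of the text untouched.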

