【Question title】: How do I remove punctuation, digits and spaces from a file in Python 3? [duplicate]
【Posted】: 2022-01-25 00:06:00
【Question】:

How do I remove punctuation, digits and spaces from a file in Python 3?

fname = input("Enter the name of the file: ")
fh = open(fname)
for line in fh:
    line = line.strip()

【Comments on the question】:

  • "Punctuation" is a broad term; do you mean all special characters?

Tags: python python-3.x


【Solution 1】:

Print every character except punctuation, digits and whitespace:

from string import whitespace, punctuation, digits

fname = input("Enter the name of the file: ")
with open(fname) as f:
    for line in f:
        print(''.join(filter(lambda c: c not in whitespace + digits + punctuation, line)),
              end="")

The same thing with a comprehension (here, a generator expression):

from string import whitespace, punctuation, digits

fname = input("Enter the name of the file: ")
with open(fname) as f:
    for line in f:
        print(
            ''.join(c for c in line if c not in whitespace + punctuation + digits),
            end="")
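Both snippets above re-evaluate `whitespace + punctuation + digits` for every character they test. A minimal sketch of one way to avoid that, assuming you are happy to wrap the filter in a helper: build the lookup set once and reuse it. The names `UNWANTED` and `strip_unwanted` are hypothetical and not part of the original answer.

```python
from string import whitespace, punctuation, digits

# Build the lookup set once; membership tests on a frozenset are fast and the
# three strings are no longer concatenated for every character.
UNWANTED = frozenset(whitespace + punctuation + digits)


def strip_unwanted(text):
    """Return text without ASCII whitespace, punctuation or digits."""
    return ''.join(c for c in text if c not in UNWANTED)


print(strip_unwanted("Hello, World! 123\n"))  # prints HelloWorld
```

Because the helper works on a plain string, it can be applied to each line of a file or to the whole file content at once.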

If you want to replace the file's content with the cleaned text, here is the code:

from pathlib import Path
from string import whitespace, punctuation, digits

file_name = Path(input("Enter the name of the file: "))
file_name.write_text(''.join(c for c in file_name.read_text() if
                             c not in whitespace + punctuation + digits))

You should take a look at Path, it is very useful! The string module also contains many shortcuts for this kind of operation.
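To make the string constants more concrete, here is a small sketch that feeds them to `str.maketrans` and `str.translate`, which is another standard way to delete a fixed set of characters. This is an alternative technique, not the one used in the answer, and the names `DELETE_TABLE` and `clean` are hypothetical.

```python
import string

# The third argument of str.maketrans lists characters to delete:
# each one is mapped to None in the resulting translation table.
DELETE_TABLE = str.maketrans(
    '', '', string.whitespace + string.punctuation + string.digits)


def clean(text):
    """Return text with every character listed in DELETE_TABLE removed."""
    return text.translate(DELETE_TABLE)


print(repr(string.digits))  # prints '0123456789'
print(clean("a1 b2, c3!"))  # prints abc
```

Note that these constants are ASCII-only, so characters such as full-width punctuation are left untouched by this approach.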

Here is a more detailed version of the file-rewriting code above, with each step broken down:

import pathlib
import string

# Concatenate three constants from the string module. The result starts with
# the whitespace characters (space, \t, \n, \r, \x0b, \x0c), followed by
# !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~ and then 0123456789
CHARACTER_TO_DELETE = string.whitespace + string.punctuation + string.digits


def character_not_to_delete(character):
    """
    Return True if the character is not in CHARACTER_TO_DELETE.
    """
    return character not in CHARACTER_TO_DELETE


def clean_file():
    """
    Rewrite the content of the file without any whitespace characters,
    punctuation or digits.
    """
    file_name = pathlib.Path(input("Enter the name of the file: "))

    # read_text opens the file, returns its content and closes the file
    file_content = file_name.read_text()

    # Use a list comprehension to build a list of the characters that are not
    # in CHARACTER_TO_DELETE
    create_list_of_allowed_character = [c for c in file_content
                                        if character_not_to_delete(c)]

    # Concatenate the list of characters to create a string
    new_content = ''.join(create_list_of_allowed_character)

    # write_text opens the file, writes the new content and closes the file
    file_name.write_text(new_content)

if __name__ == "__main__":
    clean_file()

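【Comments】:

  • Could you walk me through what the code does, step by step? I understand that it removes all special characters and digits (I tried it), but what matters to me is understanding the code. Could you explain it, or at least provide a link? That would be really helpful.
  • I won't go into detail about how Path works, but I'll explain the logic if it isn't clear :)
  • @chaghtaiTrkan What do you think of this edit?
  • Thank you very much.
【Solution 2】: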

We read the file, remove the unwanted characters with a regular expression, and write the result back:

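import re

fname = input("Enter the name of the file: ")
with open(fname, "r") as f:
    content = f.read()
    # Drop digits, spaces and only the punctuation marks listed in the class
    c = re.sub("[0-9,.:;?!\"' ]", "", content)

with open(fname, "w") as f:
    f.write(c)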
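The character class in this answer only covers a handful of punctuation marks and the space character. A minimal sketch of a broader pattern, assuming you want all ASCII punctuation plus every digit and whitespace character removed: build the class from `string.punctuation` with `re.escape`. The names `UNWANTED_RE` and `clean_with_regex` are hypothetical.

```python
import re
import string

# re.escape protects characters such as ], \, ^ and - so they are safe inside
# the character class. In Python 3 str patterns, \d and \s also match
# non-ASCII digits and whitespace, which is broader than string.digits and
# string.whitespace.
UNWANTED_RE = re.compile("[" + re.escape(string.punctuation) + r"\d\s]")


def clean_with_regex(text):
    """Return text without ASCII punctuation, digits or whitespace."""
    return UNWANTED_RE.sub("", text)


print(clean_with_regex("It's 3 o'clock; [ok]"))  # prints Itsoclockok
```

The result of `clean_with_regex(content)` can be written back to the file in the same way as `c` above.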
