How to replace all those special characters with white spaces in Python? Answers

【Question title】: How to replace all those Special Characters with white spaces in python?
【Posted】: 2012-01-10 09:13:35
【Question】:

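How can I replace all those special characters with white spaces in Python?

I have a list of company names...

For example, [myfiles.txt]:

MY company.INC

Old Wine pvt

Know-it-all

"Apex Labs"

"India-New corp"

Indo-American pvt/ltd

Following the example above, I need to replace all the special characters [-, ", /, .] in the file myfiles.txt with a single space and save the result to another text file, myfiles1.txt.

Can anyone help me out?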

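【Question discussion】:

Every character is special in its own way.
There are no non-special characters. If there were, there would be a smallest non-special character, and that would make it special.

Tags: python replace special-characters whitespace text-files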

【Solution 1】:

Assuming you want to replace every non-alphanumeric character, you can do this on the command line:

cat foo.txt | sed "s/[^A-Za-z0-9]/ /g" > bar.txt

Or in Python, using the re module:

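import re

# Replace everything except letters, digits, and newlines with a space
with open('foo.txt') as src:
    original_string = src.read()
new_string = re.sub(r'[^a-zA-Z0-9\n]', ' ', original_string)
with open('bar.txt', 'w') as dst:
    dst.write(new_string)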
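
If only the four characters named in the question should be replaced, rather than every non-alphanumeric character, a narrower character class is one option. This is just a sketch: the helper name replace_specials and the sample company names are made up for illustration and are not part of the answer above.

```python
import re

# Character class listing only the characters named in the question.
# The hyphen comes first so it is read literally, not as a range.
SPECIALS = re.compile(r'[-"/.]')


def replace_specials(text):
    """Return text with each listed special character replaced by one space."""
    return SPECIALS.sub(' ', text)


# Illustrative sample data (hypothetical names).
sample = 'MY company.INC\n"apex-labs"\nIndo-American pvt/ltd\n'
cleaned = replace_specials(sample)
print(cleaned)
```

Newlines are left alone here because they are not in the character class, so the line structure of the file is kept.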

【Discussion】:

【Solution 2】:

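specials = '-"/.'  # add any other characters to replace
# Python 3 spelling; in Python 2 this was string.maketrans (after import string)
trans = str.maketrans(specials, ' ' * len(specials))
# for each line in the file:
cleanline = line.translate(trans)

For example: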

>>> line = "Indo-American pvt/ltd"
>>> line.translate(trans)
'Indo American pvt ltd'

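【Discussion】:

This is awesome!!!! But I want it to be saved to a text file automatically... that is, it has to read each line of myfile.txt and, after replacing the characters, save the lines to myfiles1.txt.
Then just add one line after the translation!
@Yeshu91 If f is your file handle (e.g. f=open('cleanfile.txt', 'w')), just add f.write(cleanline) at the end.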
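
Putting the comments above together, a line-by-line version that reads one file and writes another might look like the following. It is a minimal sketch assuming Python 3 and UTF-8 files; the function name clean_file is hypothetical, and the demo writes its own sample input into a temporary directory so it does not touch real files.

```python
import os
import tempfile

SPECIALS = '-"/.'
# Map every special character to a single space.
TRANS = str.maketrans(SPECIALS, ' ' * len(SPECIALS))


def clean_file(src_path, dst_path):
    """Copy src_path to dst_path line by line, replacing special characters."""
    with open(src_path, encoding='utf-8') as src, \
            open(dst_path, 'w', encoding='utf-8') as dst:
        for line in src:
            dst.write(line.translate(TRANS))


# Demo in a temporary directory (sample content is illustrative).
with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, 'myfiles.txt')
    dst_path = os.path.join(tmp, 'myfiles1.txt')
    with open(src_path, 'w', encoding='utf-8') as f:
        f.write('MY company.INC\nIndo-American pvt/ltd\n')
    clean_file(src_path, dst_path)
    with open(dst_path, encoding='utf-8') as f:
        result = f.read()

print(result)
```

Using with blocks closes both files even if an error occurs partway through, which the bare open(...).write(...) pattern does not guarantee.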

【Solution 3】:

import re
strs = "how much for the maple syrup? $20.99? That's ricidulous!!!"
strs = re.sub(r'[?$.!]', '', strs)  # remove specific special characters (no | needed inside [])
strs = re.sub(r'[^a-zA-Z0-9 ]', '', strs)  # remove everything except letters, digits, and spaces
strs = re.sub(r'[0-9]', '', strs)  # remove digits
strs = re.sub(r' +', ' ', strs)  # collapse runs of spaces into one
print(strs)

Output: how much for the maple syrup Thats ricidulous

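【Discussion】:

【Solution 4】:

While maketrans is the fastest way to do this, I can never remember the syntax. Since speed is rarely an issue and I know regular expressions, I tend to do this instead: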

>>> line = "-[myfiles.txt] MY company.INC"
>>> import re
>>> re.sub(r'[^a-zA-Z0-9]', ' ',line)
'  myfiles txt  MY company INC'

This has the added benefit of declaring the characters you accept rather than the ones you reject, which feels easier in this case.
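
The output above still contains double spaces where two rejected characters were adjacent. Since the question asks for a single space, one possible variation is to match runs of rejected characters with + and strip the ends. This is a sketch for a single line of text; the helper name collapse_specials is made up here, and because a newline is also non-alphanumeric, a multi-line string would be joined into one line.

```python
import re


def collapse_specials(line):
    """Replace each run of non-alphanumeric ASCII characters with one space."""
    return re.sub(r'[^a-zA-Z0-9]+', ' ', line).strip()


print(collapse_specials("-[myfiles.txt] MY company.INC"))
```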

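Of course, if you are working with non-ASCII characters, you will have to go back to removing the characters you reject. If it is only punctuation, you can do this: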

>>> import string
>>> chars = re.escape(string.punctuation)
>>> re.sub(r'['+chars+']', ' ',line)
'  myfiles txt  MY company INC'

But you will notice

【Discussion】:

【Solution 5】:

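At first I wanted to give a string.maketrans/translate example, but maybe you are working with some utf-8 encoded strings, and the ord()-sorted translation table would blow up in your face, so I thought of another solution: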

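conversion = '-"/.'
with open('myfiles.txt') as f:  # input file name taken from the question
    text = f.read()
newtext = ''
for c in text:
    # replace each character listed in conversion with a space
    newtext += ' ' if c in conversion else c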

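This is not the fastest way, but it is easy to grasp and modify.

So if your text is not ASCII, you can decode conversion and the text string to unicode, and then re-encode the result in whatever encoding you want.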
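
In Python 3, str is already Unicode, so the decode and re-encode steps only happen at the byte boundary. A rough sketch, assuming the input bytes are UTF-8 (the sample text and variable names are illustrative, not from the answer):

```python
conversion = '-"/.'
# A dict-based table works on Unicode text of any code point.
table = str.maketrans({c: ' ' for c in conversion})

# Bytes as they might arrive from a file opened in binary mode.
raw = 'Café-Zürich/GmbH. "Łódź"'.encode('utf-8')

text = raw.decode('utf-8')           # bytes -> str (Unicode)
newtext = text.translate(table)      # replace the listed characters
encoded = newtext.encode('utf-8')    # str -> bytes, in the encoding you want

print(newtext)
```

When reading with open(path, encoding='utf-8'), the decoding is done for you and only the translate step remains.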

【Discussion】: