
【Question Title】: Approximately converting unicode string to ascii string in python
【Posted】: 2011-12-26 14:34:11
【Question】:

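I don't know whether this is trivial, but I need to convert a unicode string to an ascii string, and I don't want all those escape characters lying around. What I mean is: is it possible to do an "approximate" conversion to some very similar ascii characters?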

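For example: Gavin O’Connor gets converted to Gavin O\x92Connor, but I would really like it to be converted to just Gavin O'Connor. Is this possible? Has anyone written a tool to do it, or do I have to replace all the characters manually?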
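One plausible source of the \x92 escape (an assumption; the question does not say how the string was produced): in the Windows-1252 (cp1252) code page, byte 0x92 is the curly apostrophe U+2019 RIGHT SINGLE QUOTATION MARK. A quick Python 3 check:

```python
# Assumption: the escaped form came from a cp1252-encoded byte string.
curly = "Gavin O\u2019Connor"             # U+2019 RIGHT SINGLE QUOTATION MARK
encoded = curly.encode("cp1252")           # Windows-1252 maps U+2019 to byte 0x92
print(encoded)                             # b'Gavin O\x92Connor'
print(encoded.decode("cp1252") == curly)   # True: the byte round-trips to the curly quote
```

So the \x92 is not an "escape character" added by Python; it is the single byte that the curly apostrophe becomes in that encoding.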

Many thanks! Marco

【Question Discussion】:

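See this: stackoverflow.com/questions/816285/…
What you are trying to achieve is not ideal: you will probably have to keep adding new replacements forever. It would help if you could explain why you need to do this and why you must use ASCII rather than Unicode.
@sorin: Not if you use a utility that can already replace all Unicode characters.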

Tags: python string unicode ascii

【Solution 1】:

There is a technique for stripping accents from characters, but other characters have to be replaced directly. See this article: http://effbot.org/zone/unicode-convert.htm
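A minimal Python 3 sketch of the accent-stripping idea, assuming NFKD decomposition followed by dropping combining marks (the helper name strip_accents is hypothetical and not taken from the linked article). It also shows why other characters still need direct replacement: the curly apostrophe has no decomposition, so it passes through unchanged.

```python
import unicodedata

def strip_accents(text):
    """Hypothetical helper: remove diacritics by decomposing, then dropping combining marks."""
    # NFKD splits e.g. 'é' into 'e' + U+0301 COMBINING ACUTE ACCENT.
    decomposed = unicodedata.normalize("NFKD", text)
    # Keep every character that is not a combining mark.
    return "".join(c for c in decomposed if not unicodedata.combining(c))

print(strip_accents("Caf\u00e9 na\u00efve"))    # Cafe naive
print(strip_accents("Gavin O\u2019Connor"))     # still contains U+2019, untouched
```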

【Discussion】:

【Solution 2】:

import unicodedata

unicode_string = u"Gavin O’Connor"
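# NFKD splits characters into base letter + combining marks where possible;
# encode(..., 'ignore') then drops everything that is still non-ASCII.
print(unicodedata.normalize('NFKD', unicode_string).encode('ascii', 'ignore').decode('ascii'))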

Output:

Gavin O'Connor

Here is the document that describes the normalization forms: http://unicode.org/reports/tr15/
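A small Python 3 check of what NFKD actually does with the two kinds of characters in this thread (the sample strings are illustrative): an accented letter decomposes into an ASCII base letter plus a combining mark, but U+2019 has no decomposition mapping in the Unicode character database, so encode('ascii', 'ignore') simply drops it. This matches the 2.6.6 result reported in the comments on this answer.

```python
import unicodedata

# 'é' has a canonical decomposition; U+2019 has none (empty string).
print(repr(unicodedata.decomposition("\u00e9")))   # '0065 0301'
print(repr(unicodedata.decomposition("\u2019")))   # ''

for s in ["Caf\u00e9", "Gavin O\u2019Connor"]:
    ascii_bytes = unicodedata.normalize("NFKD", s).encode("ascii", "ignore")
    print(ascii_bytes.decode("ascii"))
# Cafe
# Gavin OConnor
```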

【Discussion】:

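This just removes the apostrophe from the example input string. The OP is looking for a way to replace it with a "close enough" ascii single-quote character.
Hmm, on my machine it gives the output above, but when I try the same thing elsewhere the apostrophe is removed... strange.
With my python 2.6.6, unicodedata.normalize('NFKD', u'Gavin O\u2019Connor') == u'Gavin O\u2019Connor' and u'Gavin O\u2019Connor'.encode('ascii', 'ignore') == 'Gavin OConnor'. I'm confused by the standard you linked to, so I can't tell whether this is a bug in unicodedata.normalize or the correct behavior.
On 2.6.5, unicodedata.normalize('NFKD', u"Gavin O’Connor").encode('ascii','ignore') gives me "Gavin O'Connor".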

【Solution 3】:

Use the Unidecode package to transliterate the string.

>>> import unidecode
>>> unidecode.unidecode(u'Gavin O’Connor')
"Gavin O'Connor"

【Discussion】:

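Just installed it... but: >>> import unidecode >>> unidecode.unidecode(u'Gavin O’Connor') >>> "Gavin OConnor"
That means ’ is a Unicode character with no ASCII equivalent: as far as Python is concerned, ’ (U+2019) is not '. You may want to build a dictionary of such special characters that stores a similar-looking ASCII character for each, and then replace each Unicode character with its ASCII counterpart.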

【Solution 4】:

b = str(a.encode('utf-8').decode('ascii', 'ignore'))

This should work fine.

【Discussion】:

It doesn't work. When I tried it, it just removed all the non-ASCII characters.

【Solution 5】:

Try simple character replacement:

str1 = "“I am the greatest”, said Gavin O’Connor"
print(str1)
print(str1.replace("’", "'").replace("“","\"").replace("”","\""))

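PS: If you get a SyntaxError about a non-ASCII character (Python 2 assumes source files are ASCII by default), add # -*- coding: utf-8 -*- to the top of your .py file.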

【Discussion】:

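There are many other commonly used Unicode characters that have similar ASCII versions, such as the various dashes and hyphens. Doing all of this by hand is too much work.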
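One way to avoid listing every dash by hand is to use the Unicode general category: dash punctuation characters (hyphen, en dash, em dash and so on) are all in category Pd. A minimal sketch, assuming that mapping every Pd character to an ASCII hyphen-minus is acceptable for your data (the helper name ascii_dashes is hypothetical). Note that U+2212 MINUS SIGN is in category Sm, not Pd, so it is not caught.

```python
import unicodedata

def ascii_dashes(text):
    """Hypothetical helper: map every dash-punctuation (category 'Pd') character to '-'."""
    return "".join("-" if unicodedata.category(c) == "Pd" else c for c in text)

sample = "pages 10\u201320 \u2014 a well\u2010known case"
print(ascii_dashes(sample))   # pages 10-20 - a well-known case
```

The same category lookup could be combined with a small explicit table for the cases it misses, such as the minus sign.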