ISO 8859-1 filename not decoding

【Question title】: ISO 8859-1 filename not decoding
【Posted】: 2012-07-23 21:02:53
【Question】:

I am extracting files from MIME messages in a Python milter, and I have run into a problem with files named like this:

=?ISO-8859-1?Q?Certificado=5FZonificaci=F3n=5F2010=2Epdf?=

I can't seem to decode this name to UTF-8. To work around earlier ISO-8859-1 problems, I started passing every filename through this function:

def unicodeConvert(self, fname):
    normalized = False

    while normalized == False:
        try:
            fname  = unicodedata.normalize('NFKD', unicode(fname, 'utf-8')).encode('ascii', 'ignore')
            normalized = True
        except UnicodeDecodeError:
            fname = fname.decode('iso-8859-1')#.encode('utf-8')
            normalized = True
        except UnicodeError:
            fname = unicode(fname.content.strip(codecs.BOM_UTF8), 'utf-8')
            normalized = True
        except TypeError:
            fname = fname.encode('utf-8')

    return fname

This worked until I received this filename.

Ideas are appreciated, as always.

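【Comments】:

This is an RFC 2047 encoded-word. It is completely incorrect for one to appear in a Content-Disposition parameter, but Outlook does it anyway because it handles the technical details very poorly.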

Tags: python unicode mime iso latin1

【Solution 1】:

Your string is encoded using the quoted-printable ("Q") encoding for MIME headers. The email.header module handles this for you:

>>> from email.header import decode_header
>>> try:
...     string_type = unicode  # Python 2
... except NameError:
...     string_type = str      # Python 3
...
>>> for part in decode_header('=?ISO-8859-1?Q?Certificado=5FZonificaci=F3n=5F2010=2Epdf?='):
...     decoded = string_type(*part)
...     print(decoded)
...
Certificado_Zonificación_2010.pdf
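
For use inside a milter, the same decoding can be wrapped in a small helper. The following is a minimal sketch for Python 3, not a definitive implementation. The name decode_mime_filename and the sample message are hypothetical and exist only for illustration. It relies on email.header.make_header to reassemble the chunks returned by decode_header, which also covers plain names that contain no encoded-word.

```python
from email import message_from_string
from email.header import decode_header, make_header


def decode_mime_filename(raw_name):
    """Decode any RFC 2047 encoded-words in raw_name and return text.

    Hypothetical helper: returns None unchanged so it can be fed the
    result of Message.get_filename() directly.
    """
    if raw_name is None:
        return None
    # decode_header splits the value into (bytes_or_str, charset) chunks;
    # make_header joins them again, decoding each chunk with its charset.
    return str(make_header(decode_header(raw_name)))


# Hypothetical message, built only to exercise the helper.
raw_message = (
    "Content-Type: application/pdf\n"
    "Content-Disposition: attachment; "
    'filename="=?ISO-8859-1?Q?Certificado=5FZonificaci=F3n=5F2010=2Epdf?="\n'
    "\n"
    "dummy body\n"
)
part = message_from_string(raw_message)
filename = decode_mime_filename(part.get_filename())
print(filename)
```

With the filename from the question, this should print the same decoded name as the session above. The result is a text string; encode it (for example to UTF-8) only at the point where the file is written to disk.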

【Discussion】: