Encoding a list of tuples with Python? Answers

【Question title】: Encoding a list of tuples with Python?
【Posted】: 2015-02-27 04:00:33
【Question】:

I am reading a UTF-8 text file from a directory and inserting the text I read into a list, which gives me tuples like this:

l = [('mucho','fácil'),...,('yo','hola')]

When I print it to the console, I get the following:

print l

('mucho','f\xc3\xa1cil'),...,('yo','hola')

So I tried the following:

fixing_l = [x.encode('utf-8') for x in l]

When I try to print it, I get this exception:

AttributeError: 'tuple' object has no attribute 'encode'

How can I encode and fix the strings so that I get something like this?

('mucho','fácil'),...,('yo','hola')

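【Question discussion】:

If you want to call encode, that won't work; what you have inside the tuples is <type 'str'>.
OK, I changed it to decode.
When you print a container, you always, inevitably, see the repr of the container's items. So there is no way to build a list of tuples that shows the str of its items rather than their repr when printed. If you need that, you need a custom container class!
In Python 3, the repr of a string happens to be what you want here; in Python 2.7 it is not when the string contains non-ASCII characters. That is why you need some custom trick... if anything at all (since in Python 2.7 __repr__ has to return ASCII characters only).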
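
The point in the comments above about containers showing the repr of their items can be sketched as follows. This is a minimal illustration written for Python 3, where bytes play the role of Python 2's str; the class name PrettyPairs and its encoding parameter are hypothetical and not part of the original thread.

```python
# Python 3 sketch; bytes stand in for Python 2's byte-oriented str.
pairs = [(b'mucho', b'f\xc3\xa1cil'), (b'yo', b'hola')]


class PrettyPairs(list):
    """Hypothetical list subclass whose str() shows decoded items."""

    def __init__(self, pairs, encoding='utf-8'):
        super().__init__(pairs)
        self.encoding = encoding

    def __str__(self):
        rows = []
        for pair in self:
            # Decode each item so the text itself is shown, not its repr.
            words = [item.decode(self.encoding) for item in pair]
            rows.append('(' + ', '.join(words) + ')')
        return '[' + ', '.join(rows) + ']'


# str() of a plain list uses repr() for every item, so the escapes appear.
plain_text = str(pairs)
# The subclass overrides __str__, so print() shows the decoded characters.
pretty_text = str(PrettyPairs(pairs))

print(plain_text)
print(pretty_text)
```

Only the display changes here; the items stored in the container are still the original byte strings.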

Tags: python python-2.7 encoding io character-encoding

【Solution 1】:

I think you mean decode:

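l = [('mucho','f\xc3\xa1cil'),...,('yo','hola')]
# Decode every UTF-8 byte string in every tuple into a unicode string (Python 2).
decoded = [[word.decode("utf8") for word in sets] for sets in l]

# Printing the strings themselves (not the list) shows the real characters.
for words in decoded:
    print u" ".join(words)

# The same applies to a single byte string.
print 'f\xc3\xa1cil'.decode("utf8")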

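If you print it, you should see the correct string.

Since you start with a plain byte string, you need to decode it to get the unicode representation of the object. In the case above, u"\xe1" is simply the code point that the UTF-8 byte string "\xc3\xa1" encodes, and that in turn is simply the character á.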
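
The relationship between the byte string and the decoded text can be checked directly. The following is a small sketch in Python 3 syntax (the answer itself targets Python 2.7, where the byte literal would be written without the b prefix); the variable names are chosen only for illustration.

```python
raw = b'f\xc3\xa1cil'        # UTF-8 bytes, as they might come from the file
text = raw.decode('utf-8')   # text (unicode) string

# "á" takes two bytes in UTF-8 but is a single character once decoded.
byte_count = len(raw)
char_count = len(text)

# The escape \xe1 and the literal á denote the same code point, U+00E1.
same_string = (text == 'f\xe1cil' == 'fácil' == 'f\u00e1cil')

# Encoding the text again gives back the original bytes.
round_trip = (text.encode('utf-8') == raw)

print(text, byte_count, char_count, same_string, round_trip)
```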

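【Discussion】:

Now I get f\xe1cil instead of fácil. Should I try another encoding? I am on OS X, and when I check the file's encoding in the terminal it says utf8.
@newWithPython Do you mean there is another word, f\xe1cil, in your txt file? If so, I would guess your file contains text in more than one encoding.
No, at first it was f\xc3\xa1cil, and with this solution it is f\xe1cil.
No, it becomes u"f\xe1cil", which is just the unicode representation of the string. Unicode is what Python uses to represent non-ASCII characters... of course there are several encodings, but this is just the representation.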
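
The confusion in this thread is between the escaped form of a string and the string itself. As a hedged illustration in Python 3, the built-in ascii() produces an escaped form similar to what Python 2's repr() shows for a unicode string (without the u prefix), while printing the string shows the character:

```python
text = b'f\xc3\xa1cil'.decode('utf-8')

# Escaped, ASCII-only display form, including the surrounding quotes.
escaped = ascii(text)
# The string itself; nothing about its content has changed.
shown = str(text)

print(escaped)
print(shown)
```

Both lines describe the same five-character string; only the way it is displayed differs.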

【Solution 2】:

In Python 3 you can use:

res = [tuple(map(lambda x: x.encode(encoding), tup)) for tup in list_tuples]

Example:

list_tuples = [('mucho','fácil'), ('\u2019', 't')]
res = [tuple(map(lambda x: x.encode('utf-8'), tup)) for tup in list_tuples]

Result:

[(b'mucho', b'f\xc3\xa1cil'), (b'\xe2\x80\x99', b't')]
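
Going the other way, which is what the question actually needs, follows the same pattern with decode. This is a minimal Python 3 sketch that reuses the example data above; the names encoded and decoded are introduced here only for illustration.

```python
list_tuples = [('mucho', 'fácil'), ('\u2019', 't')]

# Encode every string in every tuple to UTF-8 bytes.
encoded = [tuple(s.encode('utf-8') for s in tup) for tup in list_tuples]

# Decode the bytes back into text strings.
decoded = [tuple(b.decode('utf-8') for b in tup) for tup in encoded]

print(encoded)
print(decoded)
```

A generator expression inside tuple() does the same job as map with a lambda and may be easier to read.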

【Discussion】: