Handling encode() when converting from python2 to python3

【Question title】: Handling encode() when converting from python2 to python3
【Posted】: 2019-06-06 19:35:36
【Question】:

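I am converting a large project from python2 to python3 (python2 backwards compatibility is not required).

While testing the conversion, I found that some strings were being turned into bytes objects, which was causing problems. I traced it back to the following method, which is called in many places: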

def custom_format(val):
    return val.encode('utf8').strip().upper()

In python2:

custom_format(u'\xa0')
# '\xc2\xa0'
custom_format('bar')
# 'BAR'

In python3:

custom_format('\xa0')
# b'\xc2\xa0'
custom_format('bar')
# b'BAR'

This is a problem because at some point the output of custom_format is inserted into an SQL template string using format(), but 'foo = {}'.format(b'BAR') == "foo = b'BAR'", which would break the resulting SQL syntax.
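For illustration, a minimal sketch of this failure mode in Python 3. The variable names are hypothetical; the template is the one quoted above.

```python
# Python 3: format() on a bytes object inserts its repr, b'...' prefix included.
value = 'bar'.encode('utf8').strip().upper()   # bytes: b'BAR'
query = 'foo = {}'.format(value)
print(query)       # foo = b'BAR'

# With a str value the template is filled in as intended.
query_str = 'foo = {}'.format('bar'.strip().upper())
print(query_str)   # foo = BAR
```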

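Simply removing the encode('utf8') part ensures that custom_format('bar') correctly returns 'BAR', but now custom_format('\xa0') returns '\xa0' instead of the python2 version's '\xc2\xa0'. (I don't know enough about unicode to tell whether that is a bad thing.)

Without touching the SQL or format() parts of the code, how can I make sure the expected python2 behavior carries over to the python3 version? Is it as simple as removing encode('utf8'), or will that cause unintended conflicts?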

【Comments】:

When I removed the encode and ran the function on '\xa0' in py3, it returned '' rather than '\xa0'.
Not sure if this is the best solution: str(val.encode('utf8').strip().upper())[2:-1]
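Both comments can be checked in a Python 3 interpreter. The sketch below is only a demonstration; the names nbsp and hacked are made up for it.

```python
nbsp = '\xa0'  # U+00A0 NO-BREAK SPACE

# str.strip() treats U+00A0 as whitespace, so nothing is left.
assert nbsp.strip() == ''

# bytes.strip() only removes ASCII whitespace, so the UTF-8 bytes survive.
assert nbsp.encode('utf8').strip() == b'\xc2\xa0'

# Slicing the repr of a bytes object happens to work for plain ASCII ...
assert str(b'BAR')[2:-1] == 'BAR'

# ... but for non-ASCII bytes it yields the escape text, not the characters.
hacked = str(nbsp.encode('utf8'))[2:-1]
print(len(hacked))  # 8, i.e. the literal characters \xc2\xa0
```

So the slicing trick looks fine for ASCII input but silently produces backslash escapes for anything else.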

Tags: python string unicode encode

【Solution 1】:

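If your intent is to ensure that all incoming strings, whether str or bytes, are converted to bytes, then you must keep the encode, because the native string type in Python 3 is str (Unicode text), not bytes as it was in Python 2. encode converts a str to bytes.

If your intent is simply to ensure that the query looks right, then you can remove the encode and let Python 3 handle it for you.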
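A minimal sketch of the second option, assuming that any bytes still reaching the function are UTF-8 encoded text. The isinstance branch is an addition for illustration and is not part of the original function.

```python
def custom_format(val):
    """Return a stripped, upper-cased str for either str or bytes input."""
    if isinstance(val, bytes):
        # Assumption: bytes input is UTF-8 encoded text.
        val = val.decode('utf8')
    return val.strip().upper()


print('foo = {}'.format(custom_format('bar')))    # foo = BAR
print('foo = {}'.format(custom_format(b'bar')))   # foo = BAR
```

Note that with this version custom_format('\xa0') returns an empty string, because str.strip() treats the no-break space as whitespace, and str.upper() also upper-cases non-ASCII letters. Whether either difference matters depends on the data that flows through the function.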

【Comments】: