How to encode a Python 3 string using \u escape codes?

【Question title】: How to encode Python 3 string using \u escape code?
【Posted】: 2015-11-23 16:30:48
【Question description】:

In Python 3, suppose I have

>>> thai_string = 'สีเ'

Using encode gives

>>> thai_string.encode('utf-8')
b'\xe0\xb8\xaa\xe0\xb8\xb5\xe0\xb9\x80'

My question: how can I get encode() to return a bytes sequence that uses \u instead of \x? And how can I decode it back to a Python 3 str?

I tried the ascii built-in function, which gives

>>> ascii(thai_string)
"'\\u0e2a\\u0e35\\u0e40'"

But this doesn't seem quite right, because I can't decode it back to get thai_string.

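The documentation says that \u is only used in string literals, but I'm not sure what that means. Does it imply that my question rests on a flawed premise?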

【Question comments】:

How about .decode('utf-8')? Aren't strings in Python already Unicode?
@Zizouz212, neither thai_string nor ascii(thai_string) has a decode method, and thai_string.encode('utf-8').decode('utf-8') just takes me back to where I started, thai_string, which is not the desired output.
Python documentation related to the \u escape sequence: docs.python.org/3/reference/lexical_analysis.html and docs.python.org/3/library/codecs.html#encodings-and-unicode
Related: stackoverflow.com/q/1347791/1959808
Does this answer your question? How to work with surrogate pairs in Python?

【Solution 1】:

You can use unicode_escape:

>>> thai_string.encode('unicode_escape')
b'\\u0e2a\\u0e35\\u0e40'

Note that encode() always returns a byte string (bytes), and that the unicode_escape encoding is intended to:

Produce a string that is suitable as Unicode literal in Python source code
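
The question also asks how to get back to a str, which the snippet above does not show. The following is a minimal sketch of the round trip, assuming the escaped bytes were produced by unicode_escape itself, so that they contain only ASCII. The variable names escaped, restored and escaped_text are illustrative. The last lines show one possible way to recover the original from the ascii() output using ast.literal_eval, since that output is a quoted string literal rather than encoded bytes.

```python
import ast

# Same three Thai characters as thai_string in the question,
# written with escapes so the source file stays ASCII-only.
thai_string = '\u0e2a\u0e35\u0e40'

# str -> bytes: every non-ASCII character becomes a backslash-u escape.
escaped = thai_string.encode('unicode_escape')

# bytes -> str: the same codec reverses the escaping.
restored = escaped.decode('unicode_escape')

# If a str holding the escapes is wanted instead of bytes,
# decode the ASCII-only bytes as ASCII.
escaped_text = escaped.decode('ascii')

# ascii() returns a quoted literal as a str; literal_eval parses it back.
from_ascii = ast.literal_eval(ascii(thai_string))

print(escaped)
print(restored == thai_string, from_ascii == thai_string)
```

Decoding arbitrary bytes with unicode_escape is a different matter: the codec treats unescaped bytes as Latin-1, so it is only a safe inverse for input that unicode_escape produced in the first place.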

【Comments】: