Unable to encode or decode the string properly: answers

【Question title】: Unable to encode or decode the string properly
【Posted】: 2017-04-07 00:08:52
【Question description】:

I have looked through a number of Stack Overflow examples.

Python version used: Python 2.7.10

The string s is displayed as:

u'bh\xfcghi' where \xfc=ü

I am reading this from a web page.

After I encode the string with .encode('utf-8'), it looks like this:

'bh\xc3\xbcghi' where \xc3\xbc=ü

The expected output is:

bhüghi

I even tried decode/encode with latin-1 and decode with utf-8.

After nfn neil's comment, I tried the following again:

elem.text output:

('elem text:', u'bh\xfcghi\nMCI\n8 90 1 0 0 2 0 0 0 0 0 0 2 26 41.4 18.5 89 14.9')

elem.text type:

('elem text type:', <type 'unicode'>)

Now I am trying to print it:

splitString = elem.text.encode('utf-8').decode("utf-8").split()
print("splitString: ", splitString[0])

splitString[0] output:

u'bh\xfcghi'

Now, if I print the whole list after splitting:

print("splitString: ", splitString)

splitString output:

[u'bh\xfcghi', u'MCI', u'8', u'90', u'1', u'0', u'0', u'2', u'0', u'0', u'0', u'0', u'0', u'0', u'2', u'26', u'41.4', u'18.5', u'89', u'14.9']

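The full code is on pastebin: link.

Any help would be greatly appreciated.

【Question discussion】:

The problem is that something is happening that keeps the string from being modified. This is not an encoding problem.
Pastebin link for the full code
I got it working: `splitString = unicodedata.normalize('NFKD', elem.text).encode('ascii','ignore').split()`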

【Solution 1】:

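s = u'bh\xfcghi\nMCI\n8 90 1 0 0 2 0 0 0 0 0 0 2 26 41.4 18.5 89 14.9'
s = s.encode('utf-8')  # UTF-8 encoded byte string (type str in Python 2)
xs = s.split(' ')  # splits on spaces only, so xs[0] keeps the two newlines
print(xs[0])  # print writes the string itself, not its repr()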

Output:

bhüghi
MCI
8

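Try it; it works. The reason you did not get the "expected" output when simply typing the expression in the terminal is that Python displays \x escape codes when you do not use print.

【Discussion】:

【Solution 2】: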
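To make the display-versus-content point concrete, here is a minimal sketch. It is written for Python 3 as an assumption (the question uses 2.7), where the built-in ascii() produces the same \x-escaped form that Python 2's repr() shows, minus the u prefix. The variable names are made up for illustration. Note also that in Python 2, print("label", value) prints a tuple, and a tuple is always shown with the repr() of its items, which is why the question's output has parentheses and escapes.

```python
# Shortened form of elem.text from the question (assumed sample data).
text = 'bh\xfcghi\nMCI\n8 90 1 0 0 2'

# split() with no argument splits on any whitespace, including newlines.
first = text.split()[0]

# Escaped display form: what an interactive prompt or a printed tuple
# shows in Python 2 (there with a leading u).
escaped = ascii(first)

# Encoding and then decoding with the same codec returns the original text,
# so that round trip in the question changes nothing.
round_trip = text.encode('utf-8').decode('utf-8')

print(escaped)             # 'bh\xfcghi'
print(len(first))          # 6: the string holds one character for the umlaut
print(round_trip == text)  # True
```

Calling print(first) on its own, on a terminal that can display the character, shows bhüghi; the escapes only appear in the repr-style display.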

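I got it working by using the unicodedata library:

import unicodedata

# NFKD splits ü into u plus a combining mark; 'ignore' then drops the mark
splitString = unicodedata.normalize('NFKD', elem.text).encode('ascii', 'ignore').split()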
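One thing worth knowing about this approach: it does not keep the ü. A small sketch, using a literal in place of elem.text (an assumption, since the page itself is not available), shows what each step produces. The byte-string results below are shown as Python 3 prints them.

```python
import unicodedata

text = u'bh\xfcghi'  # stand-in for the first token of elem.text

# NFKD decomposes U+00FC into 'u' followed by U+0308 (combining diaeresis).
decomposed = unicodedata.normalize('NFKD', text)

# Encoding to ASCII with 'ignore' silently drops the combining mark.
ascii_only = decomposed.encode('ascii', 'ignore')

# For comparison, UTF-8 keeps the character as the two bytes \xc3\xbc.
utf8_bytes = text.encode('utf-8')

print(len(text), len(decomposed))  # 6 7
print(ascii_only)                  # b'bhughi'
print(utf8_bytes)                  # b'bh\xc3\xbcghi'
```

So the result is bhughi rather than the bhüghi given as the expected output in the question. If the accent has to survive, keeping the unicode string (or its UTF-8 encoding) and printing the element directly is the safer route.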

【Discussion】: