Print encoded string: answers

【Question title】: Print encoded string
【Posted】: 2014-06-24 18:03:50
【Question description】:

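I am writing my own MP3 decoder in Python, and I am stuck on decoding the ID3 tags. I don't want to use an existing library such as mutagen or eyeD3; I want to follow the ID3v2 specification myself.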

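The problem is that the frame data is encoded in some format I can't print. In the debugger I can see the value "Hideaway", but it is preceded by a strange character, as you can see here: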

'data': '\x00Hideaway'

My questions are: what kind of encoding is this? How do I decode and print the string? Do you think other MP3 files use different encodings in their ID3 tags?

By the way, I have the UTF-8 declaration at the top of the file:

# -*- coding: utf-8 -*-

I am reading the file with Python's ordinary I/O method, read().

【Question comments】:

Tags: python encoding mp3

【Solution 1】:

The character \x00 means that there is a single byte with the value 0 before the H. So your string looks like this:

Zero - H - i - d - e ...
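A quick way to confirm this is to inspect the raw bytes. The snippet below is only a sketch and assumes Python 3, where data read from a file opened in binary mode is a bytes object; the literal stands in for the frame data from the question.

```python
data = b'\x00Hideaway'  # stand-in for the frame data shown in the question

print(len(data))   # 9: one leading zero byte plus the 8 letters of "Hideaway"
print(data[0])     # 0: indexing a bytes object yields an integer in Python 3
print(data[1:])    # b'Hideaway'
```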

Strings normally contain letters or digits, not zero bytes. Perhaps this usage is specific to ID3v2?

Looking at the ID3v2 standard (http://id3.org/id3v2.4.0-structure), we find:

Frames that allow different types of text encoding contains a text
encoding description byte. Possible encodings:

 $00   ISO-8859-1 [ISO-8859-1]. Terminated with $00.
 $01   UTF-16 [UTF-16] encoded Unicode [UNICODE] with BOM. All
       strings in the same frame SHALL have the same byteorder.
       Terminated with $00 00.
 $02   UTF-16BE [UTF-16] encoded Unicode [UNICODE] without BOM.
       Terminated with $00 00.
 $03   UTF-8 [UTF-8] encoded Unicode [UNICODE]. Terminated with $00.

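So the leading zero byte is the text encoding description byte, and it marks the text as ISO-8859-1. The text itself runs up to the next zero byte, which is the terminator.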
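Note that the terminator is one byte wide for ISO-8859-1 and UTF-8 but two bytes wide for the UTF-16 variants. The following is a rough sketch of how a string could be cut at its terminator; `split_terminated` is a hypothetical helper name, not something from the specification, and it assumes Python 3 and that the encoding byte has already been removed from `payload`.

```python
def split_terminated(payload, encoding_byte):
    """Split payload at the first terminator and return (text_bytes, rest).

    payload is the frame body *after* the encoding description byte.
    If no terminator is found, the whole payload is returned as the text.
    """
    if encoding_byte in (1, 2):
        # UTF-16 variants: the terminator is $00 00. Scan in 2-byte steps so
        # that a zero byte inside a character is not mistaken for it.
        for i in range(0, len(payload) - 1, 2):
            if payload[i:i + 2] == b'\x00\x00':
                return payload[:i], payload[i + 2:]
        return payload, b''
    # ISO-8859-1 and UTF-8: the terminator is a single $00.
    head, _sep, rest = payload.partition(b'\x00')
    return head, rest
```

The first element of the result can then be decoded with the codec that matches the encoding byte.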

Your program could handle it like this:

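title = fp.read(number_of_bytes)    # fp must be opened in binary mode ('rb')
if title[0:1] == b'\x00':           # $00: ISO-8859-1
    title = title[1:].decode('iso-8859-1')
elif title[0:1] == b'\x03':         # $03: UTF-8
    title = title[1:].decode('utf-8')
# ... and likewise for $01 (UTF-16 with BOM) and $02 (UTF-16BE)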
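The branching above can also be written as a lookup table that covers all four encoding bytes from the quoted specification. This is a minimal sketch, assuming Python 3 and a frame body that holds a single string; `ID3_TEXT_CODECS` and `decode_text_frame` are illustrative names, and the codec names are the ones Python's standard codecs module uses.

```python
# Encoding description byte -> Python codec name, per the table quoted above.
ID3_TEXT_CODECS = {
    0: 'iso-8859-1',
    1: 'utf-16',      # the BOM in the data selects the byte order
    2: 'utf-16-be',   # big-endian, no BOM
    3: 'utf-8',
}


def decode_text_frame(body):
    """Decode a text frame body (encoding byte followed by text) to str."""
    if not body:
        return ''
    codec = ID3_TEXT_CODECS.get(body[0])
    if codec is None:
        raise ValueError('unknown ID3v2 text encoding byte: %r' % body[0])
    # Decode everything after the encoding byte, then drop a trailing terminator.
    return body[1:].decode(codec).rstrip('\x00')


print(decode_text_frame(b'\x00Hideaway'))  # Hideaway
```

Raising on an unknown encoding byte is a design choice made for this sketch; a more lenient decoder might fall back to ISO-8859-1 instead.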

【Comments】: