【Posted】: 2019-06-25 17:09:28
【Question】:
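I'm trying to write a torrent application from scratch, purely as a learning exercise. After a few hours of reading the wiki, I wrote some code to decode torrent files, which use Bencoding (https://en.wikipedia.org/wiki/Bencode). Unfortunately, I didn't pay attention to the difference between byte strings and Python strings. My code works on Python strings shaped like torrent data, but when I pass it the bytes of an actual torrent file, I get an encoding error.
I tried "open(file, 'rb', encoding='utf-8', errors='ignore')". It does turn the byte string into a Python string, and I also tried every answer I could find on Stack Overflow. But some of the data is lost when errors are ignored, so I can't decode the torrent data correctly. Please excuse my messy code, and please help... I have also read about the bencoder library, which works directly on byte strings, so if there is any way to avoid rewriting my code, please...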
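To make the data loss concrete, here is a minimal sketch of what errors='ignore' does to a bencoded string. The sample bytes are made up for illustration and do not come from a real torrent file.

```python
# A bencoded string is "<length>:<payload>". Here the payload is 4 raw bytes,
# followed by a bencoded integer. (Hypothetical sample data.)
raw = b"4:\xad\xff\xfe!i7e"

# Strict UTF-8 decoding fails: 0xad cannot start a UTF-8 sequence.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# errors="ignore" silently drops every byte that is not valid UTF-8.
lossy = raw.decode("utf-8", errors="ignore")

print(strict_ok)             # False
print(repr(lossy))           # '4:!i7e'
print(len(raw), len(lossy))  # 9 6
```

The length prefix still says 4, but only one character of the payload survives, so a decoder that trusts the prefix would read '!i7e' as the string and lose its place in the rest of the data.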
with open(torrent_file1, 'rb') as _file:
    data = _file.read()
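def int_decode(meta, cur):
    # Decode a bencoded integer of the form i<digits>e starting at index cur.
    print('inside int_decode function')
    cursor = cur
    start = cursor + 1  # skip the leading 'i'
    end = start
    while meta[end] != 'e':
        end += 1
    value = int(meta[start:end])
    cursor = end + 1  # step past the trailing 'e'
    print(value, cursor)
    return value, cursor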
def chr_decode(meta, cur):
    # Decode a bencoded string of the form <length>:<payload> starting at index cur.
    print('inside chr_decode function')
    cursor = cur
    start = cursor
    end = start
    while meta[end] != ':':
        end += 1
    chr_len = int(meta[start:end])
    chr_start = end + 1
    chr_end = chr_start + chr_len
    value = meta[chr_start:chr_end]
    cursor = chr_end
    print(value, cursor)
    return value, cursor
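def list_decode(meta, cur):
    # Decode a bencoded list of the form l<items>e starting at index cur.
    print('inside the list decoding')
    cursor = cur + 1
    new_list = list()
    while cursor < (len(meta)):
        if meta[cursor] == 'i':
            item, cursor = int_decode(meta, cursor)
            new_list.append(item)
        elif meta[cursor].isdigit():
            item, cursor = chr_decode(meta, cursor)
            new_list.append(item)
        elif meta[cursor] == 'l':
            # nested list
            item, cursor = list_decode(meta, cursor)
            new_list.append(item)
        elif meta[cursor] == 'd':
            # nested dict
            item, cursor = dict_decode(meta, cursor)
            new_list.append(item)
        elif meta[cursor] == 'e':
            print('list is ended')
            cursor += 1
            break
        else:
            # avoid looping forever on an unexpected character
            raise ValueError('unexpected character at index %d' % cursor)
    return (new_list, cursor)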
def dict_decode(meta, cur=0, key_=False, key_val=None):
    # Decode a bencoded dictionary of the form d<key><value>...e starting at index cur.
    # key is True while a key has been read and its value is still pending.
    if meta[cur] != 'd':
        raise ValueError('expected a dict at index %d' % cur)
    print('dict found')
    new_dict = dict()
    key = key_
    key_value = key_val
    cursor = cur + 1
    while cursor < (len(meta)):
        if meta[cursor] == 'i':
            value, cursor = int_decode(meta, cursor)
            if not key:
                key = True
                key_value = value
            else:
                new_dict[key_value] = value
                key = False
        elif meta[cursor].isdigit():
            value, cursor = chr_decode(meta, cursor)
            if not key:
                key = True
                key_value = value
            else:
                new_dict[key_value] = value
                key = False
        elif meta[cursor] == 'l':
            lists, cursor = list_decode(meta, cursor)
            if key:
                new_dict[key_value] = lists
                key = False
            else:
                print('list cannot be used as key')
        elif meta[cursor] == 'd':
            dicts, cursor = dict_decode(meta, cursor)
            if key:
                new_dict[key_value] = dicts
                key = False
            else:
                # a dict is unhashable, so it cannot be a key either
                print('dict cannot be used as key')
        elif meta[cursor] == 'e':
            print('dict is ended')
            cursor += 1
            break
        else:
            # avoid looping forever on an unexpected character
            raise ValueError('unexpected character at index %d' % cursor)
    return (new_dict, cursor)
test = 'di323e4:spami23e4:spam5:helloi23e4:spami232ei232eli32e4:doneei23eli1ei2ei3e4:harmee'
test2 = 'di12eli23ei2ei22e5:helloei12eli1ei2ei3eee'
test3 = 'di12eli23ei2ei22ee4:johndi12e3:dggee'
print(len(test2))
new_dict = dict_decode(data)
print(new_dict)
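The test strings are str objects, while data is a bytes object, and the two behave differently when indexed. The sketch below uses a short made-up bytes value in place of a real torrent file to show why comparisons such as meta[cursor] == 'i' never match on bytes.

```python
# Hypothetical stand-in for the bytes read from a .torrent file.
data = b"d3:fooi42ee"

first_by_index = data[0]    # indexing bytes gives an int (100), not a character
first_by_slice = data[0:1]  # slicing bytes gives a bytes object of length 1

print(first_by_index == "d")       # False: an int never equals a str
print(first_by_index == ord("d"))  # True
print(first_by_slice == b"d")      # True

# The same index on a str gives a 1-character str, which is what the decoder expects.
text = "d3:fooi42ee"
print(text[0] == "d")              # True
```

A decoder that should accept bytes would therefore compare slices against bytes literals (b'd', b'i', b'e') or compare integers against ord() values.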
Traceback (most recent call last):
  File "C:\Users\yewaiyanoo\Desktop\python\torrent\read_torrent.py", line 8, in <module>
    data = _file.read()
  File "C:\Users\yewaiyanoo\AppData\Local\Programs\Python\Python37-32\lib\codecs.py", line 701, in read
    return self.reader.read(size)
  File "C:\Users\yewaiyanoo\AppData\Local\Programs\Python\Python37-32\lib\codecs.py", line 504, in read
    newchars, decodedbytes = self.decode(data, self.errors)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xad in position 204: invalid start byte
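One possible way to keep a str-based decoder without losing data is to decode the bytes as latin-1 instead of UTF-8, because latin-1 maps every byte value to exactly one character. This is only a sketch of that idea, not a tested fix for the code above, and the sample bytes are made up.

```python
raw = bytes(range(256))  # every possible byte value

text = raw.decode("latin-1")          # never raises; one character per byte
assert len(text) == len(raw)          # length prefixes stay valid
assert text.encode("latin-1") == raw  # the round trip is lossless

# Hypothetical bencoded string holding 4 raw bytes.
sample = b"4:\xad\xff\xfe!"
as_text = sample.decode("latin-1")

# Parse "<length>:<payload>" on the str, then turn the payload back into bytes.
length, _, rest = as_text.partition(":")
payload = rest[:int(length)]
payload_bytes = payload.encode("latin-1")

print(payload_bytes)  # b'\xad\xff\xfe!'
```

Values that are really binary, such as the SHA-1 hashes in a torrent's pieces field, can be recovered with value.encode('latin-1'). Values that are UTF-8 text would need value.encode('latin-1').decode('utf-8') to display correctly.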
【Comments】:
Tags: python-3.x