UnicodeDecodeError: 'utf-8' codec can't decode byte 0x92 in position 1: invalid start byte

【Question title】: UnicodeDecodeError: 'utf-8' codec can't decode byte 0x92 in position 1: invalid start byte
【Posted】: 2018-05-07 16:17:43
【Question】:

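I built a chatbot by following this GitHub repository: https://github.com/llSourcell/tensorflow_chatbot

I also downloaded the data from here: https://github.com/suriyadeepan/easy_seq2seq/tree/master/data

I am using TensorFlow 0.12 and Python 3.5. Can anyone help me solve this problem? Running in test mode prints:

>> Mode : test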

Traceback (most recent call last):
  File "execute.py", line 324, in <module>
    decode()
  File "execute.py", line 220, in decode
    enc_vocab, _ = data_utils.initialize_vocabulary(enc_vocab_path)
  File "D:\My_document\AI\Chatbot_Conversation\tensorflow_chatbot-master\data_utils.py", line 86, in initialize_vocabulary
    rev_vocab.extend(f.readlines())
  File "C:\Users\Hoang\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\lib\io\file_io.py", line 131, in readlines
    s = self.readline()
  File "C:\Users\Hoang\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\lib\io\file_io.py", line 124, in readline
    return compat.as_str_any(self._read_buf.ReadLineAsString())
  File "C:\Users\Hoang\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\util\compat.py", line 106, in as_str_any
    return as_str(value)
  File "C:\Users\Hoang\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\util\compat.py", line 84, in as_text
    return bytes_or_text.decode(encoding)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x92 in position 1: invalid start byte

Does this error mean I need to convert my data to UTF-8? I would appreciate any help. Thanks!

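【Comments】:

It seems some characters in the data are outside the ASCII range. You need to convert the text to Unicode.
Do you mean my data? stackoverflow.com/users/5281012/shivam-jindal
Yes, your data contains some non-ASCII characters, and they fail when the bytes are converted to a string.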
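
The point made in the comments can be checked directly. The sketch below uses a made-up byte string (not taken from the asker's data) with 0x92 at position 1, matching the error message. In UTF-8, 0x92 is only valid as a continuation byte, so it cannot start a character. Assuming the file was saved in Windows-1252 (a common source of this byte, though the question does not confirm it), 0x92 is the right single quotation mark.

```python
# Hypothetical sample bytes: 0x92 sits at position 1, as in the error message.
raw = b"I\x92m"

# Decoding as UTF-8 fails because 0x92 is not a valid start byte.
try:
    raw.decode("utf-8")
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)

# Assumption: the data is really Windows-1252, where 0x92 is U+2019.
as_cp1252 = raw.decode("cp1252")

# Alternative: keep UTF-8 but substitute undecodable bytes with U+FFFD.
as_replaced = raw.decode("utf-8", errors="replace")

print(error_message)
print(repr(as_cp1252), repr(as_replaced))
```

Decoding with the correct source encoding keeps the apostrophe, while errors="replace" avoids the exception at the cost of losing the original character.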

Tags: python machine-learning deep-learning artificial-intelligence

【Solution 1】:

Try reading the data with encoding='unicode_escape'. For example:

import pandas as pd
df = pd.read_csv('file_name.csv', encoding='unicode_escape')

It worked for me.
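
In the question, the file is read through TensorFlow's file_io rather than pandas, and the traceback shows it being decoded as UTF-8, so another option is to re-save the data files as UTF-8 first. The following is a minimal sketch of that idea, not the repository's own method. The helper name convert_to_utf8, the file names, and the default source encoding cp1252 are all assumptions for illustration; the real encoding of the data has to be checked.

```python
import os
import tempfile


def convert_to_utf8(src_path, dst_path, source_encoding="cp1252"):
    """Re-encode a text file to UTF-8 (hypothetical helper).

    source_encoding is an assumption; cp1252 is only a guess for files
    that contain byte 0x92.
    """
    # newline="" keeps the original line endings untouched.
    with open(src_path, "r", encoding=source_encoding, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)
    return dst_path


# Small demonstration on a temporary file with made-up content.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "vocab_cp1252.txt")
    dst = os.path.join(tmp, "vocab_utf8.txt")
    with open(src, "wb") as f:
        f.write(b"don\x92t\nhello\n")
    convert_to_utf8(src, dst)
    with open(dst, "rb") as f:
        converted = f.read()

print(converted.decode("utf-8"))
```

After such a conversion, the vocabulary and training files could be read by code that expects UTF-8 without changing the reader itself.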

【Discussion】: