UnicodeDecodeError Python/Django application: answers

【Question title】: UnicodeDecodeError Python/Django application
【Posted】: 2018-08-02 13:40:56
【Question】:

I am getting this error:

UnicodeDecodeError at /select_text
'utf-8' codec can't decode byte 0xe7 in position 92: invalid continuation byte
Request Method: POST
Request URL: http://agata.pgie.ufrgs.br/select_text
Django Version: 2.0.1
Exception Type: UnicodeDecodeError
Exception Value: 'utf-8' codec can't decode byte 0xe7 in position 92: invalid continuation byte
Exception Location: /home/metis/public_html/AGATA/agataenv/lib/python3.4/codecs.py in decode, line 319
Python Executable: /usr/bin/python3
Python Version: 3.4.3
Python Path: ['/home/metis/public_html/AGATA', '/home/metis/public_html/AGATA/agataenv/lib/python3.4', '/home/metis/public_html/AGATA/agataenv/lib/python3.4/plat-x86_64-linux-gnu', '/home/metis/public_html/AGATA/agataenv/lib/python3.4/lib-dynload', '/usr/lib/python3.4', '/usr/lib/python3.4/plat-x86_64-linux-gnu', '/home/metis/public_html/AGATA/agataenv/lib/python3.4/site-packages']
Server time: Thu, 22 Feb 2018 12:29:51 +0000
Unicode error hint: The string that could not be encoded/decoded was: Varia��es

Environment:

Request Method: POST
Request URL: http://agata.pgie.ufrgs.br/select_text

Django Version: 2.0.1
Python Version: 3.4.3
Installed Applications:
['django.contrib.admin',
 'django.contrib.auth',
 'django.contrib.contenttypes',
 'django.contrib.sessions',
 'django.contrib.messages',
 'django.contrib.staticfiles',
 'textMining',
 'bootstrapform']
Installed Middleware:
['django.middleware.security.SecurityMiddleware',
 'django.contrib.sessions.middleware.SessionMiddleware',
 'django.middleware.common.CommonMiddleware',
 'django.middleware.csrf.CsrfViewMiddleware',
 'django.contrib.auth.middleware.AuthenticationMiddleware',
 'django.contrib.messages.middleware.MessageMiddleware',
 'django.middleware.clickjacking.XFrameOptionsMiddleware']



Traceback:

File "/home/metis/public_html/AGATA/agataenv/lib/python3.4/site-packages/django/core/handlers/exception.py" in inner
  35.             response = get_response(request)

File "/home/metis/public_html/AGATA/agataenv/lib/python3.4/site-packages/django/core/handlers/base.py" in _get_response
  128.                 response = self.process_exception_by_middleware(e, request)

File "/home/metis/public_html/AGATA/agataenv/lib/python3.4/site-packages/django/core/handlers/base.py" in _get_response
  126.                 response = wrapped_callback(request, *callback_args, **callback_kwargs)

File "/home/metis/public_html/AGATA/textMining/views.py" in select_text
  59.     text_mining = TextMining(file_path, keywords)

File "/home/metis/public_html/AGATA/textMining/TextMining.py" in __init__
  15.         self.separete_file_sentences()

File "/home/metis/public_html/AGATA/textMining/TextMining.py" in separete_file_sentences
  31.             file_text = text_file.read().decode('string-escape').decode("utf-8")

File "/home/metis/public_html/AGATA/agataenv/lib/python3.4/codecs.py" in decode
  319.         (result, consumed) = self._buffer_decode(data, self.errors, final)

Exception Type: UnicodeDecodeError at /select_text
Exception Value: 'utf-8' codec can't decode byte 0xe7 in position 92: invalid continuation byte

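This happens in my Django application, which is already deployed on Apache. I can't figure out what is wrong here, since I am handling the encoding (at least I think I am).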
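For readers trying to interpret the message, here is a minimal sketch of how this kind of error can arise. It assumes the uploaded file was saved as Latin-1 (ISO-8859-1) rather than UTF-8, and that the hinted string "Varia��es" is the Portuguese word "Variações". Both are guesses based on the byte 0xe7 (which is "ç" in Latin-1); the post does not state the file's actual encoding. The helper name `try_decode_utf8` is made up for illustration.

```python
# Assumption: the problematic word, written with escapes for "ç" and "õ",
# was stored as Latin-1 bytes instead of UTF-8.
raw = "Varia\u00e7\u00f5es".encode("latin-1")


def try_decode_utf8(data):
    """Return (text, None) on success or (None, error) on failure."""
    try:
        return data.decode("utf-8"), None
    except UnicodeDecodeError as exc:
        return None, exc


text, error = try_decode_utf8(raw)

if error is not None:
    # 0xe7 looks like the first byte of a 3-byte UTF-8 sequence, but the
    # byte that follows (0xf5) is not a valid continuation byte.
    print(hex(raw[error.start]), error.start, error.reason)

# The same bytes decode cleanly when the matching codec is used.
print(raw.decode("latin-1"))
```

In this sketch the failure is reported at position 5 because only the single word is decoded; in the traceback the byte sits at position 92 of the whole file.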

My code (in order):

def select_text(request):

    book_file = request.FILES['book']
    fs = FileSystemStorage()
    file_name = fs.save(book_file.name, book_file)
    uploaded_file_url = fs.url(file_name)
    print(uploaded_file_url)

    keywords = [
        request.POST['keyword_1'],
        request.POST['keyword_2'],
        request.POST['keyword_3'],
    ]

    blank_optional_keywords = {
        'keyword_2' : False,
        'keyword_3' : False
    }

    if keywords[1] == "":
        blank_optional_keywords['keyword_2'] = True
    if keywords[2] == "":
        blank_optional_keywords['keyword_3'] = True

    request.session["blank_optional_keywords"] = blank_optional_keywords

    #file_name = "LivroMA4_P1_formatado(1).txt"

    #file_path = get_file_path(file_name, 'text')

    file_path = get_file_path(uploaded_file_url, 'upload')

    text_mining = TextMining(file_path, keywords)
    text_mining.get_keywords_sentences()

    sentences = text_mining._keyword_sentences

    sentences_info = generate_sentences_info(sentences)

    request.session["sentences_info"] = sentences_info

    return render(request, 'textMining/select_text.html', {'sentences_info': sentences_info})

The TextMining class methods:

class TextMining(object):
    def __init__(self, file_path, keywords):
        self._file_path = file_path
        self._keywords = keywords
        self._sentences = list()
        self._keyword_sentences = dict()

        self.lower_keywords()
        self.separete_file_sentences()
...
    def separete_file_sentences(self):
        with open(self._file_path, "r", encoding='utf-8') as text_file:
            file_text = text_file.read()
            sentences = nltk.tokenize.sent_tokenize(file_text)

            for i in range(len(sentences)):
                if(len(sentences[i]) > 0):
                    self._sentences.append(sentences[i])

I have been working on this for a few days and have tried many things, but nothing has worked.
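One possible way to make the read in `separete_file_sentences` tolerant of non-UTF-8 uploads is to read the raw bytes and try a short list of encodings in turn. This is only a sketch: `read_text_with_fallback` is a hypothetical helper that does not appear in the post, and the Latin-1 fallback is an assumption about where the uploaded files come from.

```python
import os
import tempfile


def read_text_with_fallback(path, encodings=("utf-8", "latin-1")):
    """Try each encoding in turn and return (text, encoding_used).

    Hypothetical helper, not part of the original TextMining class.
    """
    with open(path, "rb") as raw_file:
        data = raw_file.read()
    for encoding in encodings:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # Only reachable when no listed codec accepts the bytes
    # (latin-1 accepts every byte, so with the defaults this never triggers).
    raise ValueError("none of the encodings could decode %r" % path)


# Usage: write a small Latin-1 file and read it back.
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write("Varia\u00e7\u00f5es do texto.".encode("latin-1"))
    sample_path = tmp.name

text, used = read_text_with_fallback(sample_path)
os.remove(sample_path)
print(used, text)
```

Because Latin-1 maps every byte to some character, the fallback never raises; a file in a third encoding would be read without error but with wrong characters. If the real encoding is known, passing it directly to `open(..., encoding=...)` is simpler.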

urls.py (textMining app)

urlpatterns = [
        url(r'^$', views.index, name='index'),
        url(r'^select_text', views.select_text, name = 'select_text'),
        url(r'^edit_text', views.edit_text, name = 'edit_text'),
        url(r'^generate_aiml', views.generate_aiml, name = 'generate_aiml'),
]

urls.py (TextMiningProject)

urlpatterns = [
    url(r'^admin/', admin.site.urls),
    url(r'^', include('textMining.urls')),
] + static(settings.STATIC_URL, document_root=settings.STATIC_ROOT)

if settings.DEBUG is True:
    urlpatterns += static(settings.MEDIA_URL, document_root=settings.MEDIA_ROOT)

【Comments】:

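If possible, click "Switch to copy-and-paste view" and post the proper stack trace, so the error message is easier for us to read.
@VitorFreitas Done, sorry, I had never noticed that option on the error page, haha.
Paste your urls.py here.
@AstikAnand I edited the post to include it.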

Tags: python django python-3.x unicode utf-8

【Solution 1】:

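I ran into the same problem. Oddly, the root cause was the database, yet Django reported the error at position 122 in the template. I kept deleting lines from the template, but Django just moved the error to the next line! I then found that the database had a UTF8 field of type TINYTEXT: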

`md` tinytext COLLATE utf8_unicode_ci DEFAULT NULL COMMENT 'metadata',

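I was copying data from another table with an "INSERT INTO table SELECT ..." statement. The source column's type was TEXT, and when the data was copied, some long texts were truncated in the middle of a UTF8 character, which confused Django. The fix was to change the field's data type to TEXT and then copy the data again: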

`md` text COLLATE utf8_unicode_ci DEFAULT NULL COMMENT 'metadata',
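
The truncation described above can be reproduced outside the database. The sketch below assumes a plain byte-level cut at 255 bytes, the TINYTEXT limit; the sample text is made up, and how MySQL truncates in practice depends on its SQL mode.

```python
TINYTEXT_MAX_BYTES = 255  # MySQL TINYTEXT holds at most 255 bytes

# "ç" takes 2 bytes in UTF-8, so 200 of them occupy 400 bytes.
long_text = "\u00e7" * 200
encoded = long_text.encode("utf-8")

# An odd cut-off splits the last "ç" in half, leaving a dangling lead byte.
truncated = encoded[:TINYTEXT_MAX_BYTES]

try:
    truncated.decode("utf-8")
    decode_failed = False
    failure_position = None
except UnicodeDecodeError as exc:
    decode_failed = True
    failure_position = exc.start

# Dropping the incomplete trailing byte recovers the valid prefix.
recovered = truncated.decode("utf-8", errors="ignore")

print(decode_failed, failure_position, len(recovered))
```

Decoding with `errors="ignore"` only hides the damage; the cut-off characters are still lost, which is why widening the column and copying the data again is the actual fix.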
