【Question title】: Google App Engine: UnicodeDecodeError in bulk data upload
【Posted】: 2010-07-04 01:13:34
【Question description】:

I'm getting a strange error when using the Google App Engine devserver 1.3.5 with Python 2.5.4 on Windows.

Sample row from the CSV:

EQS,550,foobar,"<some><html><garbage /></html></some>",odp,Ti4=,http://url.com,success

Error:

..................................................................................................................[ERROR   ] [Thread-1] WorkerThread:
Traceback (most recent call last):
  File "C:\Program Files\Google\google_appengine\google\appengine\tools\adaptive_thread_pool.py", line 150, in WorkOnItems
    status, instruction = item.PerformWork(self.__thread_pool)
  File "C:\Program Files\Google\google_appengine\google\appengine\tools\bulkloader.py", line 695, in PerformWork
    transfer_time = self._TransferItem(thread_pool)
  File "C:\Program Files\Google\google_appengine\google\appengine\tools\bulkloader.py", line 852, in _TransferItem
    self.request_manager.PostEntities(self.content)
  File "C:\Program Files\Google\google_appengine\google\appengine\tools\bulkloader.py", line 1296, in PostEntities
    datastore.Put(entities)
  File "C:\Program Files\Google\google_appengine\google\appengine\api\datastore.py", line 282, in Put
    req.entity_list().extend([e._ToPb() for e in entities])
  File "C:\Program Files\Google\google_appengine\google\appengine\api\datastore.py", line 687, in _ToPb
    properties = datastore_types.ToPropertyPb(name, values)
  File "C:\Program Files\Google\google_appengine\google\appengine\api\datastore_types.py", line 1499, in ToPropertyPb
    pbvalue = pack_prop(name, v, pb.mutable_value())
  File "C:\Program Files\Google\google_appengine\google\appengine\api\datastore_types.py", line 1322, in PackString
    pbvalue.set_stringvalue(unicode(value).encode('utf-8'))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe8 in position 36: ordinal not in range(128)
[INFO    ] Unexpected thread death: Thread-1
[INFO    ] An error occurred. Shutting down...
..[ERROR   ] Error in Thread-1: 'ascii' codec can't decode byte 0xe8 in position 36: ordinal not in range(128)

Is the error caused by a problem with the base64 strings, of which there is one in every row?

KGxwMAoobHAxCihTJ0JJT0VFJwpwMgpJMjYxMAp0cDMKYWEu

KGxwMAoobHAxCihTJ01BVEgnCnAyCkkyOTQwCnRwMwphYS4=

Data loader:

import base64

from google.appengine.tools import bulkloader

class CourseLoader(bulkloader.Loader):
    def __init__(self):
        bulkloader.Loader.__init__(self, 'Course',
                                   [('dept_code', str),
                                    ('number', int),
                                    ('title', str),
                                    ('full_description', str),
                                    ('unparsed_pre_reqs', str),
                                    ('pickled_pre_reqs', lambda x: base64.b64decode(x)),
                                    ('course_catalog_url', str),
                                    ('parse_succeeded', lambda x: x == 'success')
                                   ])

loaders = [CourseLoader]

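Is there a way to tell from the traceback which row caused the error?

Update: It looks like two characters are causing the error: è and ®. How can I get Google App Engine to handle them?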

【Comments】:

  • I would try finding that code in GAE and adding trace/logging information to it.

Tags: python google-app-engine


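【Solution 1】:

It looks like some row of the CSV contains non-ASCII data (perhaps a LATIN SMALL LETTER E WITH GRAVE, which is what 0xe8 is in ISO-8859-1, for example), yet you are mapping it to str (it should be unicode, and I believe the CSV should be in utf-8).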
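
To illustrate the diagnosis, here is a minimal sketch. The sample bytes are made up rather than taken from the asker's file, and the code is written for a current Python rather than 2.5. It shows that a byte string containing 0xe8 cannot be decoded with the ASCII codec (the implicit conversion that `unicode(value)` performs in Python 2), while an explicit ISO-8859-1 decode yields è:

```python
# Hypothetical sample: "Crème" encoded as ISO-8859-1, so è is the single byte 0xe8.
raw = b"Cr\xe8me"


def try_decode(data, encoding):
    """Return the decoded text, or None if the bytes are invalid in that encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


ascii_result = try_decode(raw, "ascii")        # fails: 0xe8 is outside range(128)
latin1_result = try_decode(raw, "iso-8859-1")  # succeeds: 0xe8 maps to U+00E8
utf8_result = try_decode(raw, "utf-8")         # fails: a lone 0xe8 is not valid UTF-8

print(ascii_result is None, latin1_result == u"Cr\u00e8me", utf8_result is None)
```

If the file turns out to be ISO-8859-1 rather than utf-8, it would presumably need to be re-encoded, or decoded with the matching codec, before upload.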

To find out whether any lines of a text file contain non-ASCII data, a simple Python snippet can help, for example:

>>> f = open('thefile.csv')
>>> prob = []
>>> for i, line in enumerate(f):
...   try: unicode(line)
...   except: prob.append(i)
...
>>> print 'Problems in %d lines:' % len(prob)
>>> print prob
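
A slightly more detailed variant of the same idea, as a sketch only: it works on raw bytes, reports 1-based line numbers, and records the offset of the first non-ASCII byte in each line. The function name `find_non_ascii_lines` and the sample data are illustrative, and the code targets a current Python rather than 2.5.

```python
def find_non_ascii_lines(lines):
    """Yield (line_number, byte_offset, byte_value) for lines with a non-ASCII byte.

    `lines` is any iterable of byte strings, e.g. a file opened with mode 'rb'.
    """
    for line_number, line in enumerate(lines, 1):
        for offset, byte in enumerate(bytearray(line)):
            if byte > 127:
                yield line_number, offset, byte
                break  # report only the first offending byte per line


# Hypothetical data standing in for open('thefile.csv', 'rb').
sample = [b"EQS,550,foobar\n", b"FREN,101,Cr\xe8me\n", b"MATH,294,plain\n"]
problems = list(find_non_ascii_lines(sample))
for line_number, offset, byte in problems:
    print("line %d: byte 0x%02x at offset %d" % (line_number, byte, offset))
```

Reading the file in binary mode avoids any implicit decoding, so the check itself cannot fail on the bytes it is looking for.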

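【Comments】:

  • It looks like you're right. Do I need a different datastore property to hold values like this?
  • @Rosarch, the datastore's StringProperty and TextProperty hold unicode objects just fine (the latter through the Text subclass of unicode), as documented at code.google.com/appengine/docs/python/datastore/…. The problem is the str used in your code: it should be unicode, with the CSV properly encoded (I believe the "proper" encoding here is utf-8). No "different datastore property" is needed.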
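
As a sketch of the change suggested in this thread, the loader's `str` converters for text columns could be replaced with a function that decodes the raw CSV bytes explicitly. The helper name `utf8_text` is hypothetical, the `columns` list is a plain list standing in for the one passed to `bulkloader.Loader.__init__`, the sample row is made up, and utf-8 is an assumption about the file's encoding. The code is written for a current Python rather than 2.5.

```python
import base64


def utf8_text(raw):
    """Decode a raw CSV field from UTF-8 bytes into a text (unicode) object."""
    return raw.decode("utf-8")


# Stand-in for the (property name, converter) list in CourseLoader.
columns = [
    ("dept_code", utf8_text),
    ("number", int),
    ("title", utf8_text),
    ("full_description", utf8_text),
    ("unparsed_pre_reqs", utf8_text),
    ("pickled_pre_reqs", base64.b64decode),
    ("course_catalog_url", utf8_text),
    ("parse_succeeded", lambda x: x == b"success"),
]

# Hypothetical row whose title contains è encoded as UTF-8 (bytes 0xc3 0xa8).
row = [b"EQS", b"550", b"Cr\xc3\xa8me", b"desc", b"odp",
       b"Ti4=", b"http://url.com", b"success"]

# Apply each column's converter to the matching field.
entity = dict((name, convert(value)) for (name, convert), value in zip(columns, row))
print(entity["number"], entity["parse_succeeded"], len(entity["title"]))
```

Decoding at the converter keeps every text value a unicode object by the time it reaches the datastore, so no implicit ASCII decode is attempted.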