【Posted】: 2013-04-04 23:35:38
【Question】:
I am using Python Tools for Visual Studio and reading some files written in Italian. I have tried iso-8859-1, iso-8859-2, utf-8, and utf-8-sig. Notepad++ opens the files as UTF-8 without BOM.
content = fp.read()
words = content.decode("utf-8-sig").lower().split()
for w in words:
    p = ''
    cur.execute('SELECT word FROM multiwordnet.italian_lemma l, multiwordnet.italian_synset s where l.id = s.id and l.lemma="%s"' % w)
The string that causes the crash is C'è (it is read in as "c\'\xe3\xa8").
Using chardet did not help.
Traceback (most recent call last):
  File "C:\Users\Tathagata\Documents\Visual Studio 2012\Projects\PythonApplication4\PythonApplication4\PythonApplication4.py", line 344, in <module>
    createSynsetDict()
  File "C:\Users\Tathagata\Documents\Visual Studio 2012\Projects\PythonApplication4\PythonApplication4\PythonApplication4.py", line 294, in createSynsetDict
    cur.execute('SELECT word FROM multiwordnet.italian_lemma l, multiwordnet.italian_synset s where l.id = s.id and l.lemma="%s"' % w)
  File "C:\Python27\lib\site-packages\pymysql\cursors.py", line 117, in execute
    self.errorhandler(self, exc, value)
  File "C:\Python27\lib\site-packages\pymysql\connections.py", line 187, in defaulterrorhandler
    raise Error(errorclass, errorvalue)
Error: (<type 'exceptions.UnicodeEncodeError'>, UnicodeEncodeError('ascii', u's\x00\x00\x00\x03SELECT word FROM multiwordnet.italian_lemma l, multiwordnet.italian_synset s where l.id = s.id and l.lemma="c\'\xe3\xa8"', 116, 118, 'ordinal not in range(128)'))
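One possible reading of the "c\'\xe3\xa8" in the traceback: the UTF-8 encoding of "è" is the two bytes C3 A8, and if those bytes are treated as single-byte characters before lowercasing, "Ã" (C3) becomes "ã" (E3), which matches what is shown. The sketch below reproduces that effect. It is written for Python 3 (an assumption; the question uses Python 2.7), and it only illustrates the suspected mechanism, not what the original script necessarily did.

```python
# The UTF-8 encoding of "è" (U+00E8) is the two bytes C3 A8.
raw = "C'è".encode("utf-8")

# Expected path: decode as UTF-8 first, then lowercase the text.
good = raw.decode("utf-8").lower()

# Suspected failure mode: each byte is interpreted as one Latin-1 character,
# so C3 becomes "Ã", and lower() then maps "Ã" (U+00C3) to "ã" (U+00E3).
bad = raw.decode("latin-1").lower()

print(repr(good))
print(repr(bad))
```

If the second value is the one that reaches the query, the text was probably lowercased or decoded with a single-byte encoding somewhere before the UTF-8 decode took effect.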
【Comments】:
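-
Which DB-API binding are you using? (That is, which database driver?)
-
...actually, more importantly, what is the value of the paramstyle global in your database library module? (If you don't know, just identify the module and we can look it up.)
-
Regarding @CharlesDuffy's comments, see the full code and more (gist.github.com/tathagata/5320310)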
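The paramstyle question matters because a DB-API driver can bind values itself instead of having them spliced into the SQL text with %. Below is a minimal sketch of that idea. It uses the standard-library sqlite3 module as a stand-in so that it runs without a MySQL server (an assumption; the question uses PyMySQL, whose placeholders are written %s rather than ?), and the table contents and the lookup helper are hypothetical.

```python
import sqlite3

# Each DB-API module advertises its placeholder style in a module-level global.
print(sqlite3.paramstyle)  # sqlite3 uses the "qmark" style, i.e. "?"

# Hypothetical in-memory table standing in for multiwordnet.italian_lemma.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE italian_lemma (id INTEGER, lemma TEXT)")
cur.execute("INSERT INTO italian_lemma VALUES (?, ?)", (1, "c'è"))


def lookup(cursor, word):
    """Return ids of rows whose lemma equals word, letting the driver bind the value."""
    # The value is passed separately from the SQL, so quotes and non-ASCII
    # characters in the word are never pasted into the statement text.
    cursor.execute("SELECT id FROM italian_lemma WHERE lemma = ?", (word,))
    return cursor.fetchall()


rows = lookup(cur, "c'è")
print(rows)
```

With bound parameters the apostrophe in c'è needs no manual escaping, and the driver, not the % operator, is responsible for encoding the value.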
Tags: python visual-studio-2010 encoding