[Posted]: 2015-10-08 14:18:16
[Question]:
I have a large Pandas DataFrame (more than 2 million rows) with the following columns:
Id,CandidateRegistrationID,CandidateID,OurReference,QualificationCode,ExamCode,ExamDate,QualificationName,DataSource,QuestionNo,CandidateResponse,CorrectAnswerChoice,UniquePaperNo,QuestionCode
I have a function that writes the DataFrame to SQLite:
def writeDF(df, db, table):
    conn = sqlite3.connect(db)
    conn.text_factory = str  # allows utf-8 data to be stored
    df.to_sql(table, conn, flavor='sqlite', schema=None, if_exists='replace', index=False, index_label=None, chunksize=None, dtype=None)
    conn.close()
This works fine on a reduced version of the data. On the full dataset, I get the following error:
ValueError: Cannot convert identifier to UTF-8: 'Id'
The Id field is just an integer.
I would welcome any insight. Googling only leads me to the line in Pandas where the error is raised.
Traceback (most recent call last):
  File "/py-csv-jmetrik/venv/lib/python2.7/site-packages/flask/app.py", line 1836, in __call__
    return self.wsgi_app(environ, start_response)
  File "/py-csv-jmetrik/venv/lib/python2.7/site-packages/flask/app.py", line 1820, in wsgi_app
    response = self.make_response(self.handle_exception(e))
  File "/py-csv-jmetrik/venv/lib/python2.7/site-packages/flask/app.py", line 1403, in handle_exception
    reraise(exc_type, exc_value, tb)
  File "/py-csv-jmetrik/venv/lib/python2.7/site-packages/flask/app.py", line 1817, in wsgi_app
    response = self.full_dispatch_request()
  File "/py-csv-jmetrik/venv/lib/python2.7/site-packages/flask/app.py", line 1477, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/py-csv-jmetrik/venv/lib/python2.7/site-packages/flask/app.py", line 1381, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/py-csv-jmetrik/venv/lib/python2.7/site-packages/flask/app.py", line 1475, in full_dispatch_request
    rv = self.dispatch_request()
  File "/py-csv-jmetrik/venv/lib/python2.7/site-packages/flask/app.py", line 1461, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/py-csv-jmetrik/app/routes.py", line 69, in index
    writeDF(data_df,db,table)
  File "/py-csv-jmetrik/app/routes.py", line 27, in writeDF
    df.to_sql(table, conn, flavor='sqlite', schema=None, if_exists='replace', index=True, index_label=None, chunksize=None, dtype=None)
  File "/py-csv-jmetrik/venv/lib/python2.7/site-packages/pandas/core/generic.py", line 982, in to_sql
    dtype=dtype)
  File "/py-csv-jmetrik/venv/lib/python2.7/site-packages/pandas/io/sql.py", line 549, in to_sql
    chunksize=chunksize, dtype=dtype)
  File "/py-csv-jmetrik/venv/lib/python2.7/site-packages/pandas/io/sql.py", line 1565, in to_sql
    dtype=dtype)
  File "/py-csv-jmetrik/venv/lib/python2.7/site-packages/pandas/io/sql.py", line 627, in __init__
    self.table = self._create_table_setup()
  File "/py-csv-jmetrik/venv/lib/python2.7/site-packages/pandas/io/sql.py", line 1377, in _create_table_setup
    for cname, ctype, _ in column_names_and_types]
  File "/py-csv-jmetrik/venv/lib/python2.7/site-packages/pandas/io/sql.py", line 1297, in _get_valid_sqlite_name
    uname = _get_unicode_name(name)
  File "/py-csv-jmetrik/venv/lib/python2.7/site-packages/pandas/io/sql.py", line 1271, in _get_unicode_name
    raise ValueError("Cannot convert identifier to UTF-8: '%s'" % name)
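[Comments]:
-
stackoverflow.com/questions/3425320/… That question looks similar to this one, but I don't think it is a duplicate. Its answer (using sqlite3.Binary as the text_factory) may still be relevant to the OP's problem, though. Worth a look.
-
@rmunn I'm not sure, because stackoverflow.com/questions/3425320/… is about storing blobs. All I am doing is storing text strings and numbers.
-
No, you're right, that is not the problem you are running into. What you are hitting is that the column name "Id" cannot be converted to Unicode. That is odd: looking at the source of pandas/io/sql.py, I can't see any reason why it should fail.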
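If the failure really is about the column name and not the column's values, one way to narrow it down is to look at the exact characters in each name, since a name that prints as "Id" can still carry invisible characters. The sketch below is not from the thread: `find_suspect_names` is a hypothetical helper, and the byte order mark in the sample list is only an assumed example of what such a name might contain, not a confirmed diagnosis. In practice the list would come from something like `list(df.columns)`.

```python
def find_suspect_names(names):
    """Return (name, repr(name)) pairs for names that contain
    non-ASCII or non-printable characters."""
    suspects = []
    for name in names:
        # Byte strings are decoded leniently so that inspection itself cannot fail.
        if isinstance(name, bytes):
            text = name.decode("utf-8", "replace")
        else:
            text = str(name)
        if any(ord(ch) > 127 or not ch.isprintable() for ch in text):
            suspects.append((name, repr(name)))
    return suspects


# Hypothetical column list: the first name carries an invisible U+FEFF prefix.
columns = ["\ufeffId", "CandidateID", "ExamCode"]

for name, shown in find_suspect_names(columns):
    # repr() makes hidden characters visible as escape sequences.
    print(shown)
```

If a name is flagged, renaming the column to a clean string before calling `to_sql` would be the obvious next thing to try. If nothing is flagged, the cause lies elsewhere.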