An efficient way to convert SQL to JSON: answers

【Question title】: An efficient way to convert sql to json
【Posted】: 2014-06-11 06:41:22
【Question】:

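Hello, I have a database with about 1,000,000 rows. Because I want to use MongoDB, I wrote the following code to convert it to JSON, but it takes a long time. Is there another way to approach this?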

import psycopg2
import json
con = psycopg2.connect(database)
cur = con.cursor()
sql="select * from mini; "
cur.execute(sql) 
rows=cur.fetchall()
json_string=[]
for sample in rows:
    #print(sample)
    dicti={"label1":sample[0],"label2":sample[1],"label3":sample[2]}
    #print(json.dumps(dicti))
    json_string.append(dicti)
f=open('xyz.txt','w')
print >>f,json_string
f.close()

Here label1, label2, label3 are the SQL column names, if that helps.

【Comments on the question】:

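Have you tried finding out which part is slow, first by commenting out everything after rows=cur.fetchall() (to measure the fetch alone), then by commenting out only the part from f=open('xyz.txt','w') onward (fetch + processing)? Either way, you are going to consume a lot of memory...
I think most of the time goes to cur.fetchall(), because the data is huge.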

Tags: python sql json

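【Answer 1】:

You are creating a dictionary for each row and then converting it to a string. Skip the conversion and build the JSON by hand. I tested a few approaches with the timeit module:

Using str.format: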

>>> '{{"label1": {0}, "label2": {1}, "label3": {2}}}'.format('1','2','3')
>>> timeit.timeit("""'{{"label1": {0}, "label2": {1}, "label3": {2}}}'.format('1','2','3')""")
1.3898658752441406

Concatenating strings:

>>> '{"label1": ' + '1' + ', "label2": ' + '2' + ', "label3": ' + '3' + '}'
>>> timeit.timeit("""'{"label1": ' + '1' + ', "label2": ' + '2' + ', "label3": ' + '3' + '}'""")
0.506464958190918

Creating a dictionary:

>>> str({"label1": '1', "label2": '2', "label3": '3'})
>>> timeit.timeit(""" str({"label1": '1', "label2": '2', "label3": '3'}) """)
4.776309013366699

There are other possible ways to create the JSON as well.
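
One caveat on hand-built strings: they are only valid JSON when every value is already JSON-safe. The sketch below (Python 3; the helper name `manual_json` and the sample values are made up for illustration) contrasts concatenation with `json.dumps`, which quotes and escapes strings and maps `None` to `null`. It also shows that `str()` on a dict yields a Python repr with single quotes, which is not JSON.

```python
import json

def manual_json(a, b, c):
    # Hypothetical helper in the spirit of the concatenation timing above:
    # values are pasted in verbatim, with no quoting or escaping.
    return ('{"label1": ' + str(a) + ', "label2": ' + str(b) +
            ', "label3": ' + str(c) + '}')

# Numeric values happen to be JSON-safe, so the result parses.
parsed_numeric = json.loads(manual_json(1, 2, 3))

# A text value breaks the hand-built form: it is emitted without quotes.
try:
    json.loads(manual_json("abc", 2, 3))
    manual_text_ok = True
except ValueError:  # json.JSONDecodeError is a subclass of ValueError
    manual_text_ok = False

# json.dumps takes care of quoting, escaping and None.
safe = json.dumps({"label1": 'say "hi"', "label2": None, "label3": 3})
parsed_safe = json.loads(safe)

# str() on a dict gives a Python repr, not JSON.
repr_text = str({"label1": "1"})
```

So the faster timings for manual formatting only carry over to real data if the column values need no quoting or escaping, or if that work is added back in.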

【Comments】:

【Answer 2】:

It is much simpler to use DictCursor, which returns dictionaries from your database:

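import json
import psycopg2
import psycopg2.extras  # DictCursor lives in the extras submodule

conn = psycopg2.connect(database)  # same `database` as in the question
cur = conn.cursor(cursor_factory=psycopg2.extras.DictCursor)
cur.execute('SELECT * FROM mini')  # execute() returns None, so iterate the cursor
with open('xyz.txt', 'w') as f:
    for row in cur:
        # DictRow subclasses list, so this writes a JSON array per row;
        # json.dumps(dict(row)) would write an object keyed by column name
        f.write('{}\n'.format(json.dumps(row)))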

To dump the whole dataset as one large JSON object, do this instead:

cur = conn.cursor(cursor_factory=psycopg2.extras.DictCursor)
cur.execute('SELECT * FROM mini')  # execute() returns None; fetch from the cursor
dataset = cur.fetchall()
with open('xyz.txt', 'w') as f:
    json.dump(dataset, f)
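
If the goal is a list of plain dictionaries keyed by column name, one driver-independent option is to zip each row with the names in `cursor.description`, which DB-API cursors provide after a query. The sketch below is Python 3 and uses an in-memory `sqlite3` table as a stand-in for the PostgreSQL table so that it runs without a server; the helper name `rows_as_dicts` and the sample data are assumptions, not part of the original answer.

```python
import json
import sqlite3

def rows_as_dicts(cur):
    # Column names are the first item of each entry in cursor.description.
    names = [col[0] for col in cur.description]
    # Iterating the cursor yields one tuple per row.
    return [dict(zip(names, row)) for row in cur]

# Stand-in for the `mini` table from the question (sample data is made up).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mini (label1 INTEGER, label2 TEXT, label3 REAL)")
conn.executemany("INSERT INTO mini VALUES (?, ?, ?)",
                 [(1, "a", 0.5), (2, "b", 1.5)])

cur = conn.cursor()
cur.execute("SELECT * FROM mini")
dataset = rows_as_dicts(cur)      # list of plain dicts
json_text = json.dumps(dataset)   # JSON array of objects
conn.close()
```

With psycopg2 the same helper should work on a regular cursor; `psycopg2.extras.RealDictCursor` is another way to get dict rows directly.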

【Comments】:

But I want a list of dictionaries, which I don't get with this approach.

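【Answer 3】:

If the fetch is where most of the time goes, that may be because you need a lot of memory and the system has to reallocate many times or, worse, swap. You could try fetchmany with a reasonable size instead of fetchall, and likewise write to disk in chunks.

So you would have something like: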

import psycopg2
import json
con = psycopg2.connect(database)
cur = con.cursor()
sql="select * from mini; "
cur.execute(sql)
size = 256 # find a 'good' size
with open('xyz.txt', 'w') as f:
    while True:
        rows=cur.fetchmany(size)
        if len(rows) == 0:
            break
        json_string=[]
        for sample in rows:
            #print(sample)
            dicti={"label1":sample[0],"label2":sample[1],"label3":sample[2]}
            #print(json.dumps(dicti))
            json_string.append(dicti)
        print >>f,json_string
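
A Python 3 variant of the same chunking idea, writing one JSON document per line so that no full-size list is ever built in memory. This is only a sketch: it uses an in-memory `sqlite3` table in place of PostgreSQL so that it is runnable as-is, and the function name `dump_json_lines`, the chunk size, and the sample rows are assumptions. Whether it beats `fetchall` depends on the driver. psycopg2's default client-side cursor transfers the whole result set on `execute`, so chunked fetching mainly bounds the size of the Python-side lists; a named (server-side) cursor would be needed to stream from the server.

```python
import io
import json
import sqlite3

def dump_json_lines(cur, out, size=256):
    # Fetch `size` rows at a time and write each row as one JSON object per line.
    names = [col[0] for col in cur.description]
    written = 0
    while True:
        rows = cur.fetchmany(size)
        if not rows:
            break
        for row in rows:
            out.write(json.dumps(dict(zip(names, row))) + "\n")
            written += 1
    return written

# Stand-in for the `mini` table from the question (sample data is made up).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mini (label1 INTEGER, label2 TEXT, label3 REAL)")
conn.executemany("INSERT INTO mini VALUES (?, ?, ?)",
                 [(i, "row%d" % i, i / 2) for i in range(1000)])
cur = conn.cursor()
cur.execute("SELECT * FROM mini")

buf = io.StringIO()  # stands in for open('xyz.txt', 'w')
count = dump_json_lines(cur, buf, size=256)
lines = buf.getvalue().splitlines()
conn.close()
```

One JSON document per line is also the format that `mongoimport` reads by default, which fits the stated goal of loading the data into MongoDB.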

【Comments】:

Hmm, it actually doubled the time.

【Answer 4】:

This should work:

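import psycopg2.extras  # needed for DictCursor

cur = conn.cursor(cursor_factory=psycopg2.extras.DictCursor)
sql = "SELECT row_to_json(row) FROM (select * from mini) row;"
cur.execute(sql)
result = cur.fetchone()  # first row only; use fetchall() for every row
result[0]
# -> {'col1': 'val1', 'col2': 'val2', ...}  (row_to_json yields one JSON object per row)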

【Comments】: