【Question title】: Fetching data from postgres database in batch (python)
【Posted】: 2021-10-06 14:20:37
【Question】:

I have the following Postgres query, which fetches data from table1 (about 25 million rows), and I would like to write the output of the query to multiple files.

query = """ WITH sequence AS (
                SELECT 
                        a,
                        b,
                        c
                FROM table1 )                    

select * from sequence;"""

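Below is the Python script that fetches the complete data set. How can I modify it to write the result to multiple files (for example, 10000 rows per file)?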

#IMPORT LIBRARIES ########################
import psycopg2
from pandas import DataFrame

#CREATE DATABASE CONNECTION ########################
connect_str = "dbname='x' user='x' host='x' " "password='x' port = x"
conn = psycopg2.connect(connect_str)
cur = conn.cursor()
conn.autocommit = True

cur.execute(query)
df = DataFrame(cur.fetchall())

Thanks

【Question comments】:

    Tags: python postgresql psycopg2


    【Answer 1】:

    Here are 3 approaches that may help:

    1. Use a psycopg2 named cursor and set cursor.itersize

    Snippet:

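    with conn.cursor(name='fetch_large_result') as cursor:
        # number of rows fetched from the server per network round trip
        cursor.itersize = 20000

        query = "SELECT * FROM ..."
        cursor.execute(query)

        # each iteration yields a single row
        for row in cursor:
            ....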
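When a named cursor is iterated, each `row` is a single record (a tuple by default); `itersize` only controls how many rows psycopg2 pulls from the server per network round trip. If the rows should still be handled in groups, they can be regrouped on the client. The following is a minimal sketch of that idea; `batched_rows` is a hypothetical helper introduced here for illustration and is not part of psycopg2.

```python
from itertools import islice


def batched_rows(rows, batch_size):
    """Yield lists of up to `batch_size` items from any iterable of rows.

    `rows` can be a psycopg2 named cursor (which yields one row per
    iteration) or any other iterable.
    """
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch
```

With the snippet above, `for batch in batched_rows(cursor, 10000):` would then receive lists of at most 10000 rows each, while the server round trips are still governed by `cursor.itersize`.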
    
    2. Use a psycopg2 named cursor with fetchmany(size=2000)

    Snippet:

    conn = psycopg2.connect(conn_url)
    cursor = conn.cursor(name='fetch_large_result')
    cursor.execute('SELECT * FROM <large_table>')
    
    while True:
        # consume result over a series of iterations
        # with each iteration fetching 2000 records
        records = cursor.fetchmany(size=2000)
    
        if not records:
            break
    
        for r in records:
            ....
    
    cursor.close() #  cleanup
    conn.close()
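To get one file per batch, as the question asks, the fetchmany() loop above can open a new, numbered file for every batch instead of reusing a single file name. The sketch below assumes CSV output; the function name `write_batches`, the `out_dir` and `prefix` parameters, and the file naming scheme are all illustrative choices, not something from the original answer.

```python
import csv
from pathlib import Path


def write_batches(cursor, out_dir, batch_size=10000, prefix="part"):
    """Write rows from an already-executed cursor into numbered CSV files.

    `cursor` is any DB-API cursor exposing fetchmany(). With psycopg2 it
    should be a named (server-side) cursor, so the full result set is not
    loaded into client memory. Returns the list of files written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    file_no = 0
    while True:
        records = cursor.fetchmany(batch_size)
        if not records:
            break
        file_no += 1
        path = out_dir / f"{prefix}_{file_no:05d}.csv"
        # newline="" is what the csv module expects for files it writes
        with open(path, "w", newline="") as f:
            csv.writer(f).writerows(records)
        written.append(path)
    return written
```

A possible call, after `cursor = conn.cursor(name='fetch_large_result')` and `cursor.execute(query)`, would be `write_batches(cursor, "out", batch_size=10000)`, which should produce `out/part_00001.csv`, `out/part_00002.csv`, and so on, with the last file holding the remaining rows.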
    

    3. Finally, you can define a SCROLL CURSOR directly in SQL

    Snippet:

    BEGIN WORK;
    -- Set up a cursor:
    DECLARE scroll_cursor_bd SCROLL CURSOR FOR SELECT * FROM My_Table;

    -- Fetch the first 5 rows in the cursor scroll_cursor_bd:
    FETCH FORWARD 5 FROM scroll_cursor_bd;

    -- Close the cursor and end the transaction:
    CLOSE scroll_cursor_bd;
    COMMIT WORK;
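The same DECLARE / FETCH / CLOSE sequence can also be driven from Python through an ordinary (unnamed) cursor. This is only a sketch under stated assumptions: the helper `iter_sql_cursor` and the cursor name `batch_cur` are hypothetical, the connection is assumed to be inside a transaction (psycopg2's default when autocommit is off, whereas the question's script turns autocommit on), and both `select_sql` and `name` are assumed to be trusted strings because they are interpolated into the SQL text.

```python
def iter_sql_cursor(cur, select_sql, batch_size=10000, name="batch_cur"):
    """Yield lists of rows using an explicit SQL cursor.

    `cur` is a DB-API cursor with execute() and fetchall(). DECLARE
    without WITH HOLD only works inside a transaction block, so the
    connection must not be in autocommit mode.
    """
    cur.execute(f"DECLARE {name} CURSOR FOR {select_sql}")
    while True:
        # FETCH FORWARD n returns at most n rows, and none at the end
        cur.execute(f"FETCH FORWARD {int(batch_size)} FROM {name}")
        rows = cur.fetchall()
        if not rows:
            break
        yield rows
    # Reached only when the result is exhausted; otherwise the SQL cursor
    # is released when the surrounding transaction ends.
    cur.execute(f"CLOSE {name}")
```

In practice a psycopg2 named cursor does this bookkeeping for you, so the explicit form is mainly useful for seeing what happens on the server side.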
    

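    Note that if you do not name the cursor in psycopg2, it is a client-side cursor rather than a server-side one, so the whole result set is transferred to the client when the query executes.

    【Comments】: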

    • The code below creates a single csv containing 2000 rows, but how can I create multiple csv files, one for every 2000 rows, until the end of the table?

          cursor = conn.cursor(name='fetch_big_result')
          cursor.execute(query)
          while True:
              # consume result over a series of iterations
              # with each iteration fetching 2000 records
              records = cursor.fetchmany(size=2000)
              if not records:
                  break
              for r in records:
                  with open('test.csv', 'wt') as f:
                      csv_writer = csv.writer(f)
                      csv_writer.writerow(r)
          cursor.close() # cleanup
          conn.close()

    • Hi rshar, does psycopg2 copy_expert() help you? stackoverflow.com/a/22789702/1123335
    • In the first example (for row in cursor), what is row? Is it a single row or a batch of multiple rows?