Is there a faster way to get a big dataframe from Python into a SQL Server table?

【Question title】: Is there a faster way to get big dataframe from python to sql server table?
【Posted】: 2021-02-02 10:01:05
【Question】:

I am trying to write nearly 13,000,000 rows to SQL Server, but it fails with a resource pool error. The code is below:

import math
import time
import urllib.parse

import sqlalchemy


def execute_sql_file(pconnection, pfilepath, **psqlparams):
    cursor = pconnection.cursor()
    # Read the SQL template, then substitute each #key# placeholder.
    with open(pfilepath) as f:
        full_sql = f.read()
    for key, value in psqlparams.items():
        full_sql = full_sql.replace('#' + str(key) + '#', str(value))
    return cursor.execute(full_sql)


def write_result_to_db(dfsave, pServer='oltp', pDatabase='warehouse', pSchema='dbo',
                       pTable='tb_table'):
    params = urllib.parse.quote_plus(
        "DRIVER={SQL Server Native Client 11.0};SERVER=" + pServer + ";DATABASE=" + pDatabase +
        ";Trusted_Connection=yes;MARS_Connection=Yes")
    engine = sqlalchemy.create_engine("mssql+pyodbc:///?odbc_connect=%s" % params)
    to_db_connection = engine.connect()

    @sqlalchemy.event.listens_for(engine, "before_cursor_execute")
    def receive_before_cursor_execute(conn, cursor, statement, params, context, executemany):
        if executemany:
            cursor.fast_executemany = True

    chunknum = math.floor(2100 / dfsave.shape[1]) - 1
    if dfsave.shape[0] > 0:
        dfsave.to_sql(pTable, con=engine, schema=pSchema, if_exists='append', index=False,
                      method='multi', chunksize=chunknum)
    
    return to_db_connection

That is the function. After some calculations, I end up with df. When I try to send it to SQL Server, I use the following code:

write_result_to_db(df,pTable='tb_table',pServer='testoltp', pSchema = 'dbo', 
                   pDatabase = 'warehouse')

But it raises the error. Then I tried splitting the df and writing it to the SQL table like this:

bol = int(tb_result.shape[0] / 50)
for start in range(0, tb_result.shape[0], bol):
    write_result_to_db(tb_result.iloc[start:start + bol], pTable='tb_table',
                       pServer='testoltp', pSchema='dbo', pDatabase='testwarehouse')
    time.sleep(15)

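After running this loop, it still gives the resource pool error. I guess I can't do it this way. So how can I get a dataframe from Python into a SQL Server table?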
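For context on the `chunknum` line in `write_result_to_db`: with `method='multi'`, each cell becomes one bound parameter, and SQL Server accepts at most 2100 parameters per statement, which is where `2100 / dfsave.shape[1]` comes from. The sketch below reproduces that arithmetic and adds a cap of 1000 rows, the limit on a single `INSERT ... VALUES` list. The function names and the 10-column figure are hypothetical, since the question does not say how many columns the dataframe has.

```python
import math

SQL_SERVER_MAX_PARAMS = 2100        # bound parameters allowed per statement
SQL_SERVER_MAX_VALUES_ROWS = 1000   # rows allowed in one INSERT ... VALUES list


def multi_insert_chunksize(n_columns):
    """Rows per multi-row INSERT, using the same formula as the question."""
    if n_columns < 1:
        raise ValueError("n_columns must be at least 1")
    by_params = math.floor(SQL_SERVER_MAX_PARAMS / n_columns) - 1
    # Never exceed the VALUES row limit, and never return less than one row.
    return max(1, min(by_params, SQL_SERVER_MAX_VALUES_ROWS))


def statements_needed(n_rows, n_columns):
    """Number of INSERT statements required to send n_rows."""
    return math.ceil(n_rows / multi_insert_chunksize(n_columns))


# Hypothetical example: 13,000,000 rows with 10 columns.
print(multi_insert_chunksize(10))          # 209
print(statements_needed(13_000_000, 10))   # 62201
```

Under these assumptions the load is split into tens of thousands of small statements, which may be part of why it is slow.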

【Comments】:

Related: stackoverflow.com/q/50689082/2144390

Tags: python dataframe sqlalchemy

【Solution 1】:

Try the modin library. It may solve the problem: https://github.com/modin-project/modin

【Comments】:
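A note on the code in the question, separate from the modin suggestion. As far as I can tell, pandas only goes through the driver's `executemany` path when `method` is left at its default, so with `method='multi'` the `fast_executemany` listener would likely never fire. The question's function also builds a new engine and opens a connection on every call without closing it. The sketch below is one possible alternative under those assumptions, not a verified fix for the resource pool error. It creates a single engine with `fast_executemany=True` (a `create_engine` argument for `mssql+pyodbc`, SQLAlchemy 1.3 or later) and appends the dataframe in slices, committing after each slice. `build_engine`, `write_in_chunks`, and the default slice size of 100,000 rows are hypothetical names and values.

```python
from urllib.parse import quote_plus

import pandas as pd
import sqlalchemy


def build_engine(server, database, driver="SQL Server Native Client 11.0"):
    """Create one reusable engine with pyodbc's fast_executemany enabled."""
    odbc = (
        f"DRIVER={{{driver}}};SERVER={server};DATABASE={database};"
        "Trusted_Connection=yes"
    )
    return sqlalchemy.create_engine(
        "mssql+pyodbc:///?odbc_connect=" + quote_plus(odbc),
        fast_executemany=True,
    )


def write_in_chunks(df, engine, table, schema=None, chunksize=100_000):
    """Append df to table slice by slice, one transaction per slice."""
    written = 0
    for start in range(0, len(df), chunksize):
        part = df.iloc[start:start + chunksize]
        # engine.begin() commits on success and returns the connection
        # to the pool, so nothing is left open between slices.
        with engine.begin() as conn:
            # No method='multi': the default path uses executemany.
            part.to_sql(table, con=conn, schema=schema,
                        if_exists="append", index=False)
        written += len(part)
    return written
```

With the names from the question, usage might look like `engine = build_engine('testoltp', 'warehouse')` followed by `write_in_chunks(df, engine, 'tb_table', schema='dbo')`. The engine is created once and reused for every slice.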