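[Posted]: 2023-04-04 21:37:01
[Question]:
I am trying to upsert a pandas DataFrame into MS SQL Server using pyodbc. I have used a similar approach before for plain inserts, but the solution I tried this time is very slow. Is there a more streamlined way to do the upsert than what I have below?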
import pyodbc

sql_connect = pyodbc.connect('Driver={SQL Server Native Client 11.0}; Server=blank1; Database=blank2; UID=blank3; PWD=blank4')
cursor = sql_connect.cursor()

# bdf is the DataFrame to upsert. For each row, try an UPDATE on the key
# columns first; if no existing row matched, INSERT the row instead.
for index, row in bdf.iterrows():
    res = cursor.execute("UPDATE dbo.MPA_BOOK_RAW SET [SITE]=?, [SHIP_TO]=?, [PROD_LINE]=?, [GROUP_NUMBER]=?, [DESCRIPTION]=?, [ORDER_QTY]=?, [BPS_INCLUDE]=? WHERE [CUST]=? AND [ORDER_NUMBER]=? AND [ORDER_DATE]=? AND [PURCHASE_ORDER]=? AND [CHANNEL]=? AND [ITEM]=? AND [END_DT]=?",
                         row['SITE'],
                         row['SHIP_TO'],
                         row['PROD_LINE'],
                         row['GROUP_NUMBER'],
                         row['DESCRIPTION'],
                         row['ORDER_QTY'],
                         row['BPS_INCLUDE'],
                         row['CUST'],
                         row['ORDER_NUMBER'],
                         row['ORDER_DATE'],
                         row['PURCHASE_ORDER'],
                         row['CHANNEL'],
                         row['ITEM'],
                         row['END_DT'])
    if res.rowcount == 0:
        cursor.execute("INSERT INTO dbo.MPA_BOOK_RAW ([SITE], [CUST], [ORDER_NUMBER], [ORDER_DATE], [PURCHASE_ORDER], [CHANNEL], [SHIP_TO], [PROD_LINE], [GROUP_NUMBER], [DESCRIPTION], [ITEM], [ORDER_QTY], [END_DT], [BPS_INCLUDE]) values (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
                       row['SITE'],
                       row['CUST'],
                       row['ORDER_NUMBER'],
                       row['ORDER_DATE'],
                       row['PURCHASE_ORDER'],
                       row['CHANNEL'],
                       row['SHIP_TO'],
                       row['PROD_LINE'],
                       row['GROUP_NUMBER'],
                       row['DESCRIPTION'],
                       row['ITEM'],
                       row['ORDER_QTY'],
                       row['END_DT'],
                       row['BPS_INCLUDE'])

# Commit once after the loop, then release the cursor and connection.
sql_connect.commit()
cursor.close()
sql_connect.close()
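I tried the code above on a five-row sample of my original ~50k-row DataFrame and it worked fine, so the logic seems okay. Speed is the only problem.
[Discussion]: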
-
Maybe you should try MERGE instead of insert/update: docs.microsoft.com/en-us/sql/t-sql/statements/…
-
Or maybe use pandas to_sql: pandas.pydata.org/pandas-docs/stable/reference/api/… to insert the whole DataFrame into a temporary table, then MERGE from the temporary table into MPA_BOOK_RAW.
-
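As @vercelli mentioned, avoid the loop and dump the DataFrame into a temporary staging table for the final update/insert. Note: for to_sql you need to use an SQLAlchemy connection, not the raw connection used here.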
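The staging-table idea from the comments above could look roughly like the sketch below. It is not tested against the asker's database. The staging table name MPA_BOOK_STAGING and the helper names build_merge_sql, make_engine_url and upsert_dataframe are made up for illustration; the key and update columns are taken from the WHERE and SET clauses in the question. It assumes SQLAlchemy with the mssql+pyodbc dialect is installed.

```python
from urllib.parse import quote_plus

# Columns from the question: the WHERE clause gives the keys,
# the SET clause gives the columns to update.
KEY_COLS = ["CUST", "ORDER_NUMBER", "ORDER_DATE", "PURCHASE_ORDER",
            "CHANNEL", "ITEM", "END_DT"]
UPDATE_COLS = ["SITE", "SHIP_TO", "PROD_LINE", "GROUP_NUMBER",
               "DESCRIPTION", "ORDER_QTY", "BPS_INCLUDE"]


def build_merge_sql(target, staging, key_cols, update_cols):
    """Return a T-SQL MERGE that upserts rows from `staging` into `target`."""
    on = " AND ".join(f"t.[{c}] = s.[{c}]" for c in key_cols)
    sets = ", ".join(f"t.[{c}] = s.[{c}]" for c in update_cols)
    all_cols = key_cols + update_cols
    cols = ", ".join(f"[{c}]" for c in all_cols)
    vals = ", ".join(f"s.[{c}]" for c in all_cols)
    return (
        f"MERGE {target} AS t "
        f"USING {staging} AS s ON {on} "
        f"WHEN MATCHED THEN UPDATE SET {sets} "
        f"WHEN NOT MATCHED THEN INSERT ({cols}) VALUES ({vals});"
    )


def make_engine_url(odbc_connection_string):
    """Wrap a raw ODBC connection string in an SQLAlchemy mssql+pyodbc URL."""
    return "mssql+pyodbc:///?odbc_connect=" + quote_plus(odbc_connection_string)


def upsert_dataframe(df, odbc_connection_string,
                     target="dbo.MPA_BOOK_RAW", staging="MPA_BOOK_STAGING"):
    """Bulk-load `df` into a staging table, then MERGE it into `target`."""
    # Imported here so the two helpers above can be used without SQLAlchemy.
    import sqlalchemy as sa

    engine = sa.create_engine(make_engine_url(odbc_connection_string),
                              fast_executemany=True)
    # One transaction: committed on success, rolled back on error.
    with engine.begin() as conn:
        # Replace the (hypothetical) staging table with the frame's contents.
        df[KEY_COLS + UPDATE_COLS].to_sql(staging, conn, schema="dbo",
                                          if_exists="replace", index=False)
        # A single set-based statement replaces the per-row UPDATE/INSERT loop.
        conn.execute(sa.text(build_merge_sql(target, f"dbo.{staging}",
                                             KEY_COLS, UPDATE_COLS)))
```

Usage would be something like upsert_dataframe(bdf, 'Driver={SQL Server Native Client 11.0}; Server=blank1; Database=blank2; UID=blank3; PWD=blank4'). Two things to watch: to_sql infers the staging column types from the DataFrame, so they may not match MPA_BOOK_RAW exactly, and MERGE raises an error if the staging data contains more than one row with the same key, so drop duplicates on the key columns first if that can happen.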
Tags: python sql sql-server pandas pyodbc