Python: How to use a generator to avoid SQL memory issues (answers)

【Question title】: Python: How to use a generator to avoid SQL memory issues
【Posted】: 2013-08-14 04:01:55
【Question】:

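I have the following method, which accesses a MySQL database. The query runs on a server where I have no permission to change anything related to increasing memory. I am new to generators, have started reading more about them, and thought I could convert this method to use one.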

def getUNames(self):
    globalUserQuery = ur'''SELECT gu_name FROM globaluser WHERE gu_locked = 0'''
    global_user_list = []
    try:
        self.gdbCursor.execute(globalUserQuery)
        rows = self.gdbCursor.fetchall()
        for row in rows:
            uName = unicode(row['gu_name'], 'utf-8')
            global_user_list.append(uName)
        return global_user_list
    except Exception, e:
        traceback.print_exc()

I call it with the following code:

for user_name in getUNames():
...

This is the error I get on the server side:

^GOut of memory (Needed 725528 bytes)
Traceback (most recent call last):
...
packages/MySQLdb/connections.py", line 36, in defaulterrorhandler
    raise errorclass, errorvalue
OperationalError: (2008, 'MySQL client ran out of memory')

How should I use a generator to avoid this? Something like:

while true:
   self.gdbCursor.execute(globalUserQuery)
   row = self.gdbCursor.fetchone()
   if row is None: break
   yield row

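I am not sure the above is the right approach, since I expect my database method to produce a list. I think the best option would be to fetch a chunk from the query and return it as a list, and once that chunk has been consumed, have the generator hand out the next one for as long as the query keeps returning results.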

【Question comments】:

Tags: python mysql yield

【Solution 1】:

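With MySQLdb, the default cursor loads the entire result set into a Python list when cursor.execute(..) is called. For a large query this can cause a MemoryError whether or not you use a generator.

Use an SSCursor or SSDictCursor instead. These keep the result set on the server side and let you iterate over its rows on the client side: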

import MySQLdb
import MySQLdb.cursors as cursors
import traceback

def getUNames(self):
    # You may of course want to define `self.gdbCursor` somewhere else...
    conn = MySQLdb.connect(..., cursorclass=cursors.SSDictCursor)
    #                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    #   Set the cursor class here. SSDictCursor returns each row as a dict,
    #   so row['gu_name'] works; a plain SSCursor returns tuples (use row[0]).
    self.gdbCursor = conn.cursor()

    globalUserQuery = ur'''SELECT gu_name FROM globaluser WHERE gu_locked = 0'''
    try:
        self.gdbCursor.execute(globalUserQuery)
        # The cursor is itself an iterator: rows arrive one at a time
        for row in self.gdbCursor:
            uName = unicode(row['gu_name'], 'utf-8')
            yield uName
    except Exception as e:
        traceback.print_exc()

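There is not much documentation on the difference between the default Cursor and SSCursor. The best source I know of is the docstrings of the cursor MixIn classes themselves:

The default cursor uses CursorStoreResultMixIn: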

In [2]: import MySQLdb.cursors as cursors
In [8]: print(cursors.CursorStoreResultMixIn.__doc__)
This is a MixIn class which causes the entire result set to be
    stored on the client side, i.e. it uses mysql_store_result(). If the
    result set can be very large, consider adding a LIMIT clause to your
    query, or using CursorUseResultMixIn instead.

SSCursor and SSDictCursor use CursorUseResultMixIn:

In [9]: print(cursors.CursorUseResultMixIn.__doc__)
This is a MixIn class which causes the result set to be stored
    in the server and sent row-by-row to client side, i.e. it uses
    mysql_use_result(). You MUST retrieve the entire result set and
    close() the cursor before additional queries can be peformed on
    the connection.
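
The last sentence of that docstring matters once a generator wraps the cursor: if the caller stops iterating early, the cursor should still get closed. Below is a minimal sketch of one way to arrange that, not MySQLdb-specific code. It assumes only a DB-API connection; `sqlite3` (in-memory) stands in for the MySQL connection so the snippet runs on its own, and `iter_names` and the sample rows are hypothetical.

```python
import sqlite3
from contextlib import closing


def iter_names(conn, query):
    """Yield the first column of each row, closing the cursor afterwards."""
    # closing() calls cursor.close() when the generator is exhausted
    # or when the caller closes the generator before the end.
    with closing(conn.cursor()) as cursor:
        cursor.execute(query)
        for row in cursor:
            yield row[0]


# Stand-in database (assumption): sqlite3 instead of MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE globaluser (gu_name TEXT, gu_locked INTEGER)")
conn.executemany(
    "INSERT INTO globaluser VALUES (?, ?)",
    [("alice", 0), ("bob", 1), ("carol", 0)],
)

QUERY = "SELECT gu_name FROM globaluser WHERE gu_locked = 0 ORDER BY gu_name"

names = iter_names(conn, QUERY)
first = next(names)  # fetch a single row
names.close()        # stop early; the with-block closes the cursor

all_names = list(iter_names(conn, QUERY))
```

Calling `close()` on the generator (or simply exhausting it) leaves the `with` block, so the cursor is released before the connection is used for another query.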

Since I changed getUNames into a generator, it would be used like this:

for row in self.getUNames():
    ...
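
The question's idea of handing back one list per chunk maps onto the DB-API `fetchmany` method. A minimal sketch under stated assumptions: `sqlite3` (in-memory) stands in for the MySQL connection so the snippet is runnable, and `iter_chunks`, `chunk_size`, and the sample rows are hypothetical. With MySQLdb this only bounds client memory if the cursor is an SSCursor or SSDictCursor, because the default cursor has already stored the whole result set by the time `execute` returns.

```python
import sqlite3


def iter_chunks(cursor, query, chunk_size=1000):
    """Yield lists of at most chunk_size rows from a single execute()."""
    cursor.execute(query)  # run the query once, outside the loop
    while True:
        rows = cursor.fetchmany(chunk_size)
        if not rows:       # empty list: the result set is exhausted
            break
        yield rows


# Stand-in database (assumption): sqlite3 instead of MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE globaluser (gu_name TEXT, gu_locked INTEGER)")
conn.executemany(
    "INSERT INTO globaluser VALUES (?, 0)",
    [("user%d" % i,) for i in range(5)],
)

QUERY = "SELECT gu_name FROM globaluser WHERE gu_locked = 0 ORDER BY gu_name"

chunks = list(iter_chunks(conn.cursor(), QUERY, chunk_size=2))
sizes = [len(chunk) for chunk in chunks]
```

Unlike the loop in the question, `execute` is called only once; calling it inside the loop would restart the query on every pass.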

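【Comments】:

A great answer on a little-known topic. I took the liberty of editing your code sample to highlight the part that sets cursorclass (honestly, it took me a few minutes (!) to find where the cursor type was set o_O)
@Sylvian Leroux: Thanks, mate.
@unutbu Where does `rows` get set in the lines `self.gdbCursor.execute(globalUserQuery) for row in rows:`?
Sorry; that was my mistake. It should be `for row in self.gdbCursor`. The cursor is an iterator!
I see. That may be why I still ran into the memory problem after I dropped fetchall.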