What is the best way to query a MongoDB collection using Python 3?

【Question title】: What is the best way to query a mongodb collection using Python 3
【Posted】: 2015-10-09 14:58:34
【Question】:

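First, let me explain the project setup:

I am developing a web application using the CherryPy and PyMongo libraries, with MongoDB as the database backend and Python 3 as the development language.

My database collection contains 260,640 documents, shown here in simplified form: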

{"_id":1,"time":"2014-01-01 00:00:00","value":"1.37468"}

Every document in the collection has an id (from 0 to 260640) and a time that increases by one minute per document (so I have 6 months of data in total).
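
To make the data shape concrete, the sketch below builds a few documents in the format shown above. It is only an illustration: `make_documents` is a hypothetical helper, the constant `value` is a placeholder, and `_id` is assumed to start at 1 to match the sample document. It also checks that 260,640 one-minute steps cover 181 days (1 January to 1 July 2014), and that these zero-padded time strings sort in chronological order, which is what lets a `$gte`/`$lt` range on the string field behave like a date range.

```python
from datetime import datetime, timedelta

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def make_documents(count, start=datetime(2014, 1, 1), first_id=1):
    """Build `count` documents, one per minute (hypothetical helper)."""
    return [
        {
            "_id": first_id + i,
            "time": (start + timedelta(minutes=i)).strftime(TIME_FORMAT),
            "value": "1.37468",  # placeholder; real values differ per row
        }
        for i in range(count)
    ]


docs = make_documents(3000)
times = [doc["time"] for doc in docs]

# Zero-padded timestamps compare as strings in chronological order.
strings_sort_like_dates = times == sorted(times)

# 260,640 minutes at 1,440 minutes per day is exactly 181 days.
days_covered = timedelta(minutes=260640).days
period_end = datetime(2014, 1, 1) + timedelta(days=days_covered)

print(docs[0])
print(days_covered, period_end.strftime(TIME_FORMAT))
```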

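I run the Mongod.exe server in one Windows console and my Python web server in another Windows console, and I use Google Chrome to view the web page.

My goal:

I want to query the database collection so that I get an HTML table containing the rows between two dates, for example 2014-01-01 00:00:00 and 2014-01-10 00:00:00, and that table should then be viewable on a web page generated by CherryPy.

My problem:

With the code provided in this question I can query the database and show the table on the web page, but displaying about 7200 rows takes around 30-50 seconds, and that is only about 5 days of data. When I need to display 10 days or even a month of data, the wait gets much longer. The first problem is that the user has to wait; the second is that if the user chooses a longer time span, the browser may time out, which kills the application.

My slow code:

This is the code that currently works, but only as a "standard car", and I need a "super car".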

def onem(self):
    # Get the MongoClient from the PyMongo lib.
    client = MongoClient()
    # our database name is raw_data
    db = client.raw_data
    # Get the starting date from the database (returns a valid date string).
    date_from = self.getFromDateToDatePickerFromDB()
    # Get the end date from the database (returns a valid date string).
    date_to = self.getToDateToDatePicker()
    # Query the database for the collection of documents.
    collection = db.data.find({'time' : {'$gte' : date_from, '$lt' : date_to}})
    # Define a variable to hold the temp rows.
    html = ""
    # for each document in our collection.
    for document in collection:
        # build the table.
        html = html + '''
                        <tr>
                            <td>''' + str(int(document['_id'])) + '''</td>
                            <td>''' + str(document['time']) + '''</td>
                            <td>''' + str(document['value']) + '''</td>
                        </tr>
                    '''
    table = '''
            <table border="1" cellspacing="0" cellpadding="0" width="100%">
                <thead>
                    <tr>
                        <td>ID</td>
                        <td>TIME</td>
                        <td>VALUE</td>
                    </tr>
                </thead>
                <tbody>
                    ''' + html + '''
                </tbody>
            </table>
            '''
    # return the valid html with the table to a function which
    # outputs an html template to the browser.
    return self.GetStaticHTML(table)
# Tell CherryPy that we are done working on the webpage and it lets us show it to the browser.
onem.exposed = True

If you know a better way to query the mongodb database than the one in the provided code:

collection = db.data.find({'time' : {'$gte' : date_from, '$lt' : date_to}})

or if you know a way to speed up the database, the code, or anything else, I would really like to hear it.

Thanks,

【Question comments】:

Tags: python mongodb indexing pymongo cherrypy

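【Solution 1】:

There are probably two weaknesses that make your code slow and not scalable:

1. Do you have an index on the time attribute in your mongo collection? If not, create that index; it is a one-time operation.
2. You cannot return all the items that match the search, however many that turns out to be. You have to use pagination, i.e. return only a fixed number of items, e.g. 200, and provide links to the previous 200 and the next 200 items.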
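
Both points can be sketched with PyMongo as follows. This is a minimal sketch under assumptions, not a definitive implementation: `ensure_time_index`, `page_window` and `fetch_page` are hypothetical helper names, the page size of 200 is the example figure from the answer, pages are numbered from 0, and `count_documents` assumes PyMongo 3.7 or later. The helpers take the collection as an argument (for example `MongoClient().raw_data.data` from the question), so nothing here connects to a server by itself.

```python
import math

PAGE_SIZE = 200  # example page size from the answer


def ensure_time_index(collection):
    """Create an ascending index on 'time'; returns the index name.

    Calling it again when the index already exists changes nothing.
    """
    return collection.create_index([("time", 1)])


def page_window(page, total_matches, page_size=PAGE_SIZE):
    """Return (skip, limit, prev_page, next_page) for a 0-based page."""
    last_page = max(0, math.ceil(total_matches / page_size) - 1)
    page = min(max(page, 0), last_page)  # clamp out-of-range requests
    prev_page = page - 1 if page > 0 else None
    next_page = page + 1 if page < last_page else None
    return page * page_size, page_size, prev_page, next_page


def fetch_page(collection, date_from, date_to, page, page_size=PAGE_SIZE):
    """Fetch one page of documents in the time range, plus link targets."""
    query = {"time": {"$gte": date_from, "$lt": date_to}}
    total = collection.count_documents(query)
    skip, limit, prev_page, next_page = page_window(page, total, page_size)
    cursor = collection.find(query).sort("time", 1).skip(skip).limit(limit)
    return list(cursor), prev_page, next_page


# Page arithmetic for the 7200 matching rows mentioned in the question.
print(page_window(0, 7200))   # first page
print(page_window(35, 7200))  # last page
```

With this shape, each request renders at most 200 rows, and `prev_page` and `next_page` (either may be `None`) can drive the previous and next links on the page.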

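【Discussion】:

1. How much speed do you think I can gain by doing this? I will look into it right away. Should I index all the fields in the documents or only the time field for best performance? 2. The task I was given says "in one table", but maybe we should use pagination. Thanks for your reply :)
If the query really is the hot spot (and you can only find your hot spots by profiling the code), this can be a significant improvement: the search may go from O(N) to O(log N). As a rule, you index the fields you query on. Check out docs.mongodb.org/manual/applications/indexes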
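
The O(N) versus O(log N) point can be illustrated without a database. The sketch below is an analogy, not MongoDB's implementation: a list comprehension stands in for a full collection scan, and binary search over a sorted list stands in for an index lookup on `time`. The function names and the generated timestamps are made up for the illustration.

```python
import bisect
from datetime import datetime, timedelta


def make_times(count, start=datetime(2014, 1, 1)):
    """One zero-padded timestamp string per minute, in ascending order."""
    return [
        (start + timedelta(minutes=i)).strftime("%Y-%m-%d %H:%M:%S")
        for i in range(count)
    ]


def range_by_scan(times, date_from, date_to):
    """Check every entry: work grows linearly with the collection, O(N)."""
    return [t for t in times if date_from <= t < date_to]


def range_by_bisect(sorted_times, date_from, date_to):
    """Locate both bounds by binary search: O(log N) comparisons."""
    start = bisect.bisect_left(sorted_times, date_from)
    end = bisect.bisect_left(sorted_times, date_to)
    return sorted_times[start:end]


times = make_times(20000)
scanned = range_by_scan(times, "2014-01-01 00:00:00", "2014-01-06 00:00:00")
searched = range_by_bisect(times, "2014-01-01 00:00:00", "2014-01-06 00:00:00")
print(len(scanned), scanned == searched)
```

Five days at one row per minute gives the 7200 rows from the question. Both functions return the same rows, but the second finds the range boundaries without touching every entry.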