Confused about using datastore for pagination on Python App Engine

【Question title】: Confused about using datastore for pagination on Python App Engine
【Posted】: 2016-09-30 21:54:53
【Question】:

I am building a Webapp2 application and trying to find the best approach to pagination. The popular approach seems to be using a cursor. For example:

# My solution is to collect all cursors on the very first request.
# For example, 3 pages need 2 cursors:
# page1|c1|page2|c2|page3

page_size = 20
query = model.MyModel.gql(...)
...
if cursor:
    # Use the cursor to fetch the items for the requested page
    items = query.with_cursor(...)
else:
    # Collect all cursors and cache them in memcache
    ...

I also tried another solution, although I know many people will consider it a bad one:

# In this solution, I split the query results into a list of pages:
# page1(list1)|page2(list2)|page3(list3)

page_size = 20
# Note: list(...) loads every matching entity on each request
entities = list(model.MyModel.gql(...))
pages = [entities[i:i + page_size] for i in range(0, len(entities), page_size)]

# The client sends the 1-based page number to the server
items = []
page_index = int(page_number) - 1
if 0 <= page_index < len(pages):
    items = pages[page_index]

Here is my question: what is the benefit of using cursors?

Both solutions have to execute MyModel.gql(...) to get the data, and the first one still has to call with_cursor(...) to retrieve the items. This confuses me.

If you have a better solution, or any suggestions for improving mine, please share. Thanks a lot!

【Comments】:

Tags: python google-app-engine pagination google-cloud-datastore

【Solution 1】:

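There is a big difference between using cursors and using page offsets. With a cursor, fetching the next page of results is very efficient: O(1), or at most O(log n).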

With offset-based paging, the query results have to be scanned from the start up to the page you request: O(n) for each page request.

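So something as simple as iterating over all pages costs at most O(n log n) with cursors, but O(n^2) with page offsets. This takes more time and also more datastore reads, because internally Google still reads every entry up to the requested page and then filters the skipped ones out.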

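As a result, if you serve many requests, you pay for more datastore reads, and because the slower requests make it more likely that another instance is started, you pay for more instance hours as well.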
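To make the difference in read counts concrete, here is a minimal simulation. It does not call the datastore: `INDEX`, `page_by_offset` and `page_by_cursor` are hypothetical stand-ins, and the "cursor" is simply the last key seen. It assumes, as described above, that an offset query walks every skipped entry while a cursor query resumes right after its position.

```python
import bisect

# Hypothetical stand-in for a datastore index: sorted keys, one per entity.
INDEX = list(range(1, 101))  # 100 entities
PAGE_SIZE = 20


def page_by_offset(page_number):
    """Return (items, reads). The scan starts at the first entry every time."""
    offset = (page_number - 1) * PAGE_SIZE
    scanned = INDEX[:offset + PAGE_SIZE]  # skipped entries are still walked
    return scanned[offset:], len(scanned)


def page_by_cursor(cursor=None):
    """Return (items, next_cursor, reads). The cursor is the last key seen."""
    # Seek directly to the position after the cursor (a binary search here).
    start = 0 if cursor is None else bisect.bisect_right(INDEX, cursor)
    items = INDEX[start:start + PAGE_SIZE]
    next_cursor = items[-1] if items else cursor
    return items, next_cursor, len(items)


def total_reads_with_offsets():
    """Walk every page by page number and add up the entries scanned."""
    pages = (len(INDEX) + PAGE_SIZE - 1) // PAGE_SIZE
    return sum(page_by_offset(p)[1] for p in range(1, pages + 1))


def total_reads_with_cursors():
    """Walk every page by following cursors and add up the entries read."""
    total, cursor = 0, None
    while True:
        items, cursor, reads = page_by_cursor(cursor)
        if not items:
            return total
        total += reads
```

With 100 entities and 20 per page, walking all five pages scans 20 + 40 + 60 + 80 + 100 = 300 entries with offsets, but only 100 with cursors. The gap widens as the number of pages grows.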

See the official documentation: https://cloud.google.com/appengine/docs/python/datastore/queries#Python_Offsets_versus_cursors
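This also means you do not need to compute and memcache every cursor up front. A common pattern is to return the cursor for the next page together with the current page and let the client send it back. The sketch below illustrates that round trip in plain Python. `ITEMS`, `encode_cursor`, `decode_cursor` and `handle_page_request` are made-up names, and the token is just the base64-encoded last item rather than a real datastore cursor; in a real handler the datastore client library would produce and consume the cursor for you.

```python
import base64
import bisect

# Hypothetical sorted result set standing in for a datastore query.
ITEMS = ["item-%03d" % i for i in range(1, 46)]  # 45 fake entities
PAGE_SIZE = 20


def encode_cursor(last_item):
    """Turn the last item of a page into an opaque, URL-safe token."""
    return base64.urlsafe_b64encode(last_item.encode("utf-8")).decode("ascii")


def decode_cursor(token):
    """Recover the last item seen from a token sent back by the client."""
    return base64.urlsafe_b64decode(token.encode("ascii")).decode("utf-8")


def handle_page_request(token=None):
    """Return one page plus the token for the next page (None on the last page)."""
    start = 0
    if token:
        # Resume right after the item the token points at.
        start = bisect.bisect_right(ITEMS, decode_cursor(token))
    page = ITEMS[start:start + PAGE_SIZE]
    more = start + len(page) < len(ITEMS)
    next_token = encode_cursor(page[-1]) if more else None
    return {"items": page, "next": next_token}
```

The first request passes no token; each later request passes the `next` value from the previous response, and the client stops when `next` is `None`. Only one page is read per request, and nothing has to be cached between requests.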

【Discussion】:

Thank you for the detailed explanation. Now I understand that using cursors instead of offsets reduces the number of read operations.