Ensure queryset is cached: answers

【Question title】: Ensure queryset is cached
【Posted】: 2018-01-18 10:45:17
【Question】:

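I am calling a function that kicks off a long-running process which does many different things. The function mostly works with instances of one particular class, Item. These items are classified by different attributes: category1, category2 and category3.

Now, there is a different model that applies some kind of rule to these categories: Rule, which has the many-to-many attributes categories1, categories2 and categories3. A rule applies to an Item, and if the same rule points to different categories, only one of them should be applied. Which one is decided by some logic encapsulated in a function: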

class Rule(models.Model):
    warehouse = models.ForeignKey('Warehouse')
    categories1 = models.ManyToManyField('Category1')
    categories2 = models.ManyToManyField('Category2')
    categories3 = models.ManyToManyField('Category3')

    @staticmethod
    def get_rules_that_applies(item):
        rules = warehouse.rule_set.all()
        if not rules.exists():
            return None
        # ... determine which rule applies to the item by filtering, etc.
        return rule

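The problem is in the get_rules_that_applies method. Every time we need the rule that applies to an item (and, again, the process we are talking about involves a lot of items), warehouse.rule_set.all() is called.

Since the rules do not change during this process, we could simply cache all the rules of the warehouse, but how? How can I make sure that warehouse.rule_set.all() is cached and that none of the filtering and QuerySet operations performed on those rules hit the database?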

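【Comments】:

What is warehouse in get_rules_that_applies?
@DimaKudosh Yes, I should have mentioned it: warehouse is something like the "main/context" object that the whole process happens around. Everything in the application revolves around a single warehouse instance.

Tags: python django caching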

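【Solution 1】:

I believe the solution you are looking for is memoization of the get_rules_that_applies method.

There is a ready-made tool for this called django-memoize, which comes with its own documentation.

Quick start: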

pip install django-memoize

Add it to your INSTALLED_APPS:

INSTALLED_APPS = [
    '...',
    'memoize',
]

In your models.py:

from django.db import models
from memoize import memoize

class Rule(models.Model):
    warehouse = models.ForeignKey('Warehouse')
    categories1 = models.ManyToManyField('Category1')
    categories2 = models.ManyToManyField('Category2')
    categories3 = models.ManyToManyField('Category3')

    @staticmethod
    @memoize(timeout=something_reasonable_in_seconds)
    def get_rules_that_applies(item):
        # warehouse is the context object described in the question
        rules = warehouse.rule_set.all()
        if not rules.exists():
            return None
        # ... determine which rule applies to the item by filtering, etc.
        return rules

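(Update) A semi-DIY approach:

While working on this answer I read the following post: https://www.peterbe.com/plog/cache_memoize-cache-decorator-for-django, which has a gist attached showing how to implement memoization yourself.

A more DIY approach:

Python 3.2 and later:

The @functools.lru_cache decorator is a:

Decorator to wrap a function with a memoizing callable that saves up to the maxsize most recent calls. It can save time when an expensive or I/O bound function is periodically called with the same arguments.

Usage: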

from functools import lru_cache


class Rule(models.Model):
    ...

    @staticmethod
    @lru_cache(maxsize=a_reasonable_integer_size_of_cache)
    def get_rules_that_applies(item):
        # item is part of the cache key, so it must be hashable
        rules = warehouse.rule_set.all()
        if not rules.exists():
            return None
        # ... determine which rule applies to the item by filtering, etc.
        return rules

maxsize: the maximum number of distinct calls whose results are kept in the cache. Set it to None to cache every call without any limit.
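
To make the effect of lru_cache concrete, here is a minimal, Django-free sketch. The function load_rules and the counter call_count are hypothetical stand-ins for an expensive lookup; they are not part of the code in the question.

```python
from functools import lru_cache

call_count = 0  # counts how often the "expensive" body actually runs


@lru_cache(maxsize=None)
def load_rules(warehouse_id):
    """Hypothetical stand-in for an expensive lookup such as a database query."""
    global call_count
    call_count += 1
    return ("rule-a", "rule-b")  # an immutable result is safe to share between callers


load_rules(1)
load_rules(1)  # same argument: served from the cache, the body does not run again
load_rules(2)  # different argument: the body runs once more

info = load_rules.cache_info()
print(call_count, info.hits, info.misses)  # prints: 2 1 2

load_rules.cache_clear()  # drop everything, e.g. once the long process has finished
```

Since lru_cache has no timeout, cache_clear() is the way to release the cached values when they are no longer needed.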

Python earlier than 3.2:

A more "old school" approach is described in "What is memoization and how can I use it in Python?".
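
For interpreters without functools.lru_cache, a hand-rolled decorator along these lines is one possible shape of the "old school" approach. This is a sketch under assumptions, not the code from the linked answer: the names memoize, rules_for and calls are made up for illustration, and only hashable positional arguments are supported.

```python
import functools


def memoize(func):
    """Cache results of func, keyed by its positional arguments (must be hashable)."""
    cache = {}

    @functools.wraps(func)
    def wrapper(*args):
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]

    wrapper.cache = cache  # exposed so callers can reset it with wrapper.cache.clear()
    return wrapper


calls = []  # records each time the undecorated body really runs


@memoize
def rules_for(warehouse_id):
    calls.append(warehouse_id)
    return ["rule-%d" % warehouse_id]


first = rules_for(7)
second = rules_for(7)  # returned from the dict, the body is not executed again
```

The cache lives as long as the decorated function does and never expires on its own, so clearing wrapper.cache at the end of the process is up to the caller.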

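How to cache the queryset with either of the approaches above:

Why not define an intermediate function that builds the queryset, and cache that function's result?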

@lru_cache(maxsize=None)  # or: @memoize()
def middle_function():
    return warehouse.rule_set.all()

Then, in your get_rules_that_applies function:

def get_rules_that_applies(item):
    rules = middle_function()
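
One caveat with caching the queryset object itself: a QuerySet is lazy, and calling .filter() on it builds a new queryset that runs its own query when evaluated. If the goal is to avoid any further database hits, one option is to cache the evaluated rows and filter them in Python. The sketch below only simulates this with plain Python objects; Rule, Item, fetch_rules_from_db and the category values are hypothetical stand-ins, not the models from the question.

```python
from collections import namedtuple
from functools import lru_cache

# Hypothetical stand-ins for model instances. With Django, the cached value
# could be something like tuple(warehouse.rule_set.all()) instead.
Rule = namedtuple("Rule", ["name", "categories1"])
Item = namedtuple("Item", ["category1"])

QUERIES = []  # records every simulated database round trip


def fetch_rules_from_db(warehouse_id):
    QUERIES.append(warehouse_id)
    return [Rule("r1", frozenset({"food"})), Rule("r2", frozenset({"toys"}))]


@lru_cache(maxsize=None)
def cached_rules(warehouse_id):
    # tuple(...) forces evaluation, so the cache holds rows rather than a lazy query
    return tuple(fetch_rules_from_db(warehouse_id))


def rule_for(item, warehouse_id):
    # Filtering happens in Python, so no further "queries" are issued
    matches = [r for r in cached_rules(warehouse_id) if item.category1 in r.categories1]
    return matches[0] if matches else None


picked = [rule_for(Item(c), 1) for c in ("food", "toys", "books")]
```

Three items are processed here, but the simulated database is only queried once. The trade-off is that the selection logic has to be written against in-memory objects instead of QuerySet methods.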

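【Comments】:

I am not really after a cache timeout; the cache can and should live for the whole lifetime of the process execution (you can easily tell when that is over: it is when the warehouse instance is done and released from memory). Also, wouldn't this cache the rule for the item (since it caches the function result)? Can't I force a queryset to be cached?
@dabadaba I edited in how I imagine the queryset could be cached :) By the way, I don't know about memoize, but lru_cache does not seem to have a timeout.
I just realized I can't use django-memoize because my Django version is too old.
@dabadaba You can still use @lru_cache or write your own, as shown here: stackoverflow.com/questions/1988804/…
I'm on Python 2.7, so I guess I can't.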

【Solution 2】:

You have two options:

Cache the items in the view
Cache the items in the model

The code is the same in the view and in the model. Import cache:

from django.core.cache import cache

Code:

cached = cache.get('query_result')
if cached is not None:
    return cached
else:
    # cache.set(key, value, timeout_in_seconds)
    cache.set('query_result', rule, 3600)
    return rule
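
The get/set snippet above treats None as "not cached", so a legitimate None result (no rule applies) is recomputed on every call. A sentinel default avoids that. The sketch below uses a hypothetical dict-backed DictCache that mimics only the get(key, default) and set(key, value, timeout) calls of django.core.cache.cache, so that it runs without a Django project; get_or_compute, find_rule and the key name are likewise made up for illustration.

```python
import time


class DictCache:
    """Hypothetical in-memory stand-in exposing get/set like Django's cache object."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        value, expires_at = self._data.get(key, (default, None))
        if expires_at is not None and expires_at < time.monotonic():
            del self._data[key]  # entry is stale, treat it as a miss
            return default
        return value

    def set(self, key, value, timeout):
        self._data[key] = (value, time.monotonic() + timeout)


cache = DictCache()
_MISSING = object()  # sentinel that can never be a real cached value
computed = []        # records each real computation


def get_or_compute(key, compute, timeout=3600):
    value = cache.get(key, _MISSING)
    if value is _MISSING:
        value = compute()
        cache.set(key, value, timeout)
    return value


def find_rule():
    computed.append(1)
    return None  # "no rule applies" is a valid result worth caching too


a = get_or_compute("rule_for_item_1", find_rule)
b = get_or_compute("rule_for_item_1", find_rule)  # cached None, find_rule is not called again
```

With the real Django cache, the same idea can be expressed by passing the sentinel as the default argument of cache.get.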

Your model would be:

class Rule(models.Model):
    warehouse = models.ForeignKey('Warehouse')
    categories1 = models.ManyToManyField('Category1')
    categories2 = models.ManyToManyField('Category2')
    categories3 = models.ManyToManyField('Category3')

    @staticmethod
    def get_rules_that_applies(item):
        # One cache entry per item (assumes Item is a model with a primary key)
        key = 'query_result_%s' % item.pk
        cached = cache.get(key)
        if cached is not None:
            return cached
        rules = warehouse.rule_set.all()
        if not rules.exists():
            return None
        # ... determine which rule applies to the item by filtering, etc.
        # cache.set(key, value, timeout_in_seconds)
        cache.set(key, rule, 3600)
        return rule

A little information on when Django querysets are evaluated:

https://docs.djangoproject.com/en/1.11/ref/models/querysets/#when-querysets-are-evaluated

Hope that helps.

【Comments】:

This caches the rule that applies to an item, not the source queryset of rules.