Per-request cache in Django? (Answers)

【Question title】: Per-request cache in Django?
【Posted】: 2010-06-30 16:48:29
【Question】:

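I would like to implement a decorator that provides per-request caching to any method, not just views. Here is an example use case.

I have a custom tag that determines whether a record in a long list of records is a "favorite". To check whether an item is a favorite, you have to query the database. Ideally, you would perform one query to get all the favorites, and then check each record against that cached list.

One solution is to get all the favorites in the view, pass that set into the template, and then into each tag call.

Alternatively, the tag itself could perform the query, but only the first time it is called. The results could then be cached for subsequent calls. The upside is that you can use this tag from any template, on any view, without the view having to know about it.

With the existing caching mechanism, you could cache the result for 50 ms and assume that this correlates with the current request. I want to make that correlation reliable.

Here is an example of the tag I currently have.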

@register.filter()
def is_favorite(record, request):

    if "get_favorites" in request.POST:
        favorites = request.POST["get_favorites"]
    else:

        favorites = get_favorites(request.user)

        post = request.POST.copy()
        post["get_favorites"] = favorites
        request.POST = post

    return record in favorites

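Is there a way to get the current request object from Django without passing it around? From a tag, I could just pass in request, which will always exist. But I would like to use this decorator from other functions.

Is there an existing implementation of a per-request cache?

【Comments】:

Tags: django django-cache

【Solution 1】:

Using a custom middleware, you can get a Django cache instance that is guaranteed to be cleared for each request.

This is what I used in a project: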

from threading import currentThread
from django.core.cache.backends.locmem import LocMemCache

_request_cache = {}
_installed_middleware = False

def get_request_cache():
    assert _installed_middleware, 'RequestCacheMiddleware not loaded'
    return _request_cache[currentThread()]

# LocMemCache is a threadsafe local memory cache
class RequestCache(LocMemCache):
    def __init__(self):
        name = 'locmemcache@%i' % hash(currentThread())
        params = dict()
        super(RequestCache, self).__init__(name, params)

class RequestCacheMiddleware(object):
    def __init__(self):
        global _installed_middleware
        _installed_middleware = True

    def process_request(self, request):
        cache = _request_cache.get(currentThread()) or RequestCache()
        _request_cache[currentThread()] = cache

        cache.clear()

To use the middleware, register it in settings.py, e.g.:

MIDDLEWARE_CLASSES = (
    ...
    'myapp.request_cache.RequestCacheMiddleware'
)

You can then use the cache as follows:

from myapp.request_cache import get_request_cache

cache = get_request_cache()

For more information, refer to the Django low-level cache API docs:

Django Low-Level Cache API

It should be easy to modify a memoize decorator to use the request cache. Have a look at the Python Decorator Library for a good example of a memoize decorator:

Python Decorator Library
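
As a rough illustration of that suggestion, the sketch below adapts a memoize decorator to a per-request cache. `request_memoize` and `DictCache` are hypothetical names: `DictCache` is a stand-in that mimics only the `get`/`set` calls of Django's low-level cache API so that the example runs without Django. In a real project the `get_cache` argument would presumably be `get_request_cache` from the middleware module above.

```python
import functools

_MISSING = object()  # sentinel so that cached None/False values still count as hits


def request_memoize(get_cache):
    """Memoize a function in whatever cache `get_cache()` returns (hypothetical helper)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args):
            cache = get_cache()
            # Build a string key from the function's identity and its arguments.
            key = "%s.%s:%r" % (func.__module__, func.__qualname__, args)
            value = cache.get(key, _MISSING)
            if value is _MISSING:
                value = func(*args)
                cache.set(key, value)
            return value
        return wrapper
    return decorator


class DictCache:
    """Stand-in exposing only get/set, in the style of Django's cache API."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def set(self, key, value):
        self._data[key] = value


_current = DictCache()  # in Django this would be the per-request cache
calls = []


@request_memoize(lambda: _current)
def get_favorites(user):
    calls.append(user)  # records each real "database" hit
    return {"record-1", "record-2"}


get_favorites("alice")
get_favorites("alice")  # served from the cache; `calls` keeps a single entry
```

Keys built from `repr()` of the arguments are only suitable when the arguments have stable, short representations; hashing the key is one alternative if that does not hold.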

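【Comments】:

Be wary of this solution! The _request_cache dictionary keeps filling up as more and more threads are opened to serve your users, and it is never cleared out. Depending on how your web server stores Python globals, this could result in a memory leak.
Yes - clear the cache in process_response and process_exception. There is a very good example in the django-cuser middleware plugin. See: github.com/Alir3z4/django-cuser/blob/master/cuser/middleware.py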

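【Solution 2】:

EDIT:

The final solution I came up with has been compiled into a PyPI package: https://pypi.org/project/django-request-cache/

EDIT 2016-06-15:

I discovered a significantly simpler solution to this problem, and facepalmed a bit for not realizing from the start how easy this should have been.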

from django.core.cache.backends.base import BaseCache
from django.core.cache.backends.locmem import LocMemCache
from django.utils.synch import RWLock


class RequestCache(LocMemCache):
    """
    RequestCache is a customized LocMemCache which stores its data cache as an instance attribute, rather than
    a global. It's designed to live only as long as the request object that RequestCacheMiddleware attaches it to.
    """

    def __init__(self):
        # We explicitly do not call super() here, because while we want BaseCache.__init__() to run, we *don't*
        # want LocMemCache.__init__() to run, because that would store our caches in its globals.
        BaseCache.__init__(self, {})

        self._cache = {}
        self._expire_info = {}
        self._lock = RWLock()

class RequestCacheMiddleware(object):
    """
    Creates a fresh cache instance as request.cache. The cache instance lives only as long as request does.
    """

    def process_request(self, request):
        request.cache = RequestCache()

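With this, you can use request.cache as a cache instance that lives only as long as the request does, and that is fully cleaned up by the garbage collector when the request is done.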
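
Applied to the question's tag, usage might look like the sketch below. `SimpleNamespace` and `DictCache` are stand-ins for the request object and for `RequestCache`, so that the snippet runs without Django; `get_favorites` is the question's helper, stubbed here. In a real template library the function would still carry the `@register.filter()` decorator.

```python
from types import SimpleNamespace

db_queries = []


def get_favorites(user):
    """Stub for the question's database lookup."""
    db_queries.append(user)
    return {"record-1", "record-3"}


class DictCache:
    """Stand-in exposing only get/set, in the style of Django's cache API."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def set(self, key, value):
        self._data[key] = value


def is_favorite(record, request):
    favorites = request.cache.get("favorites")
    if favorites is None:
        # First call during this request: run the query once and remember it.
        favorites = get_favorites(request.user)
        request.cache.set("favorites", favorites)
    return record in favorites


request = SimpleNamespace(user="alice", cache=DictCache())
flags = [is_favorite(r, request) for r in ("record-1", "record-2", "record-3")]
```

Three records are checked, but the stubbed query runs only once for the request.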

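If you need access to the request object from a context where it is not normally available, you can use one of the various implementations of a so-called "Global Request Middleware" that can be found online.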
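
One possible shape for such a "global request" middleware is sketched below using `contextvars` from the standard library. The names `_current_request`, `get_current_request` and `GlobalRequestMiddleware` are hypothetical and not taken from any particular package; the callable middleware style is likewise an assumption.

```python
import contextvars

_current_request = contextvars.ContextVar("current_request", default=None)


def get_current_request():
    """Return the request being handled in this context, or None outside a request."""
    return _current_request.get()


class GlobalRequestMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        token = _current_request.set(request)
        try:
            return self.get_response(request)
        finally:
            # Restore the previous value so nothing leaks into the next request.
            _current_request.reset(token)


def view(request):
    # Code deep in the call stack can reach the request without it being passed in.
    return get_current_request() is request


inside = GlobalRequestMiddleware(view)("fake-request")
outside = get_current_request()
```

With something like this in place, a helper could call `get_current_request().cache` instead of receiving `request` as an argument, provided it handles the `None` case outside a request.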

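Initial answer:

One major problem that no other solution here addresses is that LocMemCache leaks memory when you create and destroy several instances over the life of a single process. django.core.cache.backends.locmem defines several global dictionaries that hold references to every LocMemCache instance's cache data, and those dictionaries are never emptied.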

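The code below solves this problem. It started out as a combination of @href_'s answer and the cleaner logic used by the code linked in @squarelogic.hayden's comment, which I then refined further.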

from uuid import uuid4
from threading import current_thread

from django.core.cache.backends.base import BaseCache
from django.core.cache.backends.locmem import LocMemCache
from django.utils.synch import RWLock


# Global in-memory store of cache data. Keyed by name, to provides multiple
# named local memory caches.
_caches = {}
_expire_info = {}
_locks = {}


class RequestCache(LocMemCache):
    """
    RequestCache is a customized LocMemCache with a destructor, ensuring that creating
    and destroying RequestCache objects over and over doesn't leak memory.
    """

    def __init__(self):
        # We explicitly do not call super() here, because while we want
        # BaseCache.__init__() to run, we *don't* want LocMemCache.__init__() to run.
        BaseCache.__init__(self, {})

        # Use a name that is guaranteed to be unique for each RequestCache instance.
        # This ensures that it will always be safe to call del _caches[self.name] in
        # the destructor, even when multiple threads are doing so at the same time.
        self.name = uuid4()
        self._cache = _caches.setdefault(self.name, {})
        self._expire_info = _expire_info.setdefault(self.name, {})
        self._lock = _locks.setdefault(self.name, RWLock())

    def __del__(self):
        del _caches[self.name]
        del _expire_info[self.name]
        del _locks[self.name]


class RequestCacheMiddleware(object):
    """
    Creates a cache instance that persists only for the duration of the current request.
    """

    _request_caches = {}

    def process_request(self, request):
        # The RequestCache object is keyed on the current thread because each request is
        # processed on a single thread, allowing us to retrieve the correct RequestCache
        # object in the other functions.
        self._request_caches[current_thread()] = RequestCache()

    def process_response(self, request, response):
        self.delete_cache()
        return response

    def process_exception(self, request, exception):
        self.delete_cache()

    @classmethod
    def get_cache(cls):
        """
        Retrieve the current request's cache.

        Returns None if RequestCacheMiddleware is not currently installed via 
        MIDDLEWARE_CLASSES, or if there is no active request.
        """
        return cls._request_caches.get(current_thread())

    @classmethod
    def clear_cache(cls):
        """
        Clear the current request's cache.
        """
        cache = cls.get_cache()
        if cache:
            cache.clear()

    @classmethod
    def delete_cache(cls):
        """
        Delete the current request's cache object to avoid leaking memory.
        """
        cache = cls._request_caches.pop(current_thread(), None)
        del cache

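【Comments】:

Can you put the 2016 solution first?
This answer was compiled into a package: github.com/anx-ckreuzberger/django-request-cache
"django.core.cache.backends.locmem defines several global dictionaries that hold references to every LocMemCache instance's cache data, and those dictionaries are never emptied." If that is true, isn't it a memory leak in Django's LocMemCache itself?

【Solution 3】:

I came up with a hack for caching things directly on the request object (instead of using the standard cache, which is tied to memcached, files, the database, etc.)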

# get the request object's dictionary (rather one of its methods' dictionary)
mycache = request.get_host.__dict__

# check whether we already have our value cached and return it
if mycache.get( 'c_category', False ):
    return mycache['c_category']
else:
    # get some object from the database (a category object in this case)
    c = Category.objects.get( id = cid )

    # cache the database object into a new key in the request object
    mycache['c_category'] = c

    return c

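So, basically, I am just storing the cached value (a category object in this case) under a new key, 'c_category', in the request's dictionary. Or, to be more precise, because we can't just create a key on the request object, I am adding the key to one of the request object's methods: get_host().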
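
One behaviour of this trick is worth verifying before relying on it: in CPython, the `__dict__` of a bound method is the `__dict__` of the underlying function, which lives on the class and is shared by every instance. The sketch below uses a hypothetical `FakeRequest` class in place of Django's request to show a value written this way being visible from a second object, while a plain instance attribute stays per-object.

```python
class FakeRequest:
    """Hypothetical stand-in for a request class."""

    def get_host(self):
        return "example.com"


first, second = FakeRequest(), FakeRequest()

# Store a value the way the answer does, through the first request only.
first.get_host.__dict__["c_category"] = "books"

# The second request sees it too, because both bound methods wrap one function.
shared = second.get_host.__dict__.get("c_category")
same_dict = first.get_host.__dict__ is FakeRequest.get_host.__dict__

# A plain attribute on the instance stays local to that object.
first.c_category = "music"
local_only = not hasattr(second, "c_category")
```

If the same holds for the real request class, values stored through `get_host.__dict__` would outlive the request, so setting an attribute on the request instance may be the safer variant of this idea.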

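George.

【Comments】:

【Solution 4】:

Years later, here is a super hack to cache SELECT statements within a single Django request. You need to call the patch() method early in the request scope, for example in a piece of middleware.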

from threading import local
import itertools
from django.db.models.sql.constants import MULTI
from django.db.models.sql.compiler import SQLCompiler
from django.db.models.sql.datastructures import EmptyResultSet
from django.db.models.sql.constants import GET_ITERATOR_CHUNK_SIZE


_thread_locals = local()


def get_sql(compiler):
    ''' get a tuple of the SQL query and the arguments '''
    try:
        return compiler.as_sql()
    except EmptyResultSet:
        pass
    return ('', [])


def execute_sql_cache(self, result_type=MULTI):

    if hasattr(_thread_locals, 'query_cache'):

        sql = get_sql(self)  # ('SELECT * FROM ...', (50)) <= sql string, args tuple
        if sql[0][:6].upper() == 'SELECT':

            # uses the tuple of sql + args as the cache key
            if sql in _thread_locals.query_cache:
                return _thread_locals.query_cache[sql]

            result = self._execute_sql(result_type)
            if hasattr(result, 'next'):

                # only cache if this is not a full first page of a chunked set
                peek = result.next()
                result = list(itertools.chain([peek], result))

                if len(peek) == GET_ITERATOR_CHUNK_SIZE:
                    return result

            _thread_locals.query_cache[sql] = result

            return result

        else:
            # the database has been updated; throw away the cache
            _thread_locals.query_cache = {}

    return self._execute_sql(result_type)


def patch():
    ''' patch the django query runner to use our own method to execute sql '''
    _thread_locals.query_cache = {}
    if not hasattr(SQLCompiler, '_execute_sql'):
        SQLCompiler._execute_sql = SQLCompiler.execute_sql
        SQLCompiler.execute_sql = execute_sql_cache

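The patch() method replaces Django's internal execute_sql method with a stand-in called execute_sql_cache. That method looks at the SQL to be run and, if it is a SELECT statement, checks a thread-local cache first. Only if the result is not found in the cache does it go on to execute the SQL. On any other type of SQL statement, it throws away the cache. There is some logic to avoid caching large result sets, meaning anything over 100 records. This is to preserve Django's lazy query set evaluation.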
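
The caching rule described here can be illustrated without touching Django internals. The sketch below applies it to a plain `sqlite3` connection from the standard library; `CachingConnection` is a hypothetical wrapper, not part of the answer's patch, and it leaves out the handling of large, chunked result sets.

```python
import sqlite3


class CachingConnection:
    """Cache SELECT results by (sql, params); any other statement clears the cache."""

    def __init__(self, conn):
        self.conn = conn
        self.cache = {}
        self.db_hits = 0

    def execute(self, sql, params=()):
        key = (sql, tuple(params))
        is_select = sql.lstrip()[:6].upper() == "SELECT"
        if is_select and key in self.cache:
            return self.cache[key]
        self.db_hits += 1
        rows = self.conn.execute(sql, params).fetchall()
        if is_select:
            self.cache[key] = rows
        else:
            # The data may have changed, so throw every cached result away.
            self.cache.clear()
        return rows


db = CachingConnection(sqlite3.connect(":memory:"))
db.execute("CREATE TABLE favorite (owner TEXT, record INTEGER)")
db.execute("INSERT INTO favorite VALUES (?, ?)", ("alice", 1))

query = "SELECT record FROM favorite WHERE owner = ? ORDER BY record"
first = db.execute(query, ("alice",))
second = db.execute(query, ("alice",))  # cache hit, no database access

db.execute("INSERT INTO favorite VALUES (?, ?)", ("alice", 2))  # invalidates the cache
third = db.execute(query, ("alice",))
```

The repeated SELECT is answered from the cache, and the INSERT forces the next SELECT back to the database so that it sees the new row.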

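【Comments】:

At first glance this seems really cool. But looking closer, you invalidate the cache whenever you see anything other than a SELECT statement. That seems reasonable until you have multiple processes: it won't invalidate all of the caches. A small change would be to store it on the request object so that it is reset between requests.

【Solution 5】:

This one uses a Python dict as the cache (not Django's cache), and is very simple and lightweight.

Whenever the thread is destroyed, its cache is destroyed automatically as well.
It does not require any middleware, and the content is not pickled and unpickled on every access, which makes it faster.
Tested and working with gevent's monkey patching.

The same could probably be implemented with thread-local storage. I am not aware of any downsides to this approach; feel free to add them in the comments.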

from threading import currentThread
import weakref

_request_cache = weakref.WeakKeyDictionary()

def get_request_cache():
    return _request_cache.setdefault(currentThread(), {})

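【Comments】:

The original question was about a per-request cache, not a per-thread cache. On a thread-pool-based server, the entries in your implementation never expire, which leads to memory exhaustion.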
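
The concern raised in this comment can be reproduced with the standard library alone. The sketch below uses `threading.local` (the thread-local variant the answer mentions) and a one-worker `ThreadPoolExecutor` as a stand-in for a thread-pool server; `handle_request` is a hypothetical handler.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

_local = threading.local()


def get_thread_cache():
    """Per-thread dict, created on first use."""
    if not hasattr(_local, "cache"):
        _local.cache = {}
    return _local.cache


def handle_request(user):
    cache = get_thread_cache()
    leftovers = dict(cache)  # what an earlier request left behind on this thread
    cache["favorites:%s" % user] = ["record-1"]
    return leftovers


# One worker thread serves both "requests", as a pooled server thread would.
with ThreadPoolExecutor(max_workers=1) as pool:
    first = pool.submit(handle_request, "alice").result()
    second = pool.submit(handle_request, "bob").result()
```

The second request finds the first user's entry still in place, so a per-thread cache needs explicit clearing at the end of each request to behave like a per-request cache.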

【Solution 6】:

You can always do the caching manually.

    ...
    if "get_favorites" in request.POST:
        favorites = request.POST["get_favorites"]
    else:
        from django.core.cache import cache

        favorites = cache.get(request.user.username)
        if not favorites:
            favorites = get_favorites(request.user)
            cache.set(request.user.username, favorites, seconds)
    ...

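【Comments】:

The question was about per-request caching. This solution would also serve the user's second request from the cache. Still, I think cache.get() and cache.set() are the better choice most of the time.

【Solution 7】:

The answer given by @href_ is great.

Just in case you want something shorter that could also do the trick: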

import time
from functools import lru_cache  # stdlib equivalent of django.utils.lru_cache

def cached_call(func, *args, **kwargs):
    """Very basic temporary cache; caches results
    for an average of 1.5 sec and no more than 3 sec"""
    return _cached_call(int(time.time() / 3), func, *args, **kwargs)


@lru_cache(maxsize=100)
def _cached_call(time, func, *args, **kwargs):
    return func(*args, **kwargs)

Then get the favourites by calling it like this:

favourites = cached_call(get_favourites, request.user)

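This method makes use of the LRU cache and combines it with a timestamp, which ensures that the cache does not hold anything for longer than a few seconds. If you need to call a costly function several times within a short period, this solves the problem.

It is not a perfect way to invalidate the cache, because it will occasionally miss on very recent data: int(..2.99.. / 3) followed by int(..3.00.. / 3). Despite this drawback, it can still be very effective for the majority of hits.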
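
The boundary effect can be made concrete with a deterministic variant of the code above. In the sketch below, `cached_call_at` is a hypothetical helper that takes the timestamp as an argument instead of reading `time.time()`, so the bucket arithmetic can be followed by hand; `expensive` is a stub for the costly function.

```python
from functools import lru_cache

calls = []


@lru_cache(maxsize=100)
def _cached_call(bucket, func, *args):
    return func(*args)


def cached_call_at(now, func, *args):
    # Same bucketing as int(time.time() / 3), with the clock passed in explicitly.
    return _cached_call(int(now / 3), func, *args)


def expensive(x):
    calls.append(x)  # records each real invocation
    return x * 2


cached_call_at(0.10, expensive, 21)  # bucket 0: miss, calls expensive()
cached_call_at(2.99, expensive, 21)  # bucket 0: hit, 2.89 s after the first call
cached_call_at(3.00, expensive, 21)  # bucket 1: miss, only 0.01 s after the previous call
```

The second call is served from the cache almost three seconds later, while the third call recomputes after only 0.01 s because it falls into the next three-second bucket.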

You can also use it outside of the request/response cycle, for example in celery tasks or management command jobs.

【Comments】: