【Question title】: How to use cache filter?
【Posted】: 2021-11-30 17:43:33
【Question】:

I'm having trouble with my cache filter.

The idea is to not cache responses that contain "incomplete_results":true

Here is my filter function:

import requests
import requests_cache

def phrase_filter(response: requests.models.Response)->bool:
    if '"incomplete_results":true' in response.text:
        return False
    return True

But when I test it with this code:

requests_cache.install_cache('demo_cache',expired_after=600,filter_fn=phrase_filter)
requests_cache.clear()

url1 = 'https://raw.githubusercontent.com/KienTrann/requests-cache-testing/main/should_be_cached.txt'
url2 = 'https://raw.githubusercontent.com/KienTrann/requests-cache-testing/main/should_not_be_cached.txt'

with requests_cache.enabled():
    r = requests.get(url1) # First request
    r = requests.get(url1) # Second request
    print(f'Text from url1:\n{r.text}')
    assert r.from_cache==True
    #
    r1 = requests.get(url2) # First request
    r1 = requests.get(url2) # Second request
    print('---')
    print(f'Text from url2:\n{r1.text}')
    assert r1.from_cache==False

requests_cache.disabled()

Here is the result:

Text from url1:
abc
xyz
"incomplete_results":false

---
Text from url2:
abc
xyz
"incomplete_results":true

Traceback (most recent call last):
  File "C:\Users\ADMIN\source\repos\LearningPython\py_2\py_2.py", line 25, in <module>
    assert r1.from_cache==False
AssertionError

I don't understand why r1 was cached.

What is wrong, and how can I fix it?

Thanks for taking the time to answer.

【Comments】:

    Tags: python caching browser-cache


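    【Answer 1】:

    Patching

    Looks like you're almost there! requests_cache.enabled() and disabled() are context-manager alternatives to install_cache() and uninstall_cache(). Called with no arguments, enabled() installs a cache with default settings, so the filter passed to install_cache() is not applied. Just pass your settings to enabled() instead of install_cache():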

    with requests_cache.enabled('demo_cache', expire_after=600, filter_fn=phrase_filter):
        # ... make requests
    

    This is basically the same as:

    requests_cache.install_cache('demo_cache', expire_after=600, filter_fn=phrase_filter)
    # ... make requests
    requests_cache.uninstall_cache()
    

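    Sessions

    Personally, I'd recommend using requests_cache.CachedSession instead of the patching approach, since it makes it more explicit what is being cached, and if you want to make a non-cached request you can simply use the regular requests functions. More info in the docs here: https://requests-cache.readthedocs.io/en/stable/user_guide/general.html

    Example: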

    from requests import Response
    from requests_cache import CachedSession
    
    def phrase_filter(response: Response) -> bool:
        return '"incomplete_results":true' not in response.text
    
    url1 = 'https://raw.githubusercontent.com/KienTrann/requests-cache-testing/main/should_be_cached.txt'
    url2 = 'https://raw.githubusercontent.com/KienTrann/requests-cache-testing/main/should_not_be_cached.txt'
    session = CachedSession('demo_cache', expire_after=600, filter_fn=phrase_filter)
    session.cache.clear()
    
    nonfiltered_response = session.get(url1)
    nonfiltered_response = session.get(url1)
    assert nonfiltered_response.from_cache is True
    
    filtered_response = session.get(url2)
    filtered_response = session.get(url2)
    assert filtered_response.from_cache is False
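
One caveat about the substring check in the filter above: it only matches when there is no whitespace around the colon. As a minimal sketch (not part of the original answer), the variant below tolerates optional whitespace by using a regular expression. The name tolerant_phrase_filter and the SimpleNamespace stand-ins for real responses are hypothetical, used only so the snippet runs without network access.

```python
import re
from types import SimpleNamespace

# Matches "incomplete_results":true with optional whitespace around the colon
INCOMPLETE = re.compile(r'"incomplete_results"\s*:\s*true')

def tolerant_phrase_filter(response) -> bool:
    """Return True if the response should be cached."""
    return INCOMPLETE.search(response.text) is None

# Hypothetical stand-ins for requests.Response objects (only .text is used)
complete = SimpleNamespace(text='abc\nxyz\n"incomplete_results":false\n')
partial = SimpleNamespace(text='abc\nxyz\n"incomplete_results": true\n')

print(tolerant_phrase_filter(complete))
print(tolerant_phrase_filter(partial))
```

It can be passed as filter_fn in the same way as phrase_filter.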
    

    Debugging

    If you run into similar issues in the future and aren't sure why a response is or isn't being cached, you can enable debug logging:

    import logging
    logging.basicConfig(level='DEBUG')
    

    You'll then get cache info for each response, like this:

    DEBUG:requests_cache.session: Pre-cache checks for response from https://raw.githubusercontent.com/KienTrann/requests-cache-testing/main/should_not_be_cached.txt: 
    {
        'disabled cache': False,
        'disabled method': False,
        'disabled status': False,
        'disabled by filter': True,
        'disabled by headers or expiration params': False,
    }
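
Setting the root logger to DEBUG also turns on debug output from every other library. A small sketch of a narrower setup, assuming you only want debug messages from requests_cache (the logger name is taken from the log output above), might look like this:

```python
import logging

# Keep other libraries at INFO, but show DEBUG messages from requests_cache
logging.basicConfig(level='INFO')
logging.getLogger('requests_cache').setLevel('DEBUG')
```

Child loggers such as requests_cache.session inherit this level through the standard logging hierarchy.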
    

    More info in the docs here: https://requests-cache.readthedocs.io/en/stable/user_guide/troubleshooting.html

    【Comments】:

      【Answer 2】:

      I tried this as well, but couldn't get it to work:

      # Added by Eurico Covas
      # see https://requests-cache.readthedocs.io/en/stable/user_guide/filtering.html
      @staticmethod
      def filter_by_error(response: requests.models.Response) -> bool:
          """Don't cache responses with ErrMsg"""
          if response is None:
             return True
          if response.ok ==False:
             return True
          if len(response.json()['GDSSDKResponse']) == 1:
              if len(response.json()['GDSSDKResponse'][0]) >= 1:
                 if "ErrMsg" in response.json()['GDSSDKResponse'][0].keys():
                    if response.json()['GDSSDKResponse'][0]['ErrMsg'] is not None and response.json()['GDSSDKResponse'][0]['ErrMsg'] != '':
                        return True
          return False
      
      def __init__(self, username, password, verify=True, debug=False, request_caching_enabled=False):
          assert username is not None
          assert password is not None
          assert verify is not None
          assert debug is not None
          assert request_caching_enabled is not None
          self._username = username
          self._password = password
          self._verify = verify
          self._debug = debug
          self._request_caching_enabled=request_caching_enabled
          if self._request_caching_enabled:
              self.request_count = self.get_cached_request_count()
          if not self._verify:
              requests.packages.urllib3.disable_warnings(InsecureRequestWarning)
          if self._debug:
              self.enable_request_debugging()
          else:
              self.enable_error_logging()
          # cache requests for 30*24 hours = 1 month!
          if self._request_caching_enabled:
              requests_cache.install_cache('capiq_cache', backend='sqlite', expire_after=30*86400, allowable_methods=('POST',), filter_fn=self.filter_by_error)
      

      But

          response = requests.post(self._endpoint, headers=self._headers, data=json.dumps(req),
                                   auth=HTTPBasicAuth(self._username, self._password), verify=self._verify)
      

      never calls filter_by_error()...
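
Separately from why the filter is not being invoked, note that per the filtering docs linked in the code above, filter_fn should return True for responses that should be cached. filter_by_error() returns True when ErrMsg is present, which appears inverted relative to its docstring. The following is only a sketch of the intended logic under that assumption; should_cache, fake_response, and the 'Rows' key are hypothetical, and the payload shape is inferred from the code above.

```python
from types import SimpleNamespace

def should_cache(response) -> bool:
    """Return True only for successful responses that carry no ErrMsg."""
    if response is None or not response.ok:
        return False
    try:
        payload = response.json()  # parse once and reuse the result
    except ValueError:
        return False
    items = payload.get('GDSSDKResponse') if isinstance(payload, dict) else None
    if not isinstance(items, list):
        return False
    # Do not cache if any item reports a non-empty error message
    return not any(isinstance(item, dict) and item.get('ErrMsg') for item in items)

def fake_response(ok, payload):
    # Hypothetical stand-in for requests.Response, for illustration only
    return SimpleNamespace(ok=ok, json=lambda: payload)

print(should_cache(fake_response(True, {'GDSSDKResponse': [{'Rows': []}]})))
print(should_cache(fake_response(True, {'GDSSDKResponse': [{'ErrMsg': 'bad id'}]})))
```

This does not by itself explain why the filter is never called; enabling debug logging for requests_cache may help narrow that down.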

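      【Comments】:

      • This does not really answer the question. If you have a different question, you can ask it by clicking Ask Question. To get notified when this question gets new answers, you can follow this question. Once you have enough reputation, you can also add a bounty to draw more attention to this question. - From Review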