How to retry requests on specific exceptions (answers)

【Question title】: How to retry requests on specific exception
【Posted】: 2021-10-29 23:16:44
【Question】:

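I have been using my own "retry" function, and I want to keep retrying until the request succeeds. In some cases, if I get any 5xx status, I should retry after a long delay.

If I get one of a few specific status codes, e.g. 200 or 404, it should not raise; for any other status code it should raise.

So I did something like this: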

import time

import requests
from bs4 import BeautifulSoup
from requests import (
    RequestException,
    Timeout
)


def do_request():
    try:
        # In some scenarios I would use my own proxies, e.g.
        # requests.get("https://www.bbc.com/", timeout=0.1, proxies={'https': 'xxx.xxxx.xxx.xx'})
        while (response := requests.get("https://www.bbc.com/", timeout=0.1)).status_code >= 500:
            print("sleeping")
            time.sleep(20)

        if response.status_code not in (200, 404):
            response.raise_for_status()

        print("Successful requests!")

        soup = BeautifulSoup(response.text, 'html.parser')

        for link in soup.find_all("a", {"class": "media__link"}):
            yield link.get('href')

    except Timeout as err:
        print(f"Retry due to timed out: {err}")

    except RequestException as err:
        raise RequestException("Unexpected request error")


# ----------------------------------------------------#

if __name__ == '__main__':
    for found_links in do_request():
        print(found_links)

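My problem is that I deliberately set the timeout to 0.1 to trigger a Timeout exception, and what I want is for the request to be retried when that happens.

Currently it just stops there. What should I do so that the request is retried when it hits a timeout, instead of stopping or raising an error?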

【Comments】:

Tags: python-3.x while-loop python-requests

【Solution 1】:

In your case you can have the function call itself recursively, but watch out for unexpected edge cases:

def do_request(retry: int = 3):
    try:
        # In some scenarios I would use my own proxies, e.g.
        # requests.get("https://www.bbc.com/", timeout=0.1, proxies={'https': 'xxx.xxxx.xxx.xx'})
        while (response := requests.get("https://www.bbc.com/", timeout=0.1)).status_code >= 500:
            print("sleeping")
            time.sleep(20)

        if response.status_code not in (200, 404):
            response.raise_for_status()

        print("Successful requests!")

        soup = BeautifulSoup(response.text, 'html.parser')

        for link in soup.find_all("a", {"class": "media__link"}):
            yield link.get('href')

    except Timeout as err:
        if retry:
            print(f"Retry due to timed out: {err}")
            yield from do_request(retry=retry - 1)
        else:
            raise

    except RequestException as err:
        raise RequestException("Unexpected request error")

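This retries up to 3 times (or however many times you pass in the retry argument), so up to 4 attempts in total. It stops once retry reaches 0, at which point the last Timeout is re-raised, or as soon as a different error occurs.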
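
The recursive pattern can be tried out without any network access. The following is a minimal sketch, not the answerer's code: `fetch_links` is a hypothetical stand-in for the `requests.get` call that raises the built-in `TimeoutError` a fixed number of times, and `attempts_log` is only there to count attempts. It also illustrates why `yield from` matters: calling a generator function without iterating over it only creates a generator object and runs none of its body.

```python
def fetch_links(attempts_log, fail_times):
    """Hypothetical stand-in for the network call: time out `fail_times` times, then succeed."""
    attempts_log.append(1)
    if len(attempts_log) <= fail_times:
        raise TimeoutError("simulated timeout")
    return ["/news", "/sport"]


def do_request(attempts_log, fail_times, retry=3):
    try:
        links = fetch_links(attempts_log, fail_times)
    except TimeoutError as err:
        if not retry:
            raise  # retries exhausted: let the last timeout propagate
        print(f"Retry due to timeout: {err}")
        # `yield from` forwards the values of the recursive call to our caller.
        # A bare `do_request(...)` here would create a generator and discard it.
        yield from do_request(attempts_log, fail_times, retry=retry - 1)
    else:
        yield from links
```

With `retry=3`, the stand-in is called at most four times: the initial attempt plus three retries.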

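【Comments】:

Hi! Clever approach! I just tested it, and I'm sorry to say it still seems to stop at your `if retry`; it doesn't seem to retry again after the first round.
@ProtractorNewbie I just noticed that you are using yield to produce the values, so I have corrected that and edited my answer. Could you try again with the new version?
That did it! :) Now, just out of curiosity: if do_request took more parameters than just retry, e.g. a URL, I assume I would need to do something like yield from do_request(retry=retry - 1, url=URL)?
Yes! That should work. Or, if you have more parameters, you can use *args, **kwargs (but if you're not familiar with those, that's a topic for another question).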
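
As a rough illustration of the `*args, **kwargs` idea from the last comment, here is a minimal sketch. The `fetch` function and its `state` dictionary are hypothetical stand-ins for a real request; they are not part of `requests`. The only point is that the recursive call forwards the original arguments unchanged and alters nothing but the counter.

```python
def fetch(url, timeout, state):
    """Hypothetical stand-in for requests.get: fail once, then return fake links."""
    state["calls"].append((url, timeout))
    if len(state["calls"]) < 2:
        raise TimeoutError("simulated timeout")
    return [f"{url}/a", f"{url}/b"]


def do_request(*args, retry=3, **kwargs):
    try:
        items = fetch(*args, **kwargs)
    except TimeoutError:
        if not retry:
            raise
        # Forward the original arguments unchanged; only the counter changes.
        yield from do_request(*args, retry=retry - 1, **kwargs)
    else:
        yield from items
```

Because `retry` is declared after `*args`, it is keyword-only and can never be confused with a positional argument meant for `fetch`.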

【Solution 2】:

I would put it in a while loop and break out of the loop once the action has completed.

Example:

def do_request():
    while True:
        try:
            # In some scenarios I would use my own proxies, e.g.
            # requests.get("https://www.bbc.com/", timeout=0.1, proxies={'https': 'xxx.xxxx.xxx.xx'})
            while (response := requests.get("https://www.bbc.com/", timeout=0.1)).status_code >= 500:
                print("sleeping")
                time.sleep(20)

            if response.status_code not in (200, 404):
                response.raise_for_status()

            print("Successful requests!")

            soup = BeautifulSoup(response.text, 'html.parser')

            for link in soup.find_all("a", {"class": "media__link"}):
                yield link.get('href')
            break
        except Timeout as err:
            print(f"Retry due to timed out: {err}")

        except RequestException as err:
            raise RequestException("Unexpected request error")

You can also add a time.sleep(0.1) between attempts.
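
One possible way to combine the loop with a delay is sketched below. Everything in it is an assumption made for illustration: the helper name `retry_call`, the attempt limit, and the exponential backoff are not part of the answer, which loops without a limit. The `sleep` parameter exists only so that the delay can be swapped out when experimenting.

```python
import time


def retry_call(func, retry_on, max_attempts=5, base_delay=0.1, sleep=time.sleep):
    """Call func() until it succeeds, retrying only on the `retry_on` exception type(s)."""
    attempt = 1
    while True:
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise  # give up and surface the last error
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
            attempt += 1
```

With `requests`, a call might look like `retry_call(lambda: requests.get(url, timeout=5), Timeout)`, where `url` is whatever address is being fetched. Exceptions of any other type are not caught and propagate immediately.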

【Comments】:

【Solution 3】:

The tenacity package elegantly solves all kinds of retry problems.

For your problem, just add a decorator like this:

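import time

import requests
from bs4 import BeautifulSoup
from requests import Timeout
from tenacity import retry, retry_if_exception_type


# Retry only on Timeout. The function returns a list instead of yielding:
# a generator body runs during iteration, outside the decorator's retry loop.
@retry(retry=retry_if_exception_type(Timeout))
def do_request():
    while (response := requests.get("https://www.bbc.com/", timeout=0.1)).status_code >= 500:
        print("sleeping")
        time.sleep(20)

    if response.status_code not in (200, 404):
        response.raise_for_status()

    print("Successful requests!")

    soup = BeautifulSoup(response.text, 'html.parser')

    return [link.get('href') for link in soup.find_all("a", {"class": "media__link"})]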

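【Comments】:

Hi! I just tested your code, but nothing seems to happen... the script simply exits when I run it. Did you actually check it? Also, is it possible to print out what kind of exception occurred?
I just tested it again by raising an HTTPError instead of a Timeout, and it looks like it doesn't retry after all... Are you sure it works correctly? :)
I fixed an indentation error. Does it work now?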
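
A general caveat with decorator-based retries may be relevant to the behaviour reported in these comments: a decorator that wraps the call in try/except only sees exceptions raised during the call itself. Calling a generator function returns a generator object immediately, and its body runs later, during iteration, after the wrapper has already returned. The sketch below demonstrates this with a hand-written stand-in decorator, `retry_on_timeout`. It is an assumption for illustration and is not tenacity's implementation; `make_flaky` is likewise a hypothetical helper that simulates a timeout.

```python
import functools


def retry_on_timeout(max_attempts):
    """Hand-written stand-in for a retry decorator (not tenacity's implementation)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except TimeoutError:
                    if attempt == max_attempts:
                        raise
        return wrapper
    return decorator


def make_flaky(fail_times):
    """Return a function that times out `fail_times` times, then returns two links."""
    calls = []

    def fetch():
        calls.append(1)
        if len(calls) <= fail_times:
            raise TimeoutError("simulated timeout")
        return ["/news", "/sport"]

    return fetch


@retry_on_timeout(max_attempts=3)
def links_as_generator(fetch):
    # Generator function: calling it runs none of this body, so the
    # decorator's try/except has already returned before fetch() raises.
    yield from fetch()


@retry_on_timeout(max_attempts=3)
def links_as_list(fetch):
    # Regular function: fetch() runs inside the decorator's try/except.
    return list(fetch())
```

Iterating over `links_as_generator(make_flaky(1))` lets the simulated timeout escape without a retry, whereas `links_as_list(make_flaky(1))` is retried and returns the links. Returning a list, or moving only the request into a decorated helper function, keeps the failing call inside the retry logic.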