【Question Title】: How to handle AttributeError: 'NoneType' object has no attribute 'findAll'
【Posted】: 2020-12-11 21:51:02
【Question】:

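While scraping a large number of websites with the function below, I get an error (shown below). Is there an except clause I can add to this function to handle this kind of error?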

import re

import requests
from bs4 import BeautifulSoup

# Defined elsewhere in crawler.py; shown here so the snippet runs on its own.
results = {}
status = {}

async def scrape(url):
    try:
        r = requests.get(url, timeout=(3, 6))
        r.raise_for_status()
        soup = BeautifulSoup(r.content, 'html.parser')
        data = {
            "coming soon": soup.body.findAll(text=re.compile("coming soon", re.I)),
            "Opening Soon": soup.body.findAll(text=re.compile("Opening Soon", re.I)),
            "Under Construction": soup.body.findAll(text=re.compile("Under Construction", re.I)),
            "Currently Unavailable": soup.body.findAll(text=re.compile("Currently Unavailable", re.I)),
            "button": soup.findAll(text=re.compile('button2.js'))}
        results[url] = data
    except (requests.exceptions.ConnectionError, requests.exceptions.Timeout, requests.exceptions.MissingSchema):
        status[url] = "Connection Error"
    except requests.exceptions.HTTPError:
        status[url] = "Http Error"
    except requests.exceptions.TooManyRedirects:
        status[url] = "Redirects"
    except requests.exceptions.RequestException as err:
        # str() is required: an exception object cannot be concatenated to a str.
        status[url] = "Fatal Error: " + str(err) + " " + url
    else:
        status[url] = "OK"

Error:

Task exception was never retrieved
future: <Task finished name='Task-4782' coro=<scrape() done, defined at crawler.py:47>  exception=AttributeError("'NoneType' object has no attribute 'findAll'")>
Traceback (most recent call last):
  File "crawler.py", line 53, in scrape
    "coming soon": soup.body.findAll(text = re.compile("coming soon", re.I)),
AttributeError: 'NoneType' object has no attribute 'findAll'
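For context, a minimal reproduction of the cause: with the 'html.parser' backend, BeautifulSoup does not add missing <html>/<body> tags, so any response without a <body> element parses to a tree whose .body is None. The two inputs below (a small HTML page and a JSON-like string) are illustrative assumptions, not pages from the question. The snippet uses the snake_case find_all and the string= argument, which current BeautifulSoup documentation prefers over findAll and text=.

```python
from bs4 import BeautifulSoup

# A page that has a <body> element.
with_body = BeautifulSoup("<html><body><p>Coming soon</p></body></html>", "html.parser")
# A response with no <body> at all, e.g. a JSON reply (illustrative input).
without_body = BeautifulSoup('{"status": "coming soon"}', "html.parser")

print(with_body.body is None)     # False
print(without_body.body is None)  # True: html.parser does not add missing tags

# Calling a search method on that None is what raises the error in the traceback.
try:
    without_body.body.find_all(string="coming soon")
except AttributeError as exc:
    print(exc)  # 'NoneType' object has no attribute 'find_all'
```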

【Question Comments】:

    Tags: python web-scraping beautifulsoup web-crawler


    【Answer 1】:

    This happens because soup.body is None (the page has no <body> element), so we can simply handle this case with an if condition.

    import re

    import requests
    from bs4 import BeautifulSoup

    results = {}
    status = {}

    async def scrape(url):
        try:
            r = requests.get(url, timeout=(3, 6))
            r.raise_for_status()
            soup = BeautifulSoup(r.content, 'html.parser')
            # soup.body is None when the page has no <body> element,
            # so only search inside it when it exists.
            if soup.body:
                data = {
                    "coming soon": soup.body.findAll(text=re.compile("coming soon", re.I)),
                    "Opening Soon": soup.body.findAll(text=re.compile("Opening Soon", re.I)),
                    "Under Construction": soup.body.findAll(text=re.compile("Under Construction", re.I)),
                    "Currently Unavailable": soup.body.findAll(text=re.compile("Currently Unavailable", re.I)),
                    "button": soup.findAll(text=re.compile('button2.js'))}
                results[url] = data
        except (requests.exceptions.ConnectionError, requests.exceptions.Timeout, requests.exceptions.MissingSchema):
            status[url] = "Connection Error"
        except requests.exceptions.HTTPError:
            status[url] = "Http Error"
        except requests.exceptions.TooManyRedirects:
            status[url] = "Redirects"
        except requests.exceptions.RequestException as err:
            status[url] = "Fatal Error: " + str(err) + " " + url
        else:
            # Note: pages skipped by the if check above also end up marked "OK".
            status[url] = "OK"
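If you would rather answer the question literally and add an except clause, here is a sketch, not the answerer's code. The extract helper, the PHRASES list, the "No body" status string and the main coroutine are hypothetical names introduced here. Because catching AttributeError can also hide genuine bugs, the parsing is isolated in the small helper so that the clause mostly covers the missing-<body> case. Two further assumptions: requests.get is blocking, so it is run through asyncio.to_thread (Python 3.9+) to let the coroutines overlap; and asyncio.gather(..., return_exceptions=True) collects any exception that scrape() does not catch, instead of leaving it to the "Task exception was never retrieved" log.

```python
import asyncio
import re

import requests
from bs4 import BeautifulSoup

# Module-level stores, as in the question.
results = {}
status = {}

# Phrases searched inside <body>, matched case-insensitively.
PHRASES = ["coming soon", "Opening Soon", "Under Construction", "Currently Unavailable"]


def extract(soup):
    """Hypothetical helper: raises AttributeError if soup.body is None."""
    data = {
        phrase: soup.body.find_all(string=re.compile(phrase, re.I))
        for phrase in PHRASES
    }
    # Escaped dot so only a literal "button2.js" matches.
    data["button"] = soup.find_all(string=re.compile(r"button2\.js"))
    return data


async def scrape(url):
    try:
        # requests is blocking; run it in a worker thread (Python 3.9+)
        # so other scrape() coroutines can make progress meanwhile.
        r = await asyncio.to_thread(requests.get, url, timeout=(3, 6))
        r.raise_for_status()
        soup = BeautifulSoup(r.content, "html.parser")
        results[url] = extract(soup)
    except (requests.exceptions.ConnectionError,
            requests.exceptions.Timeout,
            requests.exceptions.MissingSchema):
        status[url] = "Connection Error"
    except requests.exceptions.HTTPError:
        status[url] = "Http Error"
    except requests.exceptions.TooManyRedirects:
        status[url] = "Redirects"
    except requests.exceptions.RequestException as err:
        status[url] = f"Fatal Error: {err} {url}"
    except AttributeError:
        # The page was fetched but has no <body> element.
        status[url] = "No body"
    else:
        status[url] = "OK"


async def main(urls):
    # return_exceptions=True hands back anything scrape() did not catch
    # as a result value instead of an unretrieved task exception.
    return await asyncio.gather(*(scrape(u) for u in urls), return_exceptions=True)
```

For example, asyncio.run(main(["not-a-url"])) records "Connection Error" for that entry, because requests rejects a URL without a scheme (MissingSchema) before opening any connection.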
    
