【Question title】: requests.exceptions.MissingSchema: Invalid URL: No schema supplied
【Posted】: 2022-01-23 23:45:52
【Question】:
import os

import bs4
import requests

# Downloading all XKCD comics
url = "http://xkcd.com"
os.makedirs("xkcd", exist_ok=True)
while not url.endswith("#"):
    print("Downloading page %s..." % url)
    res = requests.get(url)
    res.raise_for_status()

    soup = bs4.BeautifulSoup(res.text)
    comicElem = soup.select("#comic img")
    if comicElem == []:
        print("Could not find comic image.")
    else:
        comicUrl = comicElem[0].get("src")
        #Download the image.
        print('Downloading image %s...' % (comicUrl))
        res = requests.get(comicUrl)
        res.raise_for_status()
        imageFile = open(os.path.join("xkcd", os.path.basename(comicUrl)),"wb")
        for chunk in res.iter_content(None):
            imageFile.write(chunk)
        imageFile.close()
    prevLink = soup.select("a[rel=prev]")[0]
    url = "http://xkcd.com" + prevLink.get("href")
print("Done.")

The full code is above. The full output is shown below.

Downloading page http://xkcd.com...
C:/Users/emosc/PycharmProjects/RequestsLearning/main.py:38: GuessedAtParserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("html.parser"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.

The code that caused this warning is on line 38 of the file C:/Users/emosc/PycharmProjects/RequestsLearning/main.py. To get rid of this warning, pass the additional argument 'features="html.parser"' to the BeautifulSoup constructor.

  soup = bs4.BeautifulSoup(res.text)
Traceback (most recent call last):
  File "C:/Users/emosc/PycharmProjects/RequestsLearning/main.py", line 46, in <module>
    res = requests.get(comicUrl)
  File "C:\Users\emosc\PycharmProjects\RequestsLearning\venv\lib\site-packages\requests\api.py", line 75, in get
    return request('get', url, params=params, **kwargs)
  File "C:\Users\emosc\PycharmProjects\RequestsLearning\venv\lib\site-packages\requests\api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "C:\Users\emosc\PycharmProjects\RequestsLearning\venv\lib\site-packages\requests\sessions.py", line 528, in request
    prep = self.prepare_request(req)
  File "C:\Users\emosc\PycharmProjects\RequestsLearning\venv\lib\site-packages\requests\sessions.py", line 456, in prepare_request
    p.prepare(
  File "C:\Users\emosc\PycharmProjects\RequestsLearning\venv\lib\site-packages\requests\models.py", line 316, in prepare
    self.prepare_url(url, params)
  File "C:\Users\emosc\PycharmProjects\RequestsLearning\venv\lib\site-packages\requests\models.py", line 390, in prepare_url
    raise MissingSchema(error)
requests.exceptions.MissingSchema: Invalid URL '//imgs.xkcd.com/comics/rapid_test_results.png': No schema supplied. Perhaps you meant http:////imgs.xkcd.com/comics/rapid_test_results.png?
Downloading image //imgs.xkcd.com/comics/rapid_test_results.png...

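I have never seen an image link like http:////imgs.xkcd.com/comics/rapid_test_results.png (with four slashes instead of two). The error message suggests it, but I don't know how to use it to fix this error. I am following the book Automate the Boring Stuff with Python and my code is the same as the book's, yet I get this error when I try to scrape the site. Thanks for your help.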

【Comments】:

  • I copied and pasted the same code from the book; maybe the site is not working properly...

Tags: python beautifulsoup


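【Solution 1】:

http:// and https:// are both examples of schemas (URL schemes). Print each URL just before you use it and check that it starts with exactly one of these two. A URL passed to requests.get() without http:// or https:// raises the error shown. Here the src attribute of the image is a protocol-relative URL (//imgs.xkcd.com/comics/rapid_test_results.png), so the scheme has to be added in front of it, for example "http:" + comicUrl, before requesting it.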
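As a rough sketch of one way to do this, the standard library's urllib.parse.urljoin can resolve the src value against the page URL, so the scheme is filled in without string concatenation. The helper name absolute_url is hypothetical and not from the book or the question; the two URLs are taken from the question's code and traceback.

```python
from urllib.parse import urljoin


def absolute_url(page_url, src):
    """Resolve a possibly relative src against the page it was found on.

    absolute_url is a hypothetical helper name used for illustration.
    """
    # urljoin copies the scheme from page_url when src starts with "//",
    # and leaves src unchanged when it is already an absolute URL.
    return urljoin(page_url, src)


# Values taken from the question: the page URL and the src from the traceback.
comic_url = absolute_url(
    "http://xkcd.com", "//imgs.xkcd.com/comics/rapid_test_results.png"
)
print(comic_url)  # http://imgs.xkcd.com/comics/rapid_test_results.png
```

In the question's loop, this would presumably replace the plain comicElem[0].get("src") value before it is passed to requests.get(). The same call also handles the relative "prev" link, so it could stand in for the "http://xkcd.com" + prevLink.get("href") concatenation.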

【Comments】:
