【Question title】: InvalidSchema("No connection adapters were found for {!r}".format(url))
【Posted】: 2021-06-11 20:33:02
【Question】:

What does this error mean, and how do I fix it? I get the following traceback:

Traceback (most recent call last):
  File "load-more.py", line 146, in <module>
    response = session.get(link)
  File "C:\Users\Xone\.virtualenvs\Web_Scrapers-A6P4QRzc\lib\site-packages\requests\sessions.py", line 555, in get
    return self.request('GET', url, **kwargs)
  File "C:\Users\Xone\.virtualenvs\Web_Scrapers-A6P4QRzc\lib\site-packages\requests\sessions.py", line 542, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\Users\Xone\.virtualenvs\Web_Scrapers-A6P4QRzc\lib\site-packages\requests\sessions.py", line 649, in send
    adapter = self.get_adapter(url=request.url)
  File "C:\Users\Xone\.virtualenvs\Web_Scrapers-A6P4QRzc\lib\site-packages\requests\sessions.py", line 742, in get_adapter
    raise InvalidSchema("No connection adapters were found for {!r}".format(url))
requests.exceptions.InvalidSchema: No connection adapters were found for '\\"https:\\/\\/lifebridgecapital.com\\/2021\\/06\\/11\\/ws964-multifamily-investing-is-a-team-sport-with-cameron-roy\\/\\"'

It is raised when I try to fetch the link behind each title. I am scraping with the requests POST method; the code is below:

import requests
from bs4 import BeautifulSoup

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:89.0) Gecko/20100101 Firefox/89.0',
    'Accept': 'application/json, text/javascript, */*; q=0.01',
    'Accept-Language': 'en-US,en;q=0.5',
    'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
    'X-Requested-With': 'XMLHttpRequest',
    'Origin': 'https://lifebridgecapital.com',
    'Connection': 'keep-alive',
    'Referer': 'https://lifebridgecapital.com/podcast/',
    'Sec-GPC': '1',
    'TE': 'Trailers',
}

data = {
    'action': 'gdlr_core_post_ajax',
    'settings[category][]': 'podcast',
    'settings[tag]': '',
    'settings[num-fetch]': '9',
    'settings[paged]': '1',
    'option[name]': 'paged',
}

session = requests.Session()

for page in range(0, 55):
    data['option[value]'] = str(page + 1)
    response = session.post('https://lifebridgecapital.com/wp-admin/admin-ajax.php', headers=headers, data=data)
    links = [a['href'] for a in BeautifulSoup(response.text, 'lxml').select('h3 > a')]
    for link in links:
        response = session.get(link)
        page = BeautifulSoup(response.text, 'lxml')
        title = page.find('h3').text
        print(f'Title: {title}, Link: {link}')


        #print(f'title: {title}, links: {links}')

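I do get all the links, but as soon as I request one of them to read its title, this InvalidSchema error is raised. I searched Google extensively and then asked on SO, but found neither a fix nor an explanation of why the error occurs.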

【Comments】:

    Tags: python-3.x ajax beautifulsoup python-requests http-post


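    【Solution 1】:

    The endpoint returns a JSON response, and the HTML sits in its content field, so you cannot pass response.text straight to Beautiful Soup. When the raw JSON text is parsed as HTML, each href keeps its JSON escapes (\" and \/), so the value no longer starts with https:// and requests finds no connection adapter for it.

    Replace response.text with response.json()['content']: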

    links = [a['href'] for a in BeautifulSoup(response.json()['content'], 'lxml').select('h3 > a')]
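
The difference between the raw body and the decoded content field can be checked offline. The sketch below is only an illustration under stated assumptions: sample_body is a made-up payload shaped like the error message suggests (the real response may carry more fields), HrefCollector, links_from_ajax_body and is_fetchable are hypothetical helper names, and the standard-library html.parser stands in for Beautiful Soup so the snippet has no third-party dependencies. With requests, response.json()['content'] performs the same decoding step as json.loads(...)['content'] here.

```python
import json
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect the href of every <a> tag seen while parsing."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def links_from_ajax_body(body_text):
    """Decode the JSON body first, then parse only its 'content' HTML."""
    html = json.loads(body_text)["content"]
    collector = HrefCollector()
    collector.feed(html)
    return collector.hrefs


def is_fetchable(url):
    """Approximate the URL-prefix check requests uses to pick an adapter."""
    return url.lower().startswith(("http://", "https://"))


# Made-up payload (an assumption): JSON text with escaped quotes and slashes.
sample_body = (
    r'{"content": "<h3><a href=\"https:\/\/example.com\/2021\/06\/11\/post\/\">'
    r'Post<\/a><\/h3>"}'
)

links = links_from_ajax_body(sample_body)
for link in links:
    print(link, is_fetchable(link))
```

A string that still carries the escapes, such as the one quoted in the traceback, fails the is_fetchable check, which matches the InvalidSchema error in the question.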
    

    【Comments】:

    • The error is gone, but the links are duplicated. Any idea why?
    • You can use a set instead of a list to eliminate the duplicates: links = {a['href'] for a in BeautifulSoup(response.json()['content'], 'lxml').select('h3 > a')}
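
A set removes duplicates within one page but does not keep the original order, and it does not help if the same link shows up again on a later page. The thread does not establish why the duplicates appear, so the following is only a possible workaround: unique_in_order is a hypothetical helper that keeps first-seen order and can share one seen set across all pages. The example URLs are placeholders.

```python
def unique_in_order(links, seen=None):
    """Return the links not seen before, keeping first-seen order.

    Pass the same `seen` set on every call to deduplicate across pages.
    """
    if seen is None:
        seen = set()
    fresh = []
    for link in links:
        if link not in seen:
            seen.add(link)
            fresh.append(link)
    return fresh


seen = set()
page_1 = unique_in_order(
    ["https://example.com/a/", "https://example.com/b/", "https://example.com/a/"],
    seen,
)
page_2 = unique_in_order(["https://example.com/b/", "https://example.com/c/"], seen)
print(page_1)
print(page_2)
```

In the question's loop, this would mean creating seen once before the page loop and wrapping the list of hrefs from each page in unique_in_order(..., seen) before requesting them.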