【Question title】: Unable to find element with BeautifulSoup
【Posted】: 2021-10-24 13:30:08
【Question description】:

I'm trying to parse a specific href link from the following website: https://www.murray-intl.co.uk/en/literature-library

The element I'm trying to parse:

<a class="btn btn--naked btn--icon-left btn--block focus-within" href="https://www.aberdeenstandard.com/docs?editionId=9123afa2-5318-4715-9783-e07d08e2e7cc&amp;_ga=2.12911351.1364356977.1629796255-1577053129.1629192717" target="blank">Portfolio Holding Summary<i class="material-icons btn__icon">library_books</i></a>

However, with BeautifulSoup I can't get the desired element, possibly because the site asks me to accept cookies.

import requests
from bs4 import BeautifulSoup

page = requests.get('https://www.murray-intl.co.uk/en/literature-library')
soup = BeautifulSoup(page.content, 'html.parser')
# Look for <a> tags whose class attribute matches this exact class string
link = soup.find_all('a', class_='btn btn--naked btn--icon-left btn--block focus-within')
url = link[0].get('href')  # raises IndexError if find_all found nothing
print(url)
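A quick way to test the cookie theory is to look at what the downloaded HTML actually contains. The sketch below runs offline on a hypothetical snippet standing in for the served page (assumption: the anchor is served without the `focus-within` class); against the live site you would pass `page.text` instead of `html`. It illustrates that a single extra class in the `class_` string is enough to make `find_all` return an empty list.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the HTML that requests downloads: the anchor is
# assumed to be served WITHOUT the "focus-within" class.
html = """
<a class="btn btn--naked btn--icon-left btn--block"
   href="https://www.aberdeenstandard.com/docs?editionId=9123afa2-5318-4715-9783-e07d08e2e7cc"
   target="blank">Portfolio Holding Summary<i class="material-icons btn__icon">library_books</i></a>
"""

soup = BeautifulSoup(html, "html.parser")

# A class_ string containing spaces is compared with the whole class
# attribute, so the extra "focus-within" makes this search come back empty.
wanted = "btn btn--naked btn--icon-left btn--block focus-within"
matches = soup.find_all("a", class_=wanted)
print(len(matches))  # 0, so matches[0] would raise IndexError

# Print the classes the anchor really carries in this markup.
for a in soup.find_all("a"):
    if "Portfolio Holding Summary" in a.get_text():
        print(a.get("class"))
```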

I'm still new to BS4 and hope someone can point me in the right direction.

Thanks in advance!

【Question discussion】:

    Tags: python parsing web-scraping beautifulsoup


    【Answer 1】:

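    The "focus-within" class is added later by JavaScript, so it is not in the HTML that requests downloads. To get the correct tags, remove it from the class string: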

    import requests
    from bs4 import BeautifulSoup
    
    url = "https://www.murray-intl.co.uk/en/literature-library"
    soup = BeautifulSoup(requests.get(url).content, "html.parser")
    
    links = soup.find_all("a", class_="btn btn--naked btn--icon-left btn--block")
    for u in links:
        print(u.get_text(strip=True), u.get("href", ""))
    

    Prints:

    ...
    
    Portfolio Holding Summarylibrary_books https://www.aberdeenstandard.com/docs?editionId=9123afa2-5318-4715-9783-e07d08e2e7cc
    
    ...
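The four-class string above is still an exact match on the whole `class` attribute, so it would stop matching if the server emitted the classes in another order or added one. A CSS selector that chains the classes is more tolerant: it matches any `<a>` that has all of them. A minimal sketch on made-up markup (the `/served` and `/after-js` hrefs are hypothetical):

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one anchor as served, one as it might look after
# JavaScript has added "focus-within" in a browser.
html = """
<a class="btn btn--naked btn--icon-left btn--block" href="/served">Served</a>
<a class="btn btn--naked btn--icon-left btn--block focus-within" href="/after-js">After JS</a>
"""
soup = BeautifulSoup(html, "html.parser")

# Exact class-string match: only anchors whose class attribute is exactly
# these four classes, in this order.
exact = soup.find_all("a", class_="btn btn--naked btn--icon-left btn--block")

# Chained class selector: any <a> that has all four classes, in any order,
# with or without extra classes such as "focus-within".
chained = soup.select("a.btn.btn--naked.btn--icon-left.btn--block")

print([a["href"] for a in exact])    # ['/served']
print([a["href"] for a in chained])  # ['/served', '/after-js']
```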
    

    EDIT: To get only that specific link, you can use, for example, a CSS selector:

    link = soup.select_one('a:-soup-contains("Portfolio Holding Summary")')
    print(link["href"])
    

    Prints:

    https://www.aberdeenstandard.com/docs?editionId=9123afa2-5318-4715-9783-e07d08e2e7cc
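If you'd rather not rely on soupsieve's non-standard `:-soup-contains` pseudo-class, a plain-Python loop over the anchors does the same job. It also makes the no-match case explicit: `select_one` returns `None` when nothing matches, and `link["href"]` then raises a `TypeError`. A minimal sketch; `find_link_by_text` is a hypothetical helper, and the markup (including the "Annual Report" anchor and its example.com URL) is a made-up stand-in for the page:

```python
from bs4 import BeautifulSoup

def find_link_by_text(soup, text):
    """Hypothetical helper: return the href of the first <a> whose text
    contains `text`, or None if there is no such anchor.

    get_text() includes text from child tags, so anchors like
    <a>Portfolio Holding Summary<i>library_books</i></a> still match.
    """
    for a in soup.find_all("a", href=True):
        if text in a.get_text():
            return a["href"]
    return None

# Hypothetical markup modelled on the element from the question.
html = """
<a class="btn" href="https://example.com/other">Annual Report<i>library_books</i></a>
<a class="btn" href="https://www.aberdeenstandard.com/docs?editionId=9123afa2-5318-4715-9783-e07d08e2e7cc">Portfolio Holding Summary<i class="material-icons btn__icon">library_books</i></a>
"""
soup = BeautifulSoup(html, "html.parser")
print(find_link_by_text(soup, "Portfolio Holding Summary"))
```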
    

    【Discussion】:

    • Thank you so much! Do you have any idea how I could specify that I only want that particular link?
    • You're awesome. Thanks!