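【Posted】: 2021-06-18 13:59:51
【Question】:
I want to go into department and select/print only name and url. I tried the approach below, but I can't work out how to get into department and pick out those two specific fields. How do I get the "name" and "url" of every link?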
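The question imports json, so department may be a key in JSON data rather than an HTML element. The page's real format isn't shown, so the following is only a sketch. It assumes the data looks like {"department": [{"name": ..., "url": ...}, ...]}. The function name extract_department_links and the sample data are hypothetical.

```python
import json

def extract_department_links(raw_json):
    """Return (name, url) pairs from an assumed 'department' list in JSON text."""
    data = json.loads(raw_json)
    # Assumed shape: {"department": [{"name": ..., "url": ...}, ...]}
    return [(item.get("name"), item.get("url")) for item in data.get("department", [])]

# Hypothetical sample data; the real page's JSON is unknown.
sample = (
    '{"department": ['
    '{"name": "Sales", "url": "https://www.xyz.com/sales"}, '
    '{"name": "HR", "url": "https://www.xyz.com/hr"}]}'
)

for name, url in extract_department_links(sample):
    print("Name: {}".format(name))
    print("URL: {}".format(url))
```

If the JSON sits inside a <script> tag, you would first pull that tag's text out with BeautifulSoup and then pass it to json.loads.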
import json
import urllib.request
from bs4 import BeautifulSoup

def getContent():
    # target site url (urlopen needs a scheme such as https://)
    url = "https://www.xyz.com"
    # requesting the url for data
    request = urllib.request.Request(url)
    # get the html, whole page
    htmlpage = urllib.request.urlopen(request).read()
    bsoup = BeautifulSoup(htmlpage, "html.parser")
    # print(bsoup.prettify())
    # main_table = bsoup.find("div",attrs)
    # print(main_table)
    # print(bsoup.find_all('name'))
    # nav = bsoup.nav
    # print(bsoup.title.department.url)
    # for url in find_all('a'):
    #     print(url.get('href'))
    # loop over every <a> tag on the whole page
    for link in bsoup.find_all("a"):
        print("Title: {}".format(link.get("name")))
        print("href: {}".format(link.get("href")))

getContent()
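The loop above searches the whole page. Also, link.get("name") reads the tag's name attribute, not its visible text. One way to limit the search to department is to find the container first and then call find_all("a") on it. The following is a minimal sketch that assumes the links sit inside a <div class="department">. That selector, the sample markup, and the function name department_links are all hypothetical, since the real page structure isn't shown.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the real page's structure is unknown.
SAMPLE_HTML = """
<nav>
  <a href="/home">Home</a>
  <div class="department">
    <a href="https://www.xyz.com/sales">Sales</a>
    <a href="https://www.xyz.com/hr">HR</a>
  </div>
</nav>
"""

def department_links(html):
    """Return (name, url) pairs for <a> tags inside the assumed department container."""
    soup = BeautifulSoup(html, "html.parser")
    # Assumed selector: adjust the tag and class to match the actual page.
    container = soup.find("div", class_="department")
    if container is None:
        return []
    # get_text() gives the visible link text; get("href") gives the URL.
    return [(a.get_text(strip=True), a.get("href")) for a in container.find_all("a")]

for name, url in department_links(SAMPLE_HTML):
    print("Name: {}".format(name))
    print("URL: {}".format(url))
```

Because find_all is called on the container rather than on the whole soup, links outside it (like "Home" here) are skipped.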
【Discussion】:
Tags: python web-scraping beautifulsoup urllib