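【Posted】:2021-10-31 06:33:47
【Question】:
I'm trying to get every link address that is hyperlinked from a result title on a Google search results page. I'm also trying to write those links to a CSV file, and I think I have that part figured out.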
from bs4 import BeautifulSoup
import requests
import csv

# Search terms are hard-coded for now; the commented-out lines read them from a file instead.
terms = ["thanks", "for", "the help"]
# with open("web_search_terms.txt", "r") as f:
#     terms = [line.strip() for line in f]

with open("web_urls.csv", "w", newline="") as f_out:
    writer = csv.writer(f_out)
    writer.writerow(["Search Term", "URL"])
    for t in terms:
        url = f"https://google.com/search?q={t}"
        print(f"Getting {url}")
        html_page = requests.get(url)
        soup = BeautifulSoup(html_page.content, "html.parser")
        # Each result title is wrapped in a div with class "yuRUbf".
        divs = soup.find_all("div", attrs={"class": "yuRUbf"})
        for item in divs:
            # This writes the div's text, not the link address I'm after.
            writer.writerow([t, item.get_text(strip=True)])
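However, I can't get the links themselves out of the "divs" list: I'm not sure how to read the href inside the elements with class "yuRUbf".
Any help would be greatly appreciated!
Thanks a lot!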
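For reference, here is a minimal sketch of one way to read the href from each of those divs. It assumes every "yuRUbf" div wraps the title in an `<a>` tag, as described above. The helper name `extract_result_links` and the sample markup are hypothetical, made up for illustration. Google's class names can change, and the HTML returned to a plain `requests.get` call may not contain this class at all, so treat it as a starting point, not a guaranteed fix.

```python
from bs4 import BeautifulSoup


def extract_result_links(html, result_class="yuRUbf"):
    """Return the href of the first <a> inside each div with the given class.

    Hypothetical helper: the class name is taken from the question and is
    an assumption about the page markup, not something Google guarantees.
    """
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for div in soup.find_all("div", class_=result_class):
        # href=True skips anchors that have no href attribute
        anchor = div.find("a", href=True)
        if anchor is not None:
            links.append(anchor["href"])
    return links


# Hypothetical sample markup shaped like the structure described above.
sample_html = """
<div class="yuRUbf"><a href="https://example.com/a"><h3>First title</h3></a></div>
<div class="yuRUbf"><a href="https://example.com/b"><h3>Second title</h3></a></div>
<div class="other"><a href="https://example.com/c">Not a result</a></div>
"""

print(extract_result_links(sample_html))
```

In the loop from the question, the same idea would mean writing `[t, link]` for each link returned for `html_page.content` instead of `item.get_text(strip=True)`.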
【Comments】:
Tags: python csv web-scraping