【发布时间】:2020-11-10 14:46:04
【问题描述】:
这是我的代码:
import os
import requests
from urllib.parse import urljoin
from bs4 import BeautifulSoup
url = "https://mathsmadeeasy.co.uk/gcse-maths-revision/"
#If there is no such folder, the script will create one automatically
folder_location = r'E:\webscraping'
if not os.path.exists(folder_location):os.mkdir(folder_location)
response = requests.get(url)
soup= BeautifulSoup(response.text, "html.parser")
for link in soup.select("a[href$='.pdf']"):
#Name the pdf files using the last portion of each link which are unique in this case
filename = os.path.join(folder_location,link['href'].split('/')[-1])
with open(filename, 'wb') as f:
f.write(requests.get(urljoin(url,link['href'])).content)
关于为什么代码不下载我的任何文件格式数学修订站点的任何帮助。 谢谢。
【问题讨论】:
标签: python html web web-scraping beautifulsoup