【Posted】: 2020-12-31 10:30:24
【Question】:
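I'm downloading books from a website, and my code runs almost without problems, but when I try to open the downloaded PDF book on my computer, Adobe Acrobat Reader reports an error saying the file type is not supported.
Here is an image of the book's format. I'm sure my code needs correcting, because the book format on the website is different from a normal PDF file.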
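One way to check what was actually saved is to look at the file's first bytes: a well-formed PDF begins with the header `%PDF-`, whereas a web page saved under a `.pdf` name usually begins with `<!DOCTYPE html` or `<html`. The following is a minimal heuristic sketch, not a full file-type detector; `detect_kind` and `sniff_file` are hypothetical helper names, and the file is assumed to be `books.pdf`, the name used in the code.

```python
# Hypothetical helpers: a heuristic sketch, not a complete file-type detector.
PDF_MAGIC = b"%PDF-"  # header that a well-formed PDF file starts with


def detect_kind(head: bytes) -> str:
    """Classify the first bytes of a file as "pdf", "html", or "unknown"."""
    if head.startswith(PDF_MAGIC):
        return "pdf"
    # Strip UTF-8 byte-order-mark bytes and leading whitespace, then compare
    # case-insensitively, since HTML tag names are not case-sensitive.
    text = head.lstrip(b"\xef\xbb\xbf \t\r\n").lower()
    if text.startswith((b"<!doctype html", b"<html")):
        return "html"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read only the first 1 KiB of the file and classify it."""
    with open(path, "rb") as f:
        return detect_kind(f.read(1024))
```

Calling `sniff_file("books.pdf")` after the download would return `"html"` if the server sent a web page instead of a PDF, which would match Acrobat's "unsupported file type" error.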
Code:
import requests
from bs4 import BeautifulSoup

url = 'https://global.oup.com/education/support-learning-anywhere/key-resources-online/?region=international&utm_campaign=learninganywhere&utm_source=umbraco&utm_medium=display&utm_content=support_learning_key_resources&utm_team=int#Primary'
response = requests.get(url)
soup = BeautifulSoup(response.content, 'html.parser')

# Collect the href of the first <a> inside every table cell.
table_data = soup.find_all('td')
books_url_list = []
for link in table_data:
    a = link.find('a')
    if a is None or not a.get('href'):
        continue  # skip cells without a link instead of raising TypeError
    # Appending '.pdf' only changes the URL string; it does not make
    # the server return a PDF file.
    books_url_list.append(a['href'] + '.pdf')

# Download the second link and save the raw response body as books.pdf.
book = books_url_list[1]
book_response = requests.get(book)
with open('books.pdf', 'wb') as f:
    f.write(book_response.content)
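A more defensive version of the download step might resolve each `href` against the page URL with `urllib.parse.urljoin` (in case the links are relative), use the URL as-is instead of appending `.pdf`, and check the response body for the `%PDF-` header before writing anything to disk. This is a sketch under the unverified assumption that some links on this page lead to HTML pages rather than PDF files; `absolute_links`, `filename_from_url` and `save_if_pdf` are hypothetical helper names, and `session` is assumed to be any object with a `requests.Session`-style `get` method.

```python
import os
from urllib.parse import urljoin, urlparse

# Hypothetical helpers (not part of requests or BeautifulSoup).
PDF_MAGIC = b"%PDF-"  # header that a well-formed PDF file starts with


def absolute_links(hrefs, base_url):
    """Resolve possibly-relative hrefs against the page URL, skipping empty ones."""
    return [urljoin(base_url, href) for href in hrefs if href]


def filename_from_url(url, default="book.pdf"):
    """Use the last path segment of the URL as the local file name."""
    name = os.path.basename(urlparse(url).path)
    return name or default


def save_if_pdf(session, url, path, timeout=30):
    """Fetch url with a requests-style session; write it only if it is a PDF.

    Returns True when a PDF was written, False when the server sent
    something else (for example an HTML page).
    """
    resp = session.get(url, timeout=timeout)
    resp.raise_for_status()  # stop on 4xx/5xx instead of saving an error page
    if not resp.content.startswith(PDF_MAGIC):
        content_type = resp.headers.get("Content-Type", "unknown")
        print(f"Skipping {url}: server sent {content_type}, not a PDF")
        return False
    with open(path, "wb") as f:
        f.write(resp.content)
    return True
```

With `requests`, `session` would be a `requests.Session()`, `absolute_links` would receive the raw `href` values from the `<td>` cells (without `.pdf` appended), and each result would go to `save_if_pdf(session, link, filename_from_url(link))`. If every call returns `False`, the links probably lead to web pages, and the real PDF address (if there is one) would have to be found on those pages.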
【Discussion】:
Tags: python file pdf download python-requests