【Posted】: 2020-06-30 15:08:41
【Question】:
I have a problem. When I fetch the page with urllib and parse it with BeautifulSoup:
page = urlopen(url).read().decode('utf8')
soup = BeautifulSoup(page)
text = ' '.join(map(lambda p: p.text, soup.find_all('p')))
return soup.title.text, text
I get nicely formatted output like this:
Coronavirus: Johnson sets out 'ambitious' economic recovery plan - BBC News
* Share this with Email Facebook Messenger Messenger Twitter Pinterest WhatsApp LinkedIn Copy this link These are external links and will open in a new window Boris Johnson has said now is the time to be "ambitious" about the UK's future, as he set out a post-coronavirus recovery plan.
* Infrastructure projects in England would be "accelerated" and there would be investment in new academy schools, green buses and new broadband, the PM added.
But when I fetch the page with requests and parse it with BeautifulSoup:
page = requests.get(url)
soup = BeautifulSoup(page.content, 'html.parser')
feed = BeautifulSoup(soup.decode('utf8'))
text = ' '.join(map(lambda p: p.text, feed.find_all('p')))
return soup.title.text, text
I get ugly output like this:
Coronavirus: Johnson sets out 'ambitious' economic recovery plan - BBC News
*
Share this with
Email
Facebook
Messenger
Messenger
Twitter
Pinterest
WhatsApp
LinkedIn
Copy this link
These are external links and will open in a new window
Boris Johnson has said now is the time to be "ambitious" about the UK's future, as he set out a post-coronavirus recovery plan.
* Infrastructure projects in England would be "accelerated" and there would be investment in new academy schools, green buses and new broadband, the PM added.
I am afraid I cannot use urllib, because it gives me an HTTP 403 Forbidden error, so I need to use requests. How can I get the same clean output with requests that I get with urllib?
【Discussion】:
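A possible explanation, offered as a sketch and not a confirmed diagnosis: in the requests version the document is parsed twice, and `soup.decode('utf8')` is called on a BeautifulSoup object, not on bytes. In many bs4 releases the first positional argument of that method is a pretty-print flag, not an encoding, so a truthy value such as `'utf8'` would re-serialize the tree with extra newlines and indentation before the second parse. This behavior depends on the installed bs4 version, so treat it as an assumption. The sketch below parses `response.content` once and collapses whitespace inside each paragraph, which should give single-line text regardless of the cause. The function names `extract_title_and_text` and `fetch_title_and_text` are hypothetical, chosen only for illustration.

```python
from bs4 import BeautifulSoup


def extract_title_and_text(html):
    """Parse raw HTML (bytes or str) once and return (title, paragraph text)."""
    soup = BeautifulSoup(html, "html.parser")
    # Collapse every run of whitespace (including newlines) inside a <p>
    # into a single space, then drop paragraphs that end up empty.
    paragraphs = (" ".join(p.get_text().split()) for p in soup.find_all("p"))
    text = " ".join(p for p in paragraphs if p)
    title = soup.title.get_text(strip=True) if soup.title else ""
    return title, text


def fetch_title_and_text(url):
    """Fetch a page with requests and extract its title and paragraph text."""
    import requests  # imported here so the parsing helper works without it

    response = requests.get(url, timeout=10)
    response.raise_for_status()
    # Pass the raw bytes straight to the parser; no decode() and no re-parse.
    return extract_title_and_text(response.content)
```

Calling `fetch_title_and_text(url)` in place of the second snippet keeps requests for the download while avoiding the intermediate `decode` step. The whitespace normalization is a design choice made here, not something the original code did, and it also flattens any line breaks that were intentionally inside a paragraph.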