【Question title】: BeautifulSoup: Request or Requests?
【Posted】: 2020-06-30 15:08:41
【Question】:

I have a question. When I use BeautifulSoup with Request (`urlopen`):

 page = urlopen(url).read().decode('utf8')
 soup = BeautifulSoup(page)
 text = ' '.join(map(lambda p: p.text, soup.find_all('p')))
 return soup.title.text, text

I get nice output like this:

Coronavirus: Johnson sets out 'ambitious' economic recovery plan - BBC News
*  Share this with Email Facebook Messenger Messenger Twitter Pinterest WhatsApp LinkedIn Copy this link These are external links and will open in a new window Boris Johnson has said now is the time to be "ambitious" about the UK's future, as he set out a post-coronavirus recovery plan.
* Infrastructure projects in England would be "accelerated" and there would be investment in new academy schools, green buses and new broadband, the PM added.

But when I use BeautifulSoup with Requests (`requests.get`):

 page = requests.get(url)
 soup = BeautifulSoup(page.content, 'html.parser')
 feed = BeautifulSoup(soup.decode('utf8'))
 text = ' '.join(map(lambda p: p.text, feed.find_all('p')))
 return soup.title.text, text

I get ugly output like this:

Coronavirus: Johnson sets out 'ambitious' economic recovery plan - BBC News
* 

 
                    Share this with
                    
                       Email
                       
                       Facebook
                       
                       Messenger
                       
                       Messenger
                       
                       Twitter
                       
                       Pinterest
                       
                       WhatsApp
                       
                       LinkedIn
                       
                    Copy this link
                    
                    These are external links and will open in a new window
                    
             Boris Johnson has said now is the time to be "ambitious" about the UK's future, as he set out a post-coronavirus recovery plan.
* Infrastructure projects in England would be "accelerated" and there would be investment in new academy schools, green buses and new broadband, the PM added.

I am afraid I cannot use BeautifulSoup with Request, because I get an HTTP 403 Forbidden error, so I need to use BeautifulSoup with Requests. How can I get the same nice output with Requests that I get with Request?

【Comments】:

    Tags: python-3.x beautifulsoup


    【Solution 1】:

    I suggest you stick with BeautifulSoup and Request, and fix the HTTP 403 Forbidden error like this:

    Request(url, headers={'User-Agent': 'Mozilla/5.0'})
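
To show where the `Request` line above would fit, here is a minimal sketch that combines it with the code from the question. The function names `build_request` and `get_title_and_text` and the `'html.parser'` choice are assumptions made for illustration, and a browser-like `User-Agent` may not be enough to avoid a 403 on every site.

```python
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup


def build_request(url):
    # 'Mozilla/5.0' is the value suggested in the answer; it is not guaranteed
    # to satisfy every server.
    return Request(url, headers={'User-Agent': 'Mozilla/5.0'})


def get_title_and_text(url):
    # Hypothetical wrapper around the snippet from the question.
    with urlopen(build_request(url)) as response:
        page = response.read().decode('utf8')
    soup = BeautifulSoup(page, 'html.parser')
    text = ' '.join(p.text for p in soup.find_all('p'))
    return soup.title.text, text
```

Calling `get_title_and_text(url)` with the article URL should then behave like the first snippet in the question, except that the request carries the custom header.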
    

    Hope this helps!

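    【Discussion】:

    • I had already tried setting the header to different values with BeautifulSoup and Request, but I kept getting the error, so I had to switch to BeautifulSoup with Requests.
    【Solution 2】: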

    I solved the problem above by removing the following line of code:

    feed = BeautifulSoup(soup.decode('utf8'))
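
A plausible reason this works, offered as an assumption rather than something stated in the thread: the removed line serializes the parsed document and parses it a second time, and if the string argument to `decode` is treated as a pretty-print flag, the serialized markup gains indentation that then shows up in `p.text`. The sketch below parses the response only once and also collapses runs of whitespace, so the result does not depend on how the page itself is indented. The function names, the `User-Agent` value, and the `timeout` are illustrative choices, not part of the original answer.

```python
import requests
from bs4 import BeautifulSoup


def extract_title_and_text(html):
    """Parse HTML once and return (title, paragraph text) with whitespace collapsed."""
    soup = BeautifulSoup(html, 'html.parser')
    # str.split() with no arguments splits on any run of whitespace,
    # so joining the pieces with one space removes newlines and indentation.
    paragraphs = (' '.join(p.get_text().split()) for p in soup.find_all('p'))
    text = ' '.join(part for part in paragraphs if part)
    title = soup.title.get_text(strip=True) if soup.title else ''
    return title, text


def fetch_title_and_text(url):
    # The header value and timeout are assumptions made for this sketch.
    response = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'}, timeout=10)
    response.raise_for_status()
    return extract_title_and_text(response.content)
```

Keeping the parsing in `extract_title_and_text` separate from the network call makes it easy to check the whitespace handling on a small HTML string before pointing `fetch_title_and_text` at a real page.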
    

    【Discussion】:
