Cannot scrape a website with BeautifulSoup4 (answers)

【Question title】: Cannot scrape a website with BeautifulSoup4
【Posted】: 2018-04-17 17:18:46
【Question】:

The text I want to scrape is the heading "123rd Meeting" on

https://www.bcb.gov.br/en/#!/c/copomstatements/1724

To do this, I use the following code:

import urllib.request           # fetch the HTML page from a URL
import urllib.error

from bs4 import BeautifulSoup


# set page to read
with urllib.request.urlopen('https://www.bcb.gov.br/en/#!/c/copomstatements/1724') as response:
    page = response.read()

# parse the html using beautiful soup and store in variable `soup`
soup = BeautifulSoup(page, "html.parser")
print(soup)

# Inspect: <h3 class="BCTituloPagina ng-binding">123rd Meeting</h3>
title = soup.find("h3", attrs={"class": "BCTituloPagina ng-binding"})
print(title)

However, the output of

print(soup)

contains neither the title, 123rd Meeting, nor the body text: "Given .... lower the target by 25 basis points."

【Comments】:

Tags: python-3.x beautifulsoup

【Answer 1】:

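You cannot extract the title with an ordinary HTTP library in Python (urllib, requests), because the element you are trying to extract is rendered with JavaScript: the HTML returned by the server does not contain the <h3>, which the browser only inserts after running the page's scripts. You will need a tool that executes JavaScript, such as Selenium, to achieve your goal.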
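The URL itself shows why the downloaded HTML cannot contain the statement: everything after # is a fragment, and HTTP clients (browsers and urllib alike) never send it to the server. The server only sees a request for /en/ and presumably returns the same application shell for every statement; the page's JavaScript then reads the #!/c/copomstatements/1724 route in the browser and loads the content. A minimal standard-library check:

```python
from urllib.parse import urlsplit
from urllib.request import Request

url = "https://www.bcb.gov.br/en/#!/c/copomstatements/1724"

# The path urllib puts in the HTTP request line: the fragment is not part of it
print(Request(url).selector)   # /en/
# The client-side route, which only the page's JavaScript ever sees
print(urlsplit(url).fragment)  # !/c/copomstatements/1724
```

Every statement URL on this site therefore downloads the same HTML, whatever number follows the #!.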

Code:

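from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
try:
    driver.get('https://www.bcb.gov.br/en/#!/c/copomstatements/1724')
    # Wait up to 10 s for the page's JavaScript to render the <h3> title;
    # until() returns the element once it is visible
    # (find_element_by_xpath no longer exists in Selenium 4)
    title = WebDriverWait(driver, 10).until(
        EC.visibility_of_element_located((By.XPATH, '//h3'))).text
    print(title)
except TimeoutException:
    print('The title did not appear within 10 seconds')
finally:
    # quit() ends the WebDriver session and closes the browser
    driver.quit()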

Output:

123rd Meeting
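If you would rather keep the BeautifulSoup parsing from the question, a common pattern is to let Selenium render the page and then parse driver.page_source, which is the rendered DOM serialized as HTML. The sketch below isolates only the parsing step; extract_title is a hypothetical helper name, and the two HTML strings stand in for real page source. It matches on the single class BCTituloPagina (taken from the question's inspector output) because ng-binding looks like an AngularJS bookkeeping class and is not needed to identify the heading.

```python
from typing import Optional

from bs4 import BeautifulSoup


def extract_title(html: str) -> Optional[str]:
    """Return the text of the statement heading, or None if it is not in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    # class_ matches any one of the element's CSS classes, so "ng-binding" is not required
    heading = soup.find("h3", class_="BCTituloPagina")
    return heading.get_text(strip=True) if heading is not None else None


# Rendered markup, as seen in the browser's inspector
print(extract_title('<h3 class="BCTituloPagina ng-binding">123rd Meeting</h3>'))  # 123rd Meeting
# Unrendered application shell: the heading does not exist yet
print(extract_title('<html><body><div ng-view></div></body></html>'))  # None
```

With the Selenium code above, you would call extract_title(driver.page_source) after the wait succeeds and before quitting the driver.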

【Comments】:

Thanks @Ali for the quick reply. Since driver = webdriver.Chrome() opens Google Chrome, and this code has to run (in a loop) at least 100 times, I added the following line to close it: driver.close()
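For the 100-iteration loop, another option is to start Chrome once, visit every URL with the same driver, and quit once at the end. This is a sketch under assumptions: you supply the list of statement URLs, each statement page shows its title in the first <h3> (as in the answer), and consecutive pages have different titles (true for meeting numbers). Because only the #! fragment changes between URLs, the app may switch views without a full page load, so the code waits until the heading text differs from the previous page's rather than accepting whatever <h3> is already on screen. scrape_titles and fresh_title are hypothetical names; Selenium is imported inside scrape_titles so fresh_title can be tried without it.

```python
from typing import Iterable, List, Optional


def fresh_title(text: str, previous: Optional[str]) -> Optional[str]:
    """Return the stripped heading text if it is non-empty and differs from the
    previously scraped title; otherwise None, which makes WebDriverWait keep polling."""
    text = text.strip()
    return text if text and text != previous else None


def scrape_titles(driver, urls: Iterable[str], timeout: float = 10) -> List[str]:
    """Visit each URL with one already-open driver and collect the <h3> titles."""
    from selenium.common.exceptions import (
        NoSuchElementException,
        StaleElementReferenceException,
    )
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    titles: List[str] = []
    previous: Optional[str] = None
    for url in urls:
        driver.get(url)
        # The view may be replaced while we read it, so retry on missing/stale elements
        wait = WebDriverWait(
            driver,
            timeout,
            ignored_exceptions=(NoSuchElementException, StaleElementReferenceException),
        )
        # Poll until the <h3> shows a new, non-empty title; raises TimeoutException otherwise
        title = wait.until(
            lambda d: fresh_title(d.find_element(By.XPATH, "//h3").text, previous)
        )
        titles.append(title)
        previous = title
    return titles
```

Usage: create the browser once with driver = webdriver.Chrome(), call titles = scrape_titles(driver, urls) inside a try block, and call driver.quit() in the finally clause, so Chrome starts once instead of 100 times and is still closed if a page times out.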