【Question title】: Pulling all text (multiple p tags) with BeautifulSoup and Selenium returns []
【Posted】: 2020-06-18 19:10:28
【Question】:

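I am trying to extract the comments held in p tags inside the review cards, ultimately using BeautifulSoup and Selenium to work through the links found by a search on vivino.com. I can open the first link, but pulling the p text from the review box returns [].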

from bs4 import BeautifulSoup
from selenium import webdriver

url = "https://www.vivino.com/explore?e=eJwNyTEOgCAQBdHbbA2F5e-8gbE2uKyERBYCaOT20swrJlVYSlFhjaHkPixTHtg34pmVyvzhwutqlO5uyid8bJwf7UeRyqKdMrw0pgYdPwIzGwQ="
driver = webdriver.Chrome('/Users/myname/Downloads/chromedriver')
driver.implicitly_wait(30)
driver.get(url)

python_button = driver.find_element_by_class_name('anchor__anchor--2QZvA')
python_button.click() 
soup = BeautifulSoup(driver.page_source, 'lxml')
print(soup.find_all('p'))

table = soup.findAll('div',attrs={"class":"reviewCard__reviewContainer--1kMJM"})
print(table)
driver.quit()

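Can anyone advise on the correct way to pull out the comments? Since there is more than one review per page, do I need to loop? I also tried 'html.parser' instead of 'lxml'. Which one is the right one to use?

Thank you very much for your help.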

【Comments】:

  • You are already using Selenium, so there is no need for BeautifulSoup; adding it will only be slower.

Tags: python selenium beautifulsoup


【Answer 1】:

Here is what you need to do:

import atexit
from pprint import pprint

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.expected_conditions import visibility_of_all_elements_located
from selenium.webdriver.support.wait import WebDriverWait


def start_driver():
    driver = webdriver.Chrome()
    atexit.register(driver.quit)
    driver.maximize_window()
    return driver


def find_elements(driver, locator):
    return WebDriverWait(driver, 10, 2).until(visibility_of_all_elements_located(locator))


URL = "https://www.vivino.com/explore?e=eJwNyTEOgCAQBdHbbA2F5e-8gbE2uKyERBYCaOT20swrJlVYSlFhjaHkPixTHtg34pmVyvzhwutqlO5uyid8bJwf7UeRyqKdMrw0pgYdPwIzGwQ="
RESULTS = By.CSS_SELECTOR, "div[class*='vintageTitle'] > a"


def main():
    driver = start_driver()
    driver.get(URL)

    # note the results
    wines = []
    for element in find_elements(driver, RESULTS):
        link = element.get_attribute("href")
        name = element.find_element(By.CSS_SELECTOR, "span[class*='vintageTitle__wine']").text
        wines.append((name, link))

    pprint(wines)

    # go extract details from each result's page
    for name, link in wines:
        print("getting comments for wine: ", name)
        driver.get(link)
        # you can do the rest ;)


if __name__ == '__main__':
    main()
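
For the part left as an exercise, here is a minimal sketch of how the review text could be pulled out once a wine page has loaded. It assumes the reviews sit in div elements whose class starts with reviewCard__reviewContainer (the prefix comes from the question; the hashed suffix is assumed to vary), and the sample markup is made up for illustration. The helper name extract_review_texts is hypothetical.

```python
from bs4 import BeautifulSoup


def extract_review_texts(page_source):
    """Return one string per review card, joining the text of its <p> tags."""
    soup = BeautifulSoup(page_source, "html.parser")
    # Match on the class-name prefix only; the hashed suffix
    # (e.g. "--1kMJM" in the question) is assumed to change between builds.
    cards = soup.select("div[class*='reviewCard__reviewContainer']")
    reviews = []
    for card in cards:  # one iteration per review card on the page
        paragraphs = [p.get_text(strip=True) for p in card.find_all("p")]
        reviews.append(" ".join(text for text in paragraphs if text))
    return reviews


# Stand-in markup for illustration only; the real page may be structured differently.
SAMPLE_HTML = """
<div class="reviewCard__reviewContainer--1kMJM"><p>Great with steak.</p><p>Would buy again.</p></div>
<div class="reviewCard__reviewContainer--1kMJM"><p>Too sweet for me.</p></div>
<p>Unrelated footer text</p>
"""

reviews = extract_review_texts(SAMPLE_HTML)
print(reviews)
```

Inside the loop in main(), this could be called as extract_review_texts(driver.page_source) after waiting for the cards, for example with find_elements(driver, (By.CSS_SELECTOR, "div[class*='reviewCard__reviewContainer']")). Reading page_source before the reviews have rendered is one plausible reason for getting [] in the question. On the parser question: both 'html.parser' and 'lxml' are valid choices. 'lxml' is generally faster but is an extra install, and the choice is unlikely to be what causes an empty result.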
