【Title】: Parsing a dynamic webpage using selenium
【Posted】: 2020-10-03 12:09:24
【Description】:

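I am trying to scrape an image from Amazon, which is not easy.

I think I'm almost there, but I'm not getting the result.

Here I use selenium to (1) open the main image, (2) click the second image among the thumbnails, and (3) get the src of the second image at full size.

But it fails, and I don't know why.

Here is what I wrote.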

from urllib.request import urlretrieve
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
import time

driver = webdriver.Firefox()
url = "https://www.amazon.com/Kraft-Original-Macaroni-Microwaveable-Packets/dp/B005ECO3H0"
driver.get(url)
action = ActionChains(driver)
time.sleep(5)

driver.find_element_by_css_selector('#landingImage').click()
time.sleep(10)

html = driver.page_source
soup = BeautifulSoup(html,"html.parser")

driver.find_element_by_css_selector('#ivImage_1').click()
amazon = soup.select_one(".fullscreen")
imgUrl = amazon.find("img")['src']
print(imgUrl)

One thing I can't understand: if I add print(amazon), it gives me the img tag, but according to the result of the code above, imgUrl is "NoneType".

Please help me figure this out.

【Comments】:

    Tags: python selenium web-scraping beautifulsoup web-crawler


    【Solution 1】:

    Here you go.
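
    The "NoneType" result in the question most likely comes from amazon.find("img"). Since print(amazon) already shows the img tag, the .fullscreen selector matched the image element itself, and find() only searches descendants, so it returns None. A minimal sketch of that behaviour, using made-up markup (the HTML and the example.com URL are assumptions for illustration, not Amazon's real page):

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the element carrying class "fullscreen" is the <img> itself.
html = '<div id="viewer"><img class="fullscreen" src="https://example.com/full.jpg"></div>'
soup = BeautifulSoup(html, "html.parser")

tag = soup.select_one(".fullscreen")  # this is already the <img> tag
nested = tag.find("img")              # searches descendants only; an <img> has none
src = tag["src"]                      # read the attribute from the matched tag directly

print(tag.name)  # img
print(nested)    # None
print(src)       # https://example.com/full.jpg
```

    Note also that soup is a snapshot of page_source taken before the click on #ivImage_1, so it would not reflect the second image in any case. The script that follows reads src from the live element instead.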

    from urllib.request import urlretrieve
    from bs4 import BeautifulSoup
    from selenium import webdriver
    from selenium.webdriver.common.action_chains import ActionChains
    import time
    
    driver = webdriver.Firefox()
    url = "https://www.amazon.com/Kraft-Original-Macaroni-Microwaveable-Packets/dp/B005ECO3H0"
    driver.get(url)
    action = ActionChains(driver)
    time.sleep(5)
    
    driver.find_element_by_css_selector('#landingImage').click()
    time.sleep(5)
    
    html = driver.page_source
    soup = BeautifulSoup(html,"html.parser")
    
    driver.find_element_by_css_selector('#ivImage_1').click()
    image_url = driver.find_element_by_class_name("fullscreen").get_attribute("src")
    print(image_url)
    
    # Optional: download the image and save it to disk
    import requests
    resp = requests.get(image_url)
    with open("asd.png", "wb") as image:
        image.write(resp.content)
    

    【Comments】:
