My web scraper returns an empty list even though I'm using the correct CSS selector

【Question title】: My web scraper is returning an empty list when I'm using the correct CSS selector
【Posted】: 2019-05-29 19:36:54
【Question description】:

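I'm trying to scrape some data from this particular URL using Selenium or Scrapy.

I have scraped other pages without any problem, but for these particular URLs the information I try to collect into a list comes back empty. I used Scrapy first and then moved on to Selenium, but the result is the same. I'm also using PyCharm and chromedriver.

The information I'm specifically looking for is all the different phone models on "https://shop.freedommobile.ca/devices". When I print the list, I find that either nothing was scraped from the site, or the scrape succeeded but returned nothing.

The same thing happens when I try to scrape anything from here: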

https://shop.freedommobile.ca/devices/Apple/iPhone_XS_Max?sku=190198786074&planSku=Freedom%20Big%20Gig%20%2B%20Talk%2015GB

from selenium import webdriver

#open chrome browser and navigate to the webpage
driver = webdriver.Chrome()
driver.get("https://shop.freedommobile.ca/devices")

#extract the names of the phones
phones = driver.find_elements_by_css_selector('.jXeFbj')

#counts phone and its model
for element in range(len(phones)):
    numPhone = int(element) + 1
    print("phone "+ str(numPhone) +" : " + phones[element].text)


#number of phones in total
sizeOfList = len(phones)
print(sizeOfList)

What should happen is that all the phone model names get pulled into a list:

phones = ['iPhone XS Max', 'iPhone XS', 'iPhone XR',...]

【Question comments】:

Try adding an expected condition (EC) so the script waits for the items to load: WebDriverWait(driver,10).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR,".jXeFbj")))

Tags: python-3.x selenium web-scraping scrapy css-selectors

【Solution 1】:

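Your code is fine. What is probably happening is that the lookup sometimes runs too quickly, before the page has rendered the elements, so you get an empty list.

You can solve this with WebDriverWait.

The following code applies that fix, along with a few small improvements: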

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
driver = webdriver.Chrome()
driver.get("https://shop.freedommobile.ca/devices")

# get the list of phones
wait = WebDriverWait(driver, 10)
phones = wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, '.jXeFbj')))
numPhones = len(phones)

#prints the formatted output of each phone
for idx, phone in enumerate(phones):
    phone_name = phone.text
    print("phone " + str(idx) + " : " + phone_name)

print(numPhones)

Output 1:

phone 0 : iPhone XS Max
phone 1 : iPhone XS
phone 2 : iPhone XR
phone 3 : iPhone 8 Plus
phone 4 : iPhone 8
phone 5 : Galaxy S10+
...
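
To see why an explicit wait fixes the empty list, here is a minimal sketch of the polling idea behind it, written in plain Python with no browser involved. It is not Selenium's implementation: `wait_until`, `FakePage`, and the timings are hypothetical names and values made up for illustration. The fake page only "renders" its items after a short delay, much like a page that builds its listing with JavaScript.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.05):
    """Call `condition` repeatedly until it returns a truthy value or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(poll_interval)


class FakePage:
    """Hypothetical stand-in for a page whose items appear only after `delay` seconds."""

    def __init__(self, items, delay):
        self._items = items
        self._ready_at = time.monotonic() + delay

    def find_items(self):
        # Before the page has "rendered", a lookup returns an empty list.
        if time.monotonic() < self._ready_at:
            return []
        return list(self._items)


page = FakePage(["iPhone XS Max", "iPhone XS", "iPhone XR"], delay=0.2)

immediate = page.find_items()                      # queried right away: nothing rendered yet
waited = wait_until(page.find_items, timeout=2.0)  # polled until the items exist

print(immediate)
print(waited)
```

The immediate lookup comes back empty, while the polled lookup returns the three names once the delay has passed. That is the same difference as between calling a find method straight after driver.get() and wrapping the lookup in WebDriverWait(...).until(...).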

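【Comments】:

【Solution 2】:

To scrape all the phone model names into a list of the form ['iPhone XS Max', 'iPhone XS', 'iPhone XR',...] using Selenium, you have to induce WebDriverWait for visibility_of_all_elements_located(), and you can use either of the following Locator Strategies: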

Code block:

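from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
# options.add_argument('disable-infobars')
# Selenium 4 style: pass the chromedriver path via Service and the options via `options=`
# (the older `chrome_options=` and `executable_path=` keywords were deprecated and later removed)
driver = webdriver.Chrome(service=Service(r'C:\WebDrivers\chromedriver.exe'), options=options)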
driver.get("https://shop.freedommobile.ca/devices")
#using CSS_SELECTOR
print([my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "h3[class^='deviceListItem__DeviceModel-']")))])
#using XPATH
#print([my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//h3[starts-with(@class, 'deviceListItem__DeviceModel-')]")))])

Console output:

['iPhone XS Max', 'iPhone XS', 'iPhone XR', 'iPhone 8 Plus', 'iPhone 8', 'Galaxy S10+', 'Galaxy S10', 'Galaxy S10e', 'Galaxy Tab A 8 LTE', 'Galaxy Note9', 'Galaxy S9', 'Galaxy A8', 'G7 Power', 'Moto E5 Play', 'Pixel 3a', 'Pixel 3', 'Pixel 3 XL', 'Z557', 'G7 ThinQ', 'P30 lite', 'Mate 20 Pro', 'X Power 3', 'G8 ThinQ', 'Q Stylo +', 'GoFLIP', 'Bring Your', 'Own Device']
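
The selector h3[class^='deviceListItem__DeviceModel-'] matches on the start of the class attribute instead of relying on a short class such as .jXeFbj, which looks auto-generated and may well change between site builds. The sketch below reproduces that matching rule in plain Python on a hand-written HTML fragment. The fragment, the class suffixes, and the `PrefixClassCollector` name are hypothetical and are not taken from the real page.

```python
from html.parser import HTMLParser

# Hypothetical markup: everything after the prefix is made up for illustration.
SAMPLE_HTML = """
<div>
  <h3 class="deviceListItem__DeviceModel-abc123 jXeFbj">iPhone XS Max</h3>
  <h3 class="deviceListItem__DeviceModel-abc123 jXeFbj">iPhone XS</h3>
  <h3 class="otherWidget__Title-def456">Not a phone</h3>
  <p class="deviceListItem__DeviceModel-abc123">Wrong tag</p>
</div>
"""


class PrefixClassCollector(HTMLParser):
    """Collect the text of <tag> elements whose class attribute starts with `prefix`."""

    def __init__(self, tag, prefix):
        super().__init__()
        self.tag = tag
        self.prefix = prefix
        self.names = []
        self._capturing = False

    def handle_starttag(self, tag, attrs):
        class_value = dict(attrs).get("class") or ""
        # Same rule as CSS [class^='...']: the whole attribute value must start with the prefix.
        if tag == self.tag and class_value.startswith(self.prefix):
            self._capturing = True
            self.names.append("")

    def handle_data(self, data):
        if self._capturing:
            self.names[-1] += data

    def handle_endtag(self, tag):
        if tag == self.tag:
            self._capturing = False


collector = PrefixClassCollector("h3", "deviceListItem__DeviceModel-")
collector.feed(SAMPLE_HTML)
phone_names = [name.strip() for name in collector.names]
print(phone_names)
```

Only the two h3 elements whose class value begins with the prefix are collected. Note that ^= tests the attribute value as a whole string, so an element whose class list merely contains the prefixed class somewhere after the first token would not match.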

【Comments】:

@Lawrence_matei Can I get some feedback on this answer?