【Posted】: 2019-05-16 14:07:57
【Problem description】:
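I need to scrape an e-commerce site that loads 45 products on the first page and then loads another 45 each time I scroll to the end of the page.
I am using Python with the Selenium WebDriver to scrape this page.
Each subsequent Ajax load appears to replace the contents of the container, so I cannot extract all of the data once every product has loaded.
My code is attached for reference. Please advise how I can scrape all of the products.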
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException, TimeoutException
import pandas
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://www.ajio.com/women-jackets-coats/c/830316012")
assert 'Ajio' in driver.title
# Items present after the initial page load.
content = driver.find_elements(By.CLASS_NAME, 'item')
# The 'length' element reads "<count> Items Found"; drop the label and the thousands separators.
totalitems = int(driver.find_element(By.CLASS_NAME, 'length').text.strip(' Items Found').replace(',', ''))
# Number of further batches expected after the first one.
loop_count = (totalitems - len(content)) // len(content)
print(loop_count)
data = []
row = ['Brand', 'Description', 'Offer_Price', 'Original_Price', 'Discount']
data.append(row)
for i in range(1, loop_count):
    # Re-read the items on every pass, since each Ajax load replaces the container.
    content = driver.find_elements(By.CLASS_NAME, 'item')
    print(i)
    print(len(content))
    for item in content:
        row = []
        row.append(item.find_element(By.CLASS_NAME, 'brand').text.strip())
        row.append(item.find_element(By.CLASS_NAME, 'name').text.strip())
        row.append(item.find_element(By.CLASS_NAME, 'price').text.strip().strip('Rs. '))
        try:
            row.append(item.find_element(By.CLASS_NAME, 'orginal-price').text.strip('Rs. '))
        except NoSuchElementException:
            # No original price shown: fall back to the offer price.
            row.append(item.find_element(By.CLASS_NAME, 'price').text.strip('Rs. '))
        try:
            row.append(item.find_element(By.CLASS_NAME, 'discount').text.strip())
        except NoSuchElementException:
            row.append("No Discount")
        data.append(row)
    # Scroll close to the bottom of the page to trigger the next batch.
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight-850);")
    try:
        # Wait up to 30 seconds for the loading indicator to appear.
        myElem = WebDriverWait(driver, 30).until(EC.presence_of_element_located((By.CLASS_NAME, 'loader')))
    except TimeoutException:
        print("Loading took too much time!")
df = pandas.DataFrame(data)
df.to_csv(r"C:\Ajio.csv", sep=',',index=False, header=False, mode='w') #mode='a' for append
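A note on the text parsing in the code above: `str.strip('Rs. ')` removes any of the characters `R`, `s`, `.` and space from both ends of the string rather than the literal prefix. That happens to work for the price strings here, but it is fragile. A minimal sketch of a regex-based alternative follows; the helper name `extract_int` and the sample strings in the comments are assumptions for illustration, not values taken from the site.

```python
import re


def extract_int(text):
    """Return the first integer found in text, ignoring thousands separators.

    Hypothetical helper; intended for strings shaped like
    '<count> Items Found' or 'Rs. <amount>'.
    """
    match = re.search(r"\d[\d,]*", text)
    if match is None:
        raise ValueError(f"no number found in {text!r}")
    # Drop the commas before converting, e.g. '1,299' becomes 1299.
    return int(match.group().replace(",", ""))
```

It could replace both the `strip(' Items Found')` call used for the total and the `strip('Rs. ')` calls used for the prices, assuming the prices are always whole numbers.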
【Discussion】:
Tags: python selenium-webdriver web-scraping
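One possible way to cope with a container that is replaced on every load is to harvest the visible items after each scroll and merge them into a de-duplicated collection, instead of reading everything once at the end. The sketch below shows only that accumulate-and-deduplicate loop. `collect_all`, `fetch_rows` and `scroll` are hypothetical names: the two callables stand in for the Selenium calls that read the current items and trigger the next load. Using the whole row as the key assumes that no two distinct products share every field.

```python
def collect_all(fetch_rows, scroll, expected_total, max_rounds=100):
    """Accumulate unique rows across repeated scrolls.

    fetch_rows: hypothetical callable returning the rows currently in the DOM.
    scroll: hypothetical callable that triggers the next lazy load.
    expected_total: stop once this many unique rows have been collected.
    max_rounds: safety cap so the loop ends even if the total is never reached.
    """
    seen = set()
    ordered = []
    for _ in range(max_rounds):
        for row in fetch_rows():
            key = tuple(row)
            if key not in seen:
                # First time this row is seen: keep it, preserving order.
                seen.add(key)
                ordered.append(list(row))
        if len(ordered) >= expected_total:
            break
        scroll()
    return ordered
```

With Selenium, `fetch_rows` would wrap the per-item extraction from the question and `scroll` would wrap the `execute_script` call plus a wait for the new batch. Rows that survive from one load to the next are then stored only once, and rows that disappear from the DOM have already been captured.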