【发布时间】:2020-12-19 07:52:33
【问题描述】:
我是网络抓取的新手,但对请求有足够的命令,BeautifulSoup 和 Selenium 可以从网站提取数据。现在的问题是,当点击下一页的页码时,我正在尝试从网站上抓取 URL 不会更改的数据。
网站网址 ==> https://www.ellsworth.com/products/adhesives/
我也尝试了 Google 开发者工具,但没有成功。如果有人用代码指导我,将不胜感激。 Google Developer show Get Request
这是我的代码
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from webdriver_manager.chrome import ChromeDriverManager
from bs4 import BeautifulSoup
import pandas as pd
import requests
itemproducts = pd.DataFrame()
driver = webdriver.Chrome(ChromeDriverManager().install())
driver.get('https://www.ellsworth.com/products/adhesives/')
base_url = 'https://www.ellsworth.com'
html= driver.page_source
s = BeautifulSoup(html,'html.parser')
data = []
href_link = s.find_all('div',{'class':'results-products-item-image'})
for links in href_link:
href_link_a = links.find('a')['href']
data.append(base_url+href_link_a)
# url = 'https://www.ellsworth.com/products/adhesives/silicone/dow-838-silicone-adhesive-sealant-white-90-ml-tube/'
for c in data:
driver.get(c)
html_pro = driver.page_source
soup = BeautifulSoup(html_pro,'html.parser')
title = soup.find('span',{'itemprop':'name'}).text.strip()
part_num = soup.find('span',{'itemprop':'sku'}).text.strip()
manfacture = soup.find('span',{'class':'manuSku'}).text.strip()
manfacture_ = manfacture.replace('Manufacturer SKU:', '').strip()
pro_det = soup.find('div',{'class':'product-details'})
p = pro_det.find_all('p')
try:
d = p[1].text.strip()
c = p.text.strip()
except:
pass
table = pro_det.find('table',{'class':'table'})
tr = table.find_all('td')
typical = tr[1].text.strip()
brand = tr[3].text.strip()
color = tr[5].text.strip()
image = soup.find('img',{'itemprop':'image'})['src']
image_ = base_url + image
png_url = title +('.jpg')
img_data = requests.get(image_).content
with open(png_url,'wb') as fh:
fh.write(img_data)
itemproducts=itemproducts.append({'Product Title':title,
'Part Number':part_num,
'SKU':manfacture_,
'Description d':d,
'Description c':c,
'Typical':typical,
'Brand':brand,
'Color':color,
'Image URL':image_},ignore_index=True)
【问题讨论】:
-
我知道您是 Stack Overflow 的新手。我建议您阅读以下内容:What should I do when someone answers my question?
标签: python selenium web-scraping beautifulsoup python-requests