【Posted】:2020-05-06 08:22:53
【Question】:
My goal is to get the item name (cotton shirt) and the price (¥3,600) from inside <div class="items-box-body">:
<div class="items-box-body">
<h3 class="items-box-name font-2">cotton shirt</h3>
<div class="items-box-num">
<div class="items-box-price font-5">¥3,600</div>
</div>
</div>
I used the code below, but I cannot access any of the divs. When I test soup.find_all(), I see nothing between the body tags.
import time
from bs4 import BeautifulSoup
from selenium import webdriver
options = webdriver.ChromeOptions()
driver = webdriver.Chrome(executable_path=r'C:\Users\...', chrome_options=options)
soup = BeautifulSoup(driver.page_source, "html.parser")
site_url = 'https://www.mercari.com/jp/search/?sort_order=&keyword=&category_root=1&category_child=11&category_grand_child%5B122%5D=1&brand_name=&brand_id=&size_group=&price_min=&price_max=&item_condition_id%5B1%5D=1&shipping_payer_id%5B2%5D=1&status_on_sale=1'
response = driver.get(site_url)
time.sleep(5)
print(soup.html.unwrap())
>> <html></html>
test = soup.find_all()
print('1',test)
>> [<head></head>, <body></body>]
body = soup.body()
print('2',body)
>> 2 []
for item in soup.select('div[class*="default-container "]'):
    print('3', item)
>>
for item in soup.select('div[class*="items-box-body"]'):
    print('4', item)
>>
What am I doing wrong?
【Comments】:
-
Perhaps it is because you are trying to read the page source before the page has loaded? Try calling soup = BeautifulSoup(driver.page_source, "html.parser") after response = driver.get(site_url), not before it.
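A minimal sketch of the ordering the comment suggests: navigate first, then build the soup from driver.page_source. The function name scrape_items and the CSS selectors are assumptions made for illustration, based only on the HTML fragment in the question; the live page's markup may differ.

```python
from bs4 import BeautifulSoup


def scrape_items(driver, url):
    """Load the page first, then parse whatever the browser has rendered.

    `driver` is expected to be a Selenium WebDriver (or any object that
    offers .get(url) and .page_source). Hypothetical helper, not from the
    original post.
    """
    driver.get(url)  # navigate BEFORE reading page_source
    soup = BeautifulSoup(driver.page_source, "html.parser")

    items = []
    # Selectors assumed from the HTML fragment shown in the question.
    for box in soup.select("div.items-box-body"):
        name = box.select_one("h3.items-box-name")
        price = box.select_one("div.items-box-price")
        if name is not None and price is not None:
            items.append((name.get_text(strip=True), price.get_text(strip=True)))
    return items
```

It would be called with the driver and site_url from the question, for example scrape_items(driver, site_url). If the listing is filled in by JavaScript after the initial load, a wait before reading page_source (such as the time.sleep(5) in the question, or Selenium's WebDriverWait) may still be needed.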
Tags: python html selenium web-scraping beautifulsoup