【Posted】: 2019-07-14 10:43:41
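【Question】:
I am trying to use BeautifulSoup with an exact-match CSS selector to extract specific divs.
I have already read two related posts, but neither solved my problem.
The only divs I want to extract look like this: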
<div class="s-suggestion" data-alias="aps" data-crid="3LY5DQXGQLBAV" data-isfb="false" data-issc="false" data-keyword="aa batteries" data-nid="" data-reftag="nb_sb_ss_i_3_1" data-store="" data-type="a9" id="issDiv2"><span class="s-heavy"></span>a<span class="s-heavy">a batteries</span></div>
They must contain data-alias="aps" exactly, not merely a data-alias attribute, because many other divs carry different values such as data-alias="gift-cards".
Here is the code I tried:
from selenium import webdriver
from bs4 import BeautifulSoup
import time
# Open the Amazon home page in Chrome
browser = webdriver.Chrome('chromedriver.exe')
mainUrl = "https://www.amazon.com/"
browser.get(mainUrl)
mainSoup = BeautifulSoup(browser.page_source, "html.parser")
# Type a letter into the search box to trigger the suggestion list
searchInput = browser.find_element_by_xpath('//input[@id="twotabsearchtextbox"]')
searchInput.clear()
searchInput.send_keys('a')
# Give the suggestions time to render before re-reading the page source
time.sleep(2)
searchSoup = BeautifulSoup(browser.page_source, "html.parser")
searchResult = searchSoup.find_all('div', attrs={'id': 'suggestions-template'})
# Matches every div that has a data-alias attribute, whatever its value
keys = searchSoup.select('div[data-alias]')
for key in keys:
    print(key)
This is the output I get:
<div class="s-suggestion" data-alias="aps" data-crid="3LY5DQXGQLBAV" data-isfb="false" data-issc="false" data-keyword="amazon gift cards" data-nid="" data-reftag="nb_sb_ss_i_1_1" data-store="" data-type="a9" id="issDiv0"><span class="s-heavy"></span>a<span class="s-heavy">mazon gift cards</span></div>
<div class="s-suggestion" data-alias="gift-cards" data-crid="3LY5DQXGQLBAV" data-isfb="false" data-issc="false" data-keyword="amazon gift cards" data-nid="" data-reftag="nb_sb_ss_c_2_1" data-store="Gift Cards" data-type="a9-xcat" id="issDiv1"> <span class="a-size-mini" style="padding-left: 16pt">in <span class="a-color-tertiary">Gift Cards</span></span></div>
<div class="s-suggestion" data-alias="aps" data-crid="3LY5DQXGQLBAV" data-isfb="false" data-issc="false" data-keyword="aa batteries" data-nid="" data-reftag="nb_sb_ss_i_3_1" data-store="" data-type="a9" id="issDiv2"><span class="s-heavy"></span>a<span class="s-heavy">a batteries</span></div>
<div class="s-suggestion" data-alias="aps" data-crid="3LY5DQXGQLBAV" data-isfb="false" data-issc="false" data-keyword="aaa batteries" data-nid="" data-reftag="nb_sb_ss_i_4_1" data-store="" data-type="a9" id="issDiv3"><span class="s-heavy"></span>a<span class="s-heavy">aa batteries</span></div>
<div class="s-suggestion" data-alias="aps" data-crid="3LY5DQXGQLBAV" data-isfb="false" data-issc="false" data-keyword="airpod case" data-nid="" data-reftag="nb_sb_ss_i_5_1" data-store="" data-type="a9" id="issDiv4"><span class="s-heavy"></span>a<span class="s-heavy">irpod case</span></div>
<div class="s-suggestion" data-alias="aps" data-crid="3LY5DQXGQLBAV" data-isfb="false" data-issc="false" data-keyword="apple watch band 38mm" data-nid="" data-reftag="nb_sb_ss_i_6_1" data-store="" data-type="a9" id="issDiv5"><span class="s-heavy"></span>a<span class="s-heavy">pple watch band 38mm</span></div>
<div class="s-suggestion" data-alias="aps" data-crid="3LY5DQXGQLBAV" data-isfb="false" data-issc="false" data-keyword="apple watch" data-nid="" data-reftag="nb_sb_ss_i_7_1" data-store="" data-type="a9" id="issDiv6"><span class="s-heavy"></span>a<span class="s-heavy">pple watch</span></div>
<div class="s-suggestion" data-alias="aps" data-crid="3LY5DQXGQLBAV" data-isfb="false" data-issc="false" data-keyword="airpods" data-nid="" data-reftag="nb_sb_ss_i_8_1" data-store="" data-type="a9" id="issDiv7"><span class="s-heavy"></span>a<span class="s-heavy">irpods</span></div>
<div class="s-suggestion" data-alias="aps" data-crid="3LY5DQXGQLBAV" data-isfb="false" data-issc="false" data-keyword="apple watch band 42mm" data-nid="" data-reftag="nb_sb_ss_i_9_1" data-store="" data-type="a9" id="issDiv8"><span class="s-heavy"></span>a<span class="s-heavy">pple watch band 42mm</span></div>
<div class="s-suggestion" data-alias="aps" data-crid="3LY5DQXGQLBAV" data-isfb="false" data-issc="false" data-keyword="alexa" data-nid="" data-reftag="nb_sb_ss_i_10_1" data-store="" data-type="a9" id="issDiv9"><span class="s-heavy"></span>a<span class="s-heavy">lexa</span></div>
<div class="s-suggestion" data-alias="aps" data-crid="3LY5DQXGQLBAV" data-isfb="false" data-issc="false" data-keyword="apple watch charger" data-nid="" data-reftag="nb_sb_ss_i_11_1" data-store="" data-type="a9" id="issDiv10"><span class="s-heavy"></span>a<span class="s-heavy">pple watch charger</span></div>
I also tried changing the selector to:
keys = searchSoup.select('div[data-alias]="aps"')
But I get this error:
SyntaxError: Invalid character '=' at position 15
How can I select only the divs with data-alias="aps"? Thanks.
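The error most likely comes from the ="aps" part sitting outside the square brackets: in a CSS attribute selector the value comparison belongs inside them. Below is a minimal sketch of the difference, assuming a small hand-written sample (the sample_html string and the variable names are made up for illustration and are not the live Amazon page):

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for browser.page_source.
sample_html = """
<div class="s-suggestion" data-alias="aps" data-keyword="aa batteries" id="issDiv2">aa batteries</div>
<div class="s-suggestion" data-alias="gift-cards" data-keyword="amazon gift cards" id="issDiv1">in Gift Cards</div>
<div class="s-suggestion" data-alias="aps" data-keyword="airpods" id="issDiv7">airpods</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# Presence selector: matches every div that has a data-alias attribute at all.
any_alias = soup.select("div[data-alias]")

# Exact-value selector: the comparison goes inside the brackets.
aps_only = soup.select('div[data-alias="aps"]')

# The malformed selector from the question is rejected by the selector parser.
# The exact exception class depends on the installed bs4/soupsieve version,
# so this sketch catches Exception broadly.
try:
    soup.select('div[data-alias]="aps"')
    malformed_rejected = False
except Exception:
    malformed_rejected = True

print(len(any_alias), len(aps_only), malformed_rejected)
```

The same pattern should apply to the searchSoup object in the question once the selector string is corrected.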
【Comments】:
- select('div[data-alias="aps"]')?
- browser.find_element_by_xpath('//div[@data-alias="aps"]')?
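The first suggestion above can be tried without a browser. Here is a minimal sketch, assuming trimmed copies of three divs from the question's output pasted into a string (sample_html, by_css, by_attrs and keywords are names introduced here for illustration). It also shows find_all with an attrs dict, which should give the same exact-value match:

```python
from bs4 import BeautifulSoup

# Trimmed copies of three divs from the output in the question
# (most data-* attributes dropped for brevity).
sample_html = """
<div class="s-suggestion" data-alias="aps" data-keyword="amazon gift cards" id="issDiv0"><span class="s-heavy"></span>a<span class="s-heavy">mazon gift cards</span></div>
<div class="s-suggestion" data-alias="gift-cards" data-keyword="amazon gift cards" id="issDiv1"> <span class="a-size-mini">in <span class="a-color-tertiary">Gift Cards</span></span></div>
<div class="s-suggestion" data-alias="aps" data-keyword="aa batteries" id="issDiv2"><span class="s-heavy"></span>a<span class="s-heavy">a batteries</span></div>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# CSS attribute selector with an exact value, as suggested in the comment.
by_css = soup.select('div[data-alias="aps"]')

# Equivalent exact match using find_all and an attribute dictionary.
by_attrs = soup.find_all("div", attrs={"data-alias": "aps"})

# Pull the suggestion text out of the data-keyword attribute.
keywords = [div["data-keyword"] for div in by_css]

for keyword in keywords:
    print(keyword)
```

Against the real page, soup would be replaced by the searchSoup built from browser.page_source in the question.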
Tags: python css web-scraping beautifulsoup