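【Posted】: 2020-12-14 15:05:47
【Question】:
I want to scrape some open data, but I keep getting an error saying the element has no text attribute. When I tested this on Google earlier, I could easily get the text by class name this way. I also tested it with XPath. I know the "nobr" class appears on multiple tags in the HTML, and that is the problem, but normally XPath can get around that.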
driver.get('https://www.gelbeseiten.de/Suche/dm-drogerie%20markt/Bundesweit')
time.sleep(3)
plz = driver.find_element_by_class_name("nobr").text
plzx = driver.find_element_by_xpath("/html/body/div[2]/div[2]/div/div/div[1]/div/div/div/div[2]/div/article[59]/a/address/p[1]/span").text
print(plzx)
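The only problem is locating the text through the element. I want to iterate over the page elements and fill these columns of a pandas DataFrame with them.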
# This worked when I scraped just one value across the page
product_titles = driver.find_elements_by_class_name('nobr')
for title in product_titles:
    print(title.text)
# I want to save the scraped data later
df = pd.DataFrame([[name, plz, street, city, number]], columns=['business', 'plz', 'street', 'city', 'number'])
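One possible way to fill the columns is to collect one row per listing and build the DataFrame once at the end. The sketch below assumes the address text of a listing has the layout "street, 5-digit postal code city", as in the HTML excerpt in this question. The helper names `split_address` and `build_rows` are hypothetical, and the sample entry stands in for the text that Selenium would return for each listing.

```python
import re

# Assumed layout: "<street and house number>, <5-digit postal code> <city>"
ADDRESS_PATTERN = re.compile(r"^(?P<street>.+?),\s*(?P<plz>\d{5})\s+(?P<city>.+)$")


def split_address(text):
    """Split one address string into street, plz and city; None if it does not match."""
    normalized = " ".join(text.split())  # collapse newlines and repeated spaces
    match = ADDRESS_PATTERN.match(normalized)
    return match.groupdict() if match else None


def build_rows(entries):
    """entries: iterable of (business, address_text, number) tuples (hypothetical shape)."""
    rows = []
    for business, address_text, number in entries:
        parts = split_address(address_text)
        if parts is None:
            continue  # skip entries whose address has an unexpected layout
        rows.append({"business": business, **parts, "number": number})
    return rows


# Sample values copied from the HTML excerpt in the question.
entries = [
    ("dm-drogerie markt GmbH + Co. KG",
     "Geisenheimer Str. 70,\n65385\nRüdesheim am Rhein",
     "06722 40 63 70"),
]
rows = build_rows(entries)
print(rows)
```

A list of dictionaries like `rows` can be passed directly to `pd.DataFrame(rows, columns=['business', 'plz', 'street', 'city', 'number'])`, which avoids creating a new DataFrame inside the loop.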
This is the part of the page with the information I want to scrape:
<a href="https://www.gelbeseiten.de/gsbiz/cf5182f8-e6ba-4846-a1f2-0d179feb68c4" data-realid="cf5182f8-e6ba-4846-a1f2-0d179feb68c4" data-tnid="162004776014" target="_self">
<div class="mod-hervorhebung">
</div>
<h2 data-wipe-name="Titel">dm-drogerie markt GmbH + Co. KG</h2>
<p class="d-inline-block mod-Treffer--besteBranche">
Drogeriewaren
</p>
<div class="bewertungen-anker">
<div class="mod mod-Stars mod-Stars--" title="5.0/5" data-float="5,0">
<span class="mod-Stars__text" style="width: 100%;">5.0</span>
</div>
<span>5.0</span>
<span>(2)</span>
</div>
<address class="mod mod-AdresseKompakt">
<p data-wipe-name="Adresse">
Geisenheimer Str. 70,
<span class="nobr">
65385
Rüdesheim am Rhein
</span>
</p>
<p class="mod-AdresseKompakt__phoneNumber" data-hochgestellt-position="end" data-wipe-name="Kontaktdaten">06722 40 63 70</p>
</address>
<div class="oeffnungszeit_kompakt__zustandsinfo--geoeffnet">
<span>Geöffnet</span>,
<span class="nobr">schließt um 20:00</span>
</div>
</a>
<div class="mod mod-Aktionsleistekompakt">
<div class="mod mod-gsSlider mod-gsSlider--noneOnWhite">
<span class="mod-gsSlider__arrow mod-gsSlider__arrow--arrow" data-direction="left" data-show="false" data-wipe='{"listener":"click","name":"Trefferliste: Slider-Pfeil-links"}'></span>
<span class="mod-gsSlider__arrow mod-gsSlider__arrow--arrow" data-direction="right" data-show="false" data-wipe='{"listener":"click","name":"Trefferliste: Slider-Pfeil-rechts"}'></span>
<div class="mod-gsSlider__slider">
<span class="contains-icon-route gs-btn" data-wipe='{"listener":"click", "name":"Trefferliste Navigation-Button", "id":"162004776014"}' data-parameters='{"partner": "googlemaps", "searchquery": "Geisenheimer%20Str%2070%2065385%20R%C3%BCdesheim%20am%20Rhein"}' data-target="_blank">Route</span>
<a class="contains-icon-details gs-btn" rel="noopener" href="https://www.gelbeseiten.de/gsbiz/cf5182f8-e6ba-4846-a1f2-0d179feb68c4" data-wipe='{"listener": "mouseup", "name": "Trefferliste Actionbutton Mehr Details", "id": "162004776014", "synchron": false}' data-isneededpromise="false" data-cookieinfo="cf5182f8-e6ba-4846-a1f2-0d179feb68c4=162004776014">Mehr Details</a>
<div class="mod-gsSlider__spacer"></div>
</div>
</div>
</div>
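The excerpt above shows why the class name alone is ambiguous: `nobr` is used both for the postal code and city inside `<address>` and for the opening hours. The sketch below illustrates the idea of scoping the lookup to the `<address>` element. It uses only Python's standard `html.parser` on a shortened copy of the excerpt, so it runs without a browser. The class `AddressNobrParser` is a hypothetical helper and assumes the matching spans contain no nested spans.

```python
from html.parser import HTMLParser


class AddressNobrParser(HTMLParser):
    """Collect the text of <span class="nobr"> elements nested inside <address>."""

    def __init__(self):
        super().__init__()
        self.address_depth = 0  # greater than 0 while inside an <address> element
        self.in_nobr = False    # True while inside a matching span
        self.buffer = []
        self.results = []

    def handle_starttag(self, tag, attrs):
        if tag == "address":
            self.address_depth += 1
        elif tag == "span" and self.address_depth > 0:
            classes = (dict(attrs).get("class") or "").split()
            if "nobr" in classes:
                self.in_nobr = True
                self.buffer = []

    def handle_endtag(self, tag):
        if tag == "address" and self.address_depth > 0:
            self.address_depth -= 1
        elif tag == "span" and self.in_nobr:
            # Collapse the newlines and extra spaces inside the span.
            self.results.append(" ".join("".join(self.buffer).split()))
            self.in_nobr = False

    def handle_data(self, data):
        if self.in_nobr:
            self.buffer.append(data)


# Shortened copy of the excerpt from the question.
SAMPLE_HTML = """
<address class="mod mod-AdresseKompakt">
<p data-wipe-name="Adresse">
Geisenheimer Str. 70,
<span class="nobr">
65385
Rüdesheim am Rhein
</span>
</p>
</address>
<div class="oeffnungszeit_kompakt__zustandsinfo--geoeffnet">
<span>Geöffnet</span>,
<span class="nobr">schließt um 20:00</span>
</div>
"""

parser = AddressNobrParser()
parser.feed(SAMPLE_HTML)
address_spans = parser.results
print(address_spans)
```

In Selenium the same scoping could probably be expressed with a CSS selector such as `address.mod-AdresseKompakt span.nobr`, passed to `find_elements_by_css_selector`. That selector is an assumption derived from this excerpt only and has not been checked against the live page.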
【Comments】:
Tags: python html python-3.x selenium-webdriver