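【Posted】: 2021-07-28 09:18:31
【Question】:
I am currently looping through all the labels and extracting data from each page, but I cannot extract the text highlighted below each category (i.e. Founded, Location, etc.). The text seems to sit above the " " and br tags. Can anyone tell me how to extract it?
Website - https://labelsbase.net/knee-deep-in-sound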
<div class="line-title-block">
<div class="line-title-wrap">
<span class="line-title-text">Founded</span>
</div>
</div>
2003
<br>
<div class="line-title-block">
<div class="line-title-wrap">
<span class="line-title-text">Location</span>
</div>
</div>
<a href="/?c=United+Kingdom">United Kingdom</a>
<br>
I have tried driver.find_elements_by_xpath and driver.execute_script, but could not find a solution.
Error message -
Message: invalid selector: The result of the xpath expression "/html/body/div[3]/div/div[1]/div[2]/div/div[1]/text()[2]" is: [object Text]. It should be an element.
import requests
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException, NoSuchElementException
import pandas as pd
import time
import string
PATH = '/Applications/chromedriver'
driver = webdriver.Chrome(PATH)
wait = WebDriverWait(driver, 10)
links = []
url = 'https://labelsbase.net/knee-deep-in-sound'
driver.get(url)
time.sleep(5)
# -- Title
title = driver.find_element_by_class_name('label-name').text
print(title,'\n')
# -- Image
image = driver.find_element_by_tag_name('img')
src = image.get_attribute('src')
print(src,'\n')
# -- Founded
founded = driver.find_element_by_xpath("/html/body/div[3]/div/div[1]/div[2]/div/div[1]/text()[2]").text
print(founded,'\n')
driver.quit()
【Discussion】:
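The error message indicates that the XPath ends in text(), so it resolves to a text node, while Selenium's find_element calls can only return elements. One possible workaround is to parse the page HTML instead of locating the text node through the driver. The following is a minimal sketch using only the standard library. The names LabelFieldParser, extract_fields and SAMPLE_HTML are hypothetical, and it assumes (based only on the snippet in the question) that every value sits between a span with class line-title-text and the next br tag; the real page may differ.

```python
from html.parser import HTMLParser


class LabelFieldParser(HTMLParser):
    """Collect the text between each line-title-text span and the next <br>."""

    def __init__(self):
        super().__init__()
        self.fields = {}         # label -> list of text fragments
        self._in_label = False   # True while inside the label <span>
        self._current = None     # label whose value is being collected

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "span" and "line-title-text" in classes:
            self._in_label = True
            self._current = None
        elif tag == "br":
            # Assumption: a <br> marks the end of the current value.
            self._current = None

    def handle_endtag(self, tag):
        if tag == "span" and self._in_label:
            self._in_label = False

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._in_label:
            # Text inside the span is the field name, e.g. "Founded".
            self._current = text
            self.fields[text] = []
        elif self._current is not None:
            # Bare text or link text after the label belongs to its value.
            self.fields[self._current].append(text)


def extract_fields(html):
    """Return a dict mapping each label to the text that follows it."""
    parser = LabelFieldParser()
    parser.feed(html)
    parser.close()
    return {label: " ".join(parts) for label, parts in parser.fields.items()}


# Markup copied from the question, used here as sample input.
SAMPLE_HTML = """
<div class="line-title-block">
<div class="line-title-wrap">
<span class="line-title-text">Founded</span>
</div>
</div>
2003
<br>
<div class="line-title-block">
<div class="line-title-wrap">
<span class="line-title-text">Location</span>
</div>
</div>
<a href="/?c=United+Kingdom">United Kingdom</a>
<br>
"""

print(extract_fields(SAMPLE_HTML))
# {'Founded': '2003', 'Location': 'United Kingdom'}
```

In the script from the question, the same function could presumably be called as extract_fields(driver.page_source) before driver.quit(). Parsing the HTML string sidesteps the element-only restriction, because the parser sees bare text nodes and link text alike.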
Tags: python selenium web-scraping