【Question Title】: Python Selenium - Extract text within <br>
【Posted】: 2021-07-28 09:18:31
【Question】:

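I'm currently looping through all the record labels and extracting data from each page, but I can't extract the value shown below each category heading (i.e. Founded, Location, etc.). The text appears to be a bare text node sitting just before a <br> tag. Can anyone tell me how to extract it?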

Website: https://labelsbase.net/knee-deep-in-sound

                        <div class="line-title-block">
                            <div class="line-title-wrap">
                                <span class="line-title-text">Founded</span>
                            </div>
                        </div>
                        2003
                        <br>


                        <div class="line-title-block">
                            <div class="line-title-wrap">
                                <span class="line-title-text">Location</span>
                            </div>
                        </div>

                        
                        <a href="/?c=United+Kingdom">United Kingdom</a>
                        <br>

I've tried driver.find_elements_by_xpath and driver.execute_script, but couldn't find a solution.

Error message:

Message: invalid selector: The result of the xpath expression "/html/body/div[3]/div/div[1]/div[2]/div/div[1]/text()[2]" is: [object Text]. It should be an element.
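The error occurs because the XPath ends in text(), which selects a text node, while Selenium's find_element methods can only return elements. One possible workaround, sketched below, is to evaluate the XPath inside the page with the browser's standard `document.evaluate` and ask for a string result, which `execute_script` hands back to Python as a `str`. This assumes `driver` is an already-open Selenium WebDriver; the names `xpath_text` and `XPATH_STRING_JS` are hypothetical.

```python
# JavaScript executed in the page. XPathResult.STRING_TYPE converts the
# result to a string: for a node-set that is the text of the first matching
# node, so expressions ending in text() work here.
XPATH_STRING_JS = """
return document.evaluate(
    arguments[0], document, null, XPathResult.STRING_TYPE, null
).stringValue;
"""


def xpath_text(driver, xpath):
    """Return the whitespace-collapsed string value of `xpath`.

    `driver` is assumed to be a Selenium WebDriver; execute_script passes
    `xpath` to the script as arguments[0] and returns the JS string as str.
    """
    value = driver.execute_script(XPATH_STRING_JS, xpath)
    return " ".join((value or "").split())


# Hypothetical usage with the absolute XPath from the question:
# founded = xpath_text(driver, "/html/body/div[3]/div/div[1]/div[2]/div/div[1]/text()[2]")
```

An absolute XPath like this one breaks as soon as the page layout shifts, so it suits a one-off check better than a loop over many label pages.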


import requests
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By 
from selenium.webdriver.support.ui import WebDriverWait 
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException, NoSuchElementException
import pandas as pd
import time
import string

PATH = '/Applications/chromedriver'
driver = webdriver.Chrome(PATH)
wait = WebDriverWait(driver, 10)
links = []
url = 'https://labelsbase.net/knee-deep-in-sound'
driver.get(url)

time.sleep(5)
# -- Title
title = driver.find_element_by_class_name('label-name').text
print(title,'\n')

# -- Image
image = driver.find_element_by_tag_name('img')
src = image.get_attribute('src')
print(src,'\n')

# -- Founded (this line raises the error above: text() selects a text node, not an element)
founded = driver.find_element_by_xpath("/html/body/div[3]/div/div[1]/div[2]/div/div[1]/text()[2]").text
print(founded,'\n')

driver.quit()

【Discussion】:

    Tags: python selenium web-scraping


    【Answer 1】:

    Can you check this:

    founded = driver.find_element_by_xpath("//*[@*='block-content']").get_attribute("innerText")
    

    You can use an XPath that targets the element with class="block-content" and read its innerText.
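To keep each value separate instead of reading the whole block's innerText at once, one option is to locate every `div.line-title-block` (a real element, so Selenium can return it) and let JavaScript collect the text of the nodes that follow it up to the next `<br>`. This is a sketch that assumes every category on the page follows the markup shown in the question; `scrape_label_details` and `COLLECT_VALUE_JS` are hypothetical names.

```python
# JavaScript run in the page: starting right after the title block, append the
# text of each following sibling node (bare text, or elements such as <a>)
# until the next <br> is reached.
COLLECT_VALUE_JS = """
var node = arguments[0].nextSibling;
var out = '';
while (node && node.nodeName !== 'BR') {
    out += node.textContent;
    node = node.nextSibling;
}
return out;
"""


def scrape_label_details(driver):
    """Return {category: value} for each div.line-title-block on the page.

    `driver` is assumed to be a Selenium WebDriver already on a label page.
    "css selector" is the string value of By.CSS_SELECTOR.
    """
    details = {}
    for block in driver.find_elements("css selector", "div.line-title-block"):
        key = " ".join(block.text.split())  # e.g. "Founded"
        value = driver.execute_script(COLLECT_VALUE_JS, block)
        details[key] = " ".join((value or "").split())
    return details
```

Calling `scrape_label_details(driver)` once per label page yields one dict per label; appending those dicts to a list gives something `pd.DataFrame(...)` can turn into a table later.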


    【Comments】:

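    • Thanks, this works, but is there a way to store each value separately? My goal is to store each value so I can use it later, i.e. I'm looping through each record label and want to save each label's location, Soundcloud followers, genres, image, name, etc.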
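Another way to store each value separately is to fetch the HTML once, for example with `get_attribute("innerHTML")` on the `block-content` element from the answer, or with `driver.page_source`, and split it into fields in plain Python. The sketch below uses only the standard-library `html.parser` and assumes the markup pattern from the question (category name in `span.line-title-text`, value ending at the next `<br>`); `LabelDetailsParser` and `parse_label_details` are hypothetical names.

```python
from html.parser import HTMLParser


class LabelDetailsParser(HTMLParser):
    """Collect {category: value} pairs from the markup shown in the question."""

    def __init__(self):
        super().__init__()
        self.details = {}
        self._in_title = False  # currently inside span.line-title-text
        self._key = None        # category name waiting for its value
        self._parts = []        # text fragments of the current value

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "span" and "line-title-text" in classes:
            self._in_title = True
            self._key = ""
            self._parts = []
        elif tag == "br" and self._key:
            # <br> ends the value: store it with whitespace collapsed.
            self.details[self._key] = " ".join("".join(self._parts).split())
            self._key = None

    def handle_endtag(self, tag):
        if tag == "span" and self._in_title:
            self._in_title = False
            self._key = " ".join(self._key.split())

    def handle_data(self, data):
        if self._in_title:
            self._key += data
        elif self._key:
            # Bare text nodes and text inside links such as <a> both count.
            self._parts.append(data)


def parse_label_details(html):
    parser = LabelDetailsParser()
    parser.feed(html)
    parser.close()
    return parser.details


snippet = """
<div class="line-title-block">
    <div class="line-title-wrap">
        <span class="line-title-text">Founded</span>
    </div>
</div>
2003
<br>
<div class="line-title-block">
    <div class="line-title-wrap">
        <span class="line-title-text">Location</span>
    </div>
</div>
<a href="/?c=United+Kingdom">United Kingdom</a>
<br>
"""
print(parse_label_details(snippet))
# {'Founded': '2003', 'Location': 'United Kingdom'}
```

Parsing the HTML once per page avoids one browser round-trip per field; the trade-off is that the parser silently depends on the page keeping this exact structure.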