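【Question title】: How to find an element inside a p tag
【Posted】: 2021-11-24 12:25:14
【Question description】:

I am trying to get the text from a p tag. First go to the page https://www.counselingcalifornia.com/Find-a-Therapist, select the language English, and click Advanced Search; the page then shows the results. I am trying to extract the text of the p tags in those results, but I get an error.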

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.select import Select
from selenium import webdriver
driver = webdriver.Chrome('C:\Program Files (x86)\chromedriver.exe')
driver.maximize_window()

wait = WebDriverWait(driver, 30)

driver.get("https://www.counselingcalifornia.com/Find-a-Therapist")

wait.until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR, "iframe[id$='IFrame_htmIFrame']")))
select = Select(wait.until(EC.visibility_of_element_located((By.ID, "language_field"))))
select.select_by_value('ENG')

wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "a#searchBtn"))).click()
dunk=driver.find_elements(By.XPATH, "//div[@class='row']")
for dun in dunk:
    phone=dun.find_element_by_xpath("//p[@id='phoneDiv_80863']/i[@class='fa fa-phone-square']").get_text()
    print(phone)

【Question comments】:

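  • You only need to remove the "/i[@class='fa fa-phone-square']" part from the XPath. Your locator points at the phone icon, whereas the text sits in the //p tag itself. That is why no text comes back. Second, after clicking, give the results enough time to load.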
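
The point of the comment above can be checked without a browser. The following is a minimal sketch using only the standard library's `xml.etree.ElementTree`. The HTML fragment and the phone number in it are made up for illustration (assumed markup, not copied from the site); only the id `phoneDiv_80863` and the icon class come from the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the markup and the phone number are assumptions.
SAMPLE = """
<div class="row">
  <p id="phoneDiv_80863"><i class="fa fa-phone-square"></i> (555) 010-0000</p>
</div>
""".strip()

row = ET.fromstring(SAMPLE)

# Locator from the question: it ends at the <i> icon, which holds no text.
icon = row.find(".//p[@id='phoneDiv_80863']/i[@class='fa fa-phone-square']")
icon_text = icon.text or ""

# Locator suggested in the comment: stop at the <p>, which owns the text.
para = row.find(".//p[@id='phoneDiv_80863']")
para_text = "".join(para.itertext()).strip()

print(repr(icon_text))
print(repr(para_text))
```

In Selenium the same idea would mean locating the `<p>` rather than the `<i>` and reading the element's `.text` property, since `WebElement` has no `get_text()` method. Note also that an XPath starting with `//` searches the whole document even when it is called on a row element; a leading `.//` keeps the search inside that row.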

Tags: python selenium web-scraping


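【Solution 1】:

Selenium is not needed here. You can request the data directly from the URL. BeautifulSoup can pick out the <p> tags whose id attribute starts with "phoneDiv". You can also change the query to return more than 25 results, so here is the listing for 1-100: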

import requests
from bs4 import BeautifulSoup
import re

limit = 100

url = f'https://www.counselingcalifornia.com/cc/cgi-bin/utilities.dll/customlist?FIRSTNAME=~&LASTNAME=~&ZIP=&DONORCLASSSTT=&_MULTIPLE_INSURANCE=&HASPHOTOFLG=&_MULTIPLE_EMPHASIS=&ETHNIC=&_MULTIPLE_LANGUAGE=ENG&QNAME=THERAPISTLIST&WMT=NONE&WNR=NONE&WHP=therapistHeader.htm&WBP=therapistList.htm&RANGE=1%2F{limit}&SORT=LASTNAME'
headers = {'User-Agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Mobile Safari/537.36'}

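# Fetch the listing directly; no browser is required
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.text, 'html.parser')
rows = soup.find_all('div', {'class':'row'})

for row in rows:
    # Match the <p> whose id starts with "phoneDiv"
    p = row.find('p', {'id':re.compile('^phoneDiv')})
    # Skip rows without a phone paragraph; find() returns None for them
    if p is not None:
        print(p.text.strip())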
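
To try the "id starts with phoneDiv" idea offline, here is a rough sketch that swaps BeautifulSoup for the standard library's `html.parser`, so it runs without any third-party package or network access. The `PhoneCollector` class, the sample markup, and the phone numbers are all hypothetical and exist only for illustration.

```python
from html.parser import HTMLParser


class PhoneCollector(HTMLParser):
    """Collect the text of every <p> whose id starts with 'phoneDiv'."""

    def __init__(self):
        super().__init__()
        self.phones = []
        self._buffer = None  # text chunks while inside a matching <p>

    def handle_starttag(self, tag, attrs):
        element_id = dict(attrs).get("id") or ""
        if tag == "p" and element_id.startswith("phoneDiv"):
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "p" and self._buffer is not None:
            self.phones.append("".join(self._buffer).strip())
            self._buffer = None


# Hypothetical sample rows; the real page markup may differ.
SAMPLE = """
<div class="row"><p id="phoneDiv_1"><i class="fa fa-phone-square"></i> (555) 010-0001</p></div>
<div class="row"><p id="nameDiv_2">No phone here</p></div>
<div class="row"><p id="phoneDiv_3"><i class="fa fa-phone-square"></i> (555) 010-0003</p></div>
"""

collector = PhoneCollector()
collector.feed(SAMPLE)
phones = collector.phones
print(phones)
```

The row without a phone paragraph is simply skipped, which is the same case the `None` check guards against when using `row.find(...)` in BeautifulSoup.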

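【Comments】:

  • Can you tell me how you generated this URL?
  • This method is super, but I would be grateful if you could tell me how you generated the URL.
  • Go to Dev Tools -> Network -> Fetch/XHR. You will see the request URL in the Headers tab.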
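
Once the request URL has been copied from the Network tab, its query string can be rebuilt from a dictionary instead of being edited by hand. The sketch below uses the standard library's `urllib.parse.urlencode` with the parameter names and values taken from the URL in the answer. The helper name `build_list_url` is hypothetical, and the reading of `RANGE` as "first/last result" is an assumption based on the answer's 1-100 example.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.counselingcalifornia.com/cc/cgi-bin/utilities.dll/customlist"


def build_list_url(limit, language="ENG"):
    """Rebuild the listing URL from the answer (hypothetical helper)."""
    params = {
        "FIRSTNAME": "~",
        "LASTNAME": "~",
        "ZIP": "",
        "DONORCLASSSTT": "",
        "_MULTIPLE_INSURANCE": "",
        "HASPHOTOFLG": "",
        "_MULTIPLE_EMPHASIS": "",
        "ETHNIC": "",
        "_MULTIPLE_LANGUAGE": language,
        "QNAME": "THERAPISTLIST",
        "WMT": "NONE",
        "WNR": "NONE",
        "WHP": "therapistHeader.htm",
        "WBP": "therapistList.htm",
        # Assumption: RANGE is "first/last"; urlencode escapes "/" as %2F
        "RANGE": f"1/{limit}",
        "SORT": "LASTNAME",
    }
    return f"{BASE_URL}?{urlencode(params)}"


url = build_list_url(100)
print(url)
```

Keeping the parameters in a dictionary makes it easier to see which field changed when a different search is submitted in the browser and the two request URLs are compared.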