【Question Title】: How to extract information from a page
【Posted】: 2021-10-11 13:23:57
【Question Description】:

I am trying to extract the name and phone number from this page:

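from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# location of chromedriver.exe (raw string so the backslashes stay literal)
browser = webdriver.Chrome(service=Service(r"C:\Program Files (x86)\chromedriver.exe"))

browser.get("https://www.houzz.com/professionals/general-contractor")

for title in browser.find_elements(By.XPATH, '//span[@class="mlm header-5 text-unbold"]'):
    title.click()  # loads a new page, so the remaining `title` elements become stale
    name = browser.find_elements(By.XPATH, '//h1[@class="mwxddt-0 jIujVr"]')
    print(name)  # prints a list of WebElement objects, not their text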

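【Question Discussion】:

  • What problem are you running into?
  • I am trying to extract their names and phone numbers, but I get this error: Message: stale element reference: element is not attached to the page document
  • Always put the full error message (starting at the word "Traceback") in the question (not in a comment), as text (not a screenshot, not a link to an external site). It contains other useful information.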
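That error appears because `title.click()` loads a different page: every WebElement returned by the earlier `find_elements` call belongs to the old document, so clicking the next `title` fails. Besides the index-based approach in the answer, a common workaround is to retry an action that locates and uses the element in one step. A minimal sketch, assuming you pass Selenium's `StaleElementReferenceException` as `exc_type`; the helper name `retry_on` is hypothetical:

```python
from typing import Callable, Optional, Type, TypeVar

T = TypeVar("T")


def retry_on(exc_type: Type[BaseException], action: Callable[[], T],
             attempts: int = 3) -> T:
    """Call `action`, retrying when it raises `exc_type`.

    `action` should locate the element *and* use it, so each retry works
    with a freshly located element instead of the stale reference.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Optional[BaseException] = None
    for _ in range(attempts):
        try:
            return action()
        except exc_type as err:
            last_error = err  # remember the error and try again
    assert last_error is not None
    raise last_error  # every attempt failed: surface the last error
```

For example, `retry_on(StaleElementReferenceException, lambda: browser.find_element(By.XPATH, xpath).text)`, where `xpath` is whatever locator you need. Retrying `title.click()` on an already-stale `title` would not help, because the same dead reference would be reused.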

Tags: python selenium web-scraping


【Solution 1】:

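In this case, loop over the names by index: on every iteration, locate the j-th name afresh with an indexed XPath, then increment the index by 1. Because each element is looked up again instead of being reused from an earlier `find_elements` call, the reference always belongs to the page that is currently loaded.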
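The indexed locators that this loop relies on can also be generated up front. A small sketch; `indexed_xpaths` is a hypothetical helper, not part of Selenium. The parentheses matter: `(//span[@itemprop='name'])[2]` selects the second match in the whole document, whereas `//span[@itemprop='name'][2]` would select spans that are the second such child of their own parent.

```python
def indexed_xpaths(base_xpath: str, count: int) -> list[str]:
    """Return one XPath per match, e.g. "(//span[@itemprop='name'])[1]".

    XPath positions start at 1, so the index runs from 1 to `count`.
    Wrapping the base expression in parentheses makes [j] pick the j-th
    match in document order rather than the j-th child of a parent.
    """
    return [f"({base_xpath})[{j}]" for j in range(1, count + 1)]
```

Inside the loop you would then call `browser.find_element(By.XPATH, xp)` for each `xp`, so every element is located fresh.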

Also, scroll to each element so that it is inside the viewport before Selenium reads or clicks it.
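The answer's code does this with `scrollIntoView(true)`, which aligns the element with the top of the viewport. A small sketch of the same call wrapped in a hypothetical helper, using the standard DOM options form `scrollIntoView({block: 'center'})` instead, which can keep the element from ending up underneath a fixed page header. `driver` only needs an `execute_script` method, so a WebDriver instance works:

```python
SCROLL_SCRIPT = "arguments[0].scrollIntoView({block: 'center'});"


def scroll_into_view(driver, element) -> None:
    """Scroll `element` to the middle of the viewport.

    WebDriver passes `element` to the page script as arguments[0].
    """
    driver.execute_script(SCROLL_SCRIPT, element)
```

Usage: `scroll_into_view(browser, element)` right before reading `element.text` or clicking it.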

Code:

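from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

browser = webdriver.Chrome(service=Service(r"C:\Program Files (x86)\chromedriver.exe"))

browser.maximize_window()
browser.implicitly_wait(30)

browser.get("https://www.houzz.com/professionals/general-contractor")
count = len(browser.find_elements(By.XPATH, "//span[@itemprop='name']"))
for j in range(1, count + 1):
    # Re-locate the j-th name on every iteration (XPath positions start at 1)
    element = browser.find_element(By.XPATH, f"(//span[@itemprop='name'])[{j}]")
    # Scroll the element into the viewport before reading it
    browser.execute_script("arguments[0].scrollIntoView(true);", element)
    print(element.text)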

Output:

Capital Remodeling
SOD Home Group
Innovative Construction Inc.
Baron Construction & Remodeling Co.
Luxe Remodel
California Home Builders & Remodeling Inc.
Sneller Custom Homes and Remodeling, LLC
123 Remodeling Inc.
Professional builders & Remodeling, Inc
Rudloff Custom Builders
LAR Construction & Remodeling
Erie Construction Mid West
Regal Construction & Remodeling Inc.
Mr. & Mrs. Construction & Remodeling
Bailey Remodeling and Construction LLC

Update 1:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service

driver_path = r"C:\Program Files (x86)\chromedriver.exe"  # location of chromedriver.exe
browser = webdriver.Chrome(service=Service(driver_path))
browser.maximize_window()
browser.implicitly_wait(30)
wait = WebDriverWait(browser, 30)
browser.get("https://www.houzz.com/professionals/general-contractor")
count = len(browser.find_elements(By.XPATH, "//span[@itemprop='name']"))
for j in range(1, count + 1):
    # Re-locate the j-th name on every pass; elements from the previous page are stale
    element = browser.find_element(By.XPATH, f"(//span[@itemprop='name'])[{j}]")
    browser.execute_script("arguments[0].scrollIntoView(true);", element)
    print(element.text)
    # Open the profile page
    browser.execute_script("arguments[0].click();", element)
    # Reveal the phone number, then print it
    wait.until(EC.element_to_be_clickable((By.XPATH, "//button[@data-component='Pro Phone Link']"))).click()
    print(wait.until(EC.visibility_of_element_located((By.XPATH, "//a[@data-component='Call Pro']"))).text)
    #wait.until(EC.element_to_be_clickable((By.LINK_TEXT, "Website"))).click()
    # Go back to the listing page (back() waits for the navigation to finish)
    browser.back()

Imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
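Each `wait.until(...)` call in the code above polls a condition until it returns something truthy or the 30-second timeout expires. A simplified sketch of that idea with a hypothetical `wait_until` function; it is not Selenium's implementation (the real `WebDriverWait` polls every 0.5 s by default and also ignores `NoSuchElementException` while polling, which this sketch does not do):

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_until(condition: Callable[[], Optional[T]], timeout: float = 30.0,
               poll: float = 0.5) -> T:
    """Call `condition` repeatedly until it returns a truthy value.

    Raises TimeoutError if that does not happen within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value  # e.g. the element that became clickable
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)  # pause before polling again
```

This is why `wait.until(EC.element_to_be_clickable(...))` can be chained with `.click()`: on success it returns the value the condition produced.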

【Discussion】:

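  • Thanks, it works for me too, but I want it to click each title, go into each listing, and then collect the title.
  • Clicking a title on the first page opens a new page. Which data do you want to scrape from that new page?
  • Collect the title, phone number, and website.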
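Those three fields can be kept together per profile and exported once the loop finishes. A minimal sketch, assuming each profile visit yields a title, a phone number and optionally a website string; `Contractor` and `to_csv` are hypothetical names, and since the Website step in Update 1 is still commented out, that field defaults to empty. The `csv` module takes care of quoting names that contain commas, such as "Sneller Custom Homes and Remodeling, LLC".

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class Contractor:
    """One scraped profile (field names are illustrative)."""
    title: str
    phone: str
    website: str = ""  # empty when no website link was collected


def to_csv(rows: list[Contractor]) -> str:
    """Serialize the collected profiles as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(Contractor)])
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))
    return buffer.getvalue()
```

Inside the loop you could append `Contractor(element.text, phone_text)` to a list, where `phone_text` is a hypothetical variable holding the text of the Call Pro link, then write `to_csv(rows)` to a file opened with `newline=""`.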