【Question title】: Selenium - getting audio src from HTML using Python
【Posted】: 2020-08-07 15:15:30
【Question】:

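I am trying to use Selenium to read a specific attribute from the reCAPTCHA audio source.

However, I am not sure how to do it. Here is an example from https://www.google.com/recaptcha/api2/demo:

  1. Click "I'm not a robot"

  2. Select the headphones (audio) button

  3. Extract the src link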

    <audio id="audio-source" src="https://www.google.com:443/recaptcha/api2/payload?p=06AGdBq278w_OvG1dn_-_sgoVrqxLWcBq0IBkj2htJcsS-iTT3HtmwlhcTfBrcbQelxGI0hiep-082RypK_wZUTE-XzVbmcJ8zANM9l5O_0ka3x_7E_Hf_-vGqcRHCdRO7w2krqcgZDJSu1wj5wVyWhbDGITl55YsOs21NoX4aHk38173DPPu-Kj6T3mnqnA_3rMsdTkOUtMyl&amp;k=6Le-wvkSAAAAAPBMRTvw0Q4Muexq9bi0DJwx_mJ-" style="display: none"></audio>
    

I want to retrieve the src link and print it out.
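
To make the target value concrete, here is a minimal browser-free sketch that pulls `src` out of markup shaped like the snippet above, using only the Python standard library. The sample markup, the `example.com` URL, and the names `AudioSrcParser` and `extract_audio_src` are made up for illustration. Note that `&amp;` in the raw HTML is decoded to a plain `&` in the attribute value, which is also the form a browser reports.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit, parse_qs


class AudioSrcParser(HTMLParser):
    """Collects the src attribute of the <audio> tag with a given id."""

    def __init__(self, element_id):
        super().__init__()
        self.element_id = element_id
        self.src = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # HTMLParser has already unescaped entities such as &amp; here.
        if tag == "audio" and attributes.get("id") == self.element_id:
            self.src = attributes.get("src")


def extract_audio_src(html, element_id="audio-source"):
    """Return the src of the matching <audio> tag, or None if absent."""
    parser = AudioSrcParser(element_id)
    parser.feed(html)
    parser.close()
    return parser.src


# Hypothetical sample markup, shortened from the shape shown above.
SAMPLE = (
    '<audio id="audio-source" '
    'src="https://example.com/payload?p=abc&amp;k=xyz" '
    'style="display: none"></audio>'
)

src = extract_audio_src(SAMPLE)
print(src)
print(parse_qs(urlsplit(src).query))
```

This only illustrates what the attribute value looks like once extracted; it does not replace Selenium for a page whose markup is generated dynamically inside an iframe.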

Is there any way to do this with Selenium?


So far, my code loads the reCAPTCHA demo page -> clicks "I'm not a robot" -> clicks the audio button:

from selenium import webdriver
from selenium.webdriver.common.keys import Keys

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

import time

# Raw string so the backslashes in the Windows path are not treated as escapes
PATH = r"C:\Program Files (x86)\chromedriver.exe"
driver = webdriver.Chrome(PATH)
driver.get("http://localhost/recaptcha-v2/")
# driver.get("https://www.google.com/recaptcha/api2/demo")


WebDriverWait(driver, 10).until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR,"iframe[src^='https://www.google.com/recaptcha/api2/anchor']")))
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "span#recaptcha-anchor"))).click()
driver.switch_to.default_content()
WebDriverWait(driver, 10).until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR,"iframe[title='recaptcha challenge']")))
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button#recaptcha-audio-button"))).click()


# This works: I can get the captcha token
# Src_URL = driver.find_element_by_id('recaptcha-token').get_attribute('value')

# This does not work: it cannot locate the audio element to read its src
WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.ID, "audio-source")))
Src_URL = driver.find_element_by_id('audio-source').get_attribute('src')
print(Src_URL)

Please advise, thanks!

【Comments on the question】:

Tags: python selenium


【Answer 1】:

I used the code below to extract the src attribute from the audio tag (if the audio tag is inside another iframe, switch to that iframe first):

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.Chrome()
wait = WebDriverWait(driver, 5)

driver.get("YourURL")

# Wait until the element is present in the DOM
# (a fixed time.sleep(5) is a cruder alternative)
audio = wait.until(expected_conditions.presence_of_element_located((By.ID, "audio-source")))
Src_URL = audio.get_attribute('src')

print(Src_URL)

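【Comments】:

  • Hi, I am getting this no such element exception: Message: no such element: Unable to locate element: {"method":"css selector","selector":"[id="audio-source"]"}
  • Could you provide the URL? You could also add a wait so that the element has time to load. I have updated my code; please give it a try.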
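
One common cause of a "no such element" error is that the element lives in a different iframe from the one the driver is currently switched to. The thread does not establish whether that is the cause here. As a rough diagnostic sketch, the hypothetical helper below (`find_src_in_frames` is not part of Selenium) walks nested iframes depth-first and returns the `src` of the first element with the given id. It takes an already-created WebDriver and uses the locator strings `"id"` and `"tag name"`, which are the values of Selenium's `By.ID` and `By.TAG_NAME`.

```python
def find_src_in_frames(driver, element_id="audio-source", max_depth=3):
    """Search the current frame and nested iframes for an element by id.

    Returns the element's src attribute, or None if nothing matches.
    On success the driver stays switched into the frame that holds the
    element; on failure it is back in the frame where the search began.
    """
    # Look in the frame the driver is currently switched to.
    matches = driver.find_elements("id", element_id)
    if matches:
        return matches[0].get_attribute("src")

    if max_depth == 0:
        return None

    # Descend into each child iframe in turn.
    for frame in driver.find_elements("tag name", "iframe"):
        driver.switch_to.frame(frame)
        src = find_src_in_frames(driver, element_id, max_depth - 1)
        if src is not None:
            return src
        driver.switch_to.parent_frame()

    return None


# Usage (assumes an existing Selenium WebDriver instance named `driver`):
# driver.switch_to.default_content()
# print(find_src_in_frames(driver))
```

The helper does not wait for anything to load, so if the element is injected after a click, it may need to be called again after a short delay or combined with an explicit wait.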