【Question title】: Is there a way to make this Python Selenium code work in headless mode?
【Posted】: 2020-01-10 12:58:37
【Question】:

I asked about this earlier (Unable to get selenium (python) to download a csv file which doesnt have a link but only appears after i click the download button) and managed to get that far. I finally realized that the code was failing because it was running in headless mode.

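In that earlier post I also mentioned that I would try fetching the file with requests, but in this case there doesn't seem to be any link pointing to the CSV file.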

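The code opens https://www.macrotrends.net/1476/copper-prices-historical-chart-data, clicks the "All Years" button and then clicks the "Download Historical Data" button. After the click, Selenium is supposed to save the file.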

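But as I said, it only downloads the file in normal (non-headless) mode; in headless mode nothing gets downloaded. Is there a reason for this? Is there a way to make it work in headless mode? I've searched around but couldn't find an answer.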


from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
import time

start_time = time.time()

options = Options()

#options.add_argument("--headless")  # with this enabled, the CSV is never downloaded
options.add_argument("--disable-gpu")
options.add_argument("--disable-extensions")

options.add_experimental_option("prefs", {
  "download.default_directory": r"'/home/Documents/testing/macrotrends'",
  "download.prompt_for_download": False,
  "download.directory_upgrade": True,
  "safebrowsing.enabled": False
})

driver = webdriver.Chrome(executable_path=r'/home/chromedriver/chromedriver',options=options)


driver.get('https://www.macrotrends.net/1476/copper-prices-historical-chart-data')

time.sleep(5)
# the chart and its buttons live inside an iframe
iframe = driver.find_element(By.XPATH, "//iframe[@id='chart_iframe']")
driver.switch_to.frame(iframe)
xpath = "//a[text()='All Years']"
driver.find_element(By.XPATH, xpath).click()
xpath = "//button[@id='dataDownload']"
driver.find_element(By.XPATH, xpath).click()
time.sleep(10)  # give the browser time to save the file

driver.quit()  # quit() also stops chromedriver; close() only closes the window

print("--- %s seconds ---" % (time.time() - start_time))

[Screenshot of the website in Chrome]
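The script waits a fixed `time.sleep(10)` after clicking the download button. A minimal sketch of an alternative, assuming the browser writes in-progress downloads with a `.crdownload` (Chrome) or `.part` (Firefox) suffix, is to poll the download folder until a new, finished `.csv` appears. `wait_for_download` and its parameters are hypothetical names, not part of Selenium.

```python
import os
import time

# Suffixes used for in-progress downloads: Chrome writes "*.crdownload", Firefox "*.part".
PARTIAL_SUFFIXES = (".crdownload", ".part")


def wait_for_download(download_dir, suffix=".csv", timeout=30.0, poll=0.5, before=()):
    """Wait until a new, fully written file ending in `suffix` appears in download_dir.

    `before` holds the file names that were already there before the click, so
    leftovers from earlier runs are ignored. Returns the file's full path or
    raises TimeoutError.
    """
    before = set(before)
    deadline = time.monotonic() + timeout
    while True:
        names = set(os.listdir(download_dir))
        still_downloading = any(n.endswith(PARTIAL_SUFFIXES) for n in names)
        finished = sorted(n for n in names - before if n.endswith(suffix))
        if finished and not still_downloading:
            return os.path.join(download_dir, finished[0])
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no finished {suffix} file in {download_dir} after {timeout}s")
        time.sleep(poll)
```

In the script, one would record `before = set(os.listdir(download_dir))` just before the final `.click()` and call `wait_for_download(download_dir, before=before)` instead of sleeping. This only makes the wait more reliable once the browser actually downloads; it does not by itself fix the headless-mode problem.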

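【Question comments】:

  • After some inspection, I found that the CSV file isn't stored anywhere on the server; it's generated and exported by JavaScript. You could try headless Firefox; it might help.
  • I just tried opening this page in Chrome (manually, without Selenium), but the download doesn't work for me at all... The console shows an error: "Uncaught TypeError: Cannot read property 'target' of undefined at HTMLButtonElement.document.getElementById.onclick". I'm using Chrome 77 (beta on Linux).
  • I added a Chrome screenshot of the website to my post, have a look. I'm using version 76.0.3809.132 (Official Build) (64-bit) on Linux, with the latest chromedriver for that version.
  • Googling "headless chrome can't download" turns up the question How to get Chrome headless to download files. Chrome seems to have (or to have had) problems downloading files in headless mode.
  • I tested Chrome and Firefox on Linux Mint; neither of them downloads the file in headless mode.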

Tags: python selenium headless


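【Answer 1】:

You can use the pyvirtualdisplay module to create a virtual display. Chrome and Firefox (running without --headless) will automatically use it, because starting it sets the DISPLAY environment variable, so the browser window stays hidden while the browser itself runs in normal mode.

Chrome: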

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
import time

from pyvirtualdisplay import Display

display = Display(visible=0, size=(1920,1080))
display.start()

start_time = time.time()

options = Options()

###options.add_argument("--headless")
options.add_argument("--disable-gpu")
options.add_argument("--disable-extensions")

options.add_experimental_option("prefs", {
  "download.default_directory": "/home/Documents/testing/macrotrends", # plain string: no `r` prefix and no extra `' '` quotes inside
  "download.prompt_for_download": False,
  "download.directory_upgrade": True,
  "safebrowsing.enabled": False
})

driver = webdriver.Chrome(executable_path=r'/home/chromedriver/chromedriver',options=options)
#driver = webdriver.Chrome(options=options) # chromedriver's folder is in my PATH, so I don't need `executable_path`

driver.get('https://www.macrotrends.net/1476/copper-prices-historical-chart-data')
print('[INFO] loaded', time.time() - start_time)
time.sleep(5)

iframe = driver.find_element(By.XPATH, "//iframe[@id='chart_iframe']")
driver.switch_to.frame(iframe)
print('[INFO] switched', time.time() - start_time)

xpath = "//a[text()='All Years']"
driver.find_element(By.XPATH, xpath).click()
xpath = "//button[@id='dataDownload']"
driver.find_element(By.XPATH, xpath).click()
print('[INFO] clicked', time.time() - start_time)
time.sleep(10)

print('[INFO] closing', time.time() - start_time)
driver.quit()  # quit() also stops chromedriver; close() only closes the window
display.stop()
print('[INFO] end', time.time() - start_time)
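The comment on `download.default_directory` is easy to miss: in the question the value is written as `r"'/home/Documents/testing/macrotrends'"`, so the single quotes become part of the directory name. A small sketch of a helper that builds the same prefs dict from a plain absolute path and creates the folder up front is shown here; `chrome_download_prefs` is a hypothetical name used only for illustration.

```python
import os


def chrome_download_prefs(download_dir):
    """Return a Chrome "prefs" dict that saves downloads into download_dir."""
    # Normalise to a plain absolute path: no raw-string quoting, no quotes inside.
    path = os.path.abspath(os.path.expanduser(download_dir))
    os.makedirs(path, exist_ok=True)  # create the target folder up front
    return {
        "download.default_directory": path,
        "download.prompt_for_download": False,
        "download.directory_upgrade": True,
        "safebrowsing.enabled": False,
    }
```

The result would be passed as `options.add_experimental_option("prefs", chrome_download_prefs(some_dir))`, where `some_dir` is whatever folder you want the CSV saved in.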

Firefox:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.options import Options
import time

from pyvirtualdisplay import Display

display = Display(visible=0, size=(1920,1080))
display.start()

start_time = time.time()

options = Options()

###options.add_argument("--headless")
options.add_argument("--disable-gpu")
options.add_argument("--disable-extensions")

options.set_preference("browser.download.folderList", 2)  # 2 = use the custom folder below
options.set_preference("browser.download.dir", "/home/Documents/testing/macrotrends") # plain string: no `r` prefix and no extra `' '` quotes inside
options.set_preference("browser.download.useDownloadDir", True)
options.set_preference("browser.helperApps.neverAsk.saveToDisk", "text/csv")  # save CSV without asking

driver = webdriver.Firefox(executable_path="...", options=options)
#driver = webdriver.Firefox(options=options) # geckodriver's folder is in my PATH, so I don't need `executable_path`

driver.get('https://www.macrotrends.net/1476/copper-prices-historical-chart-data')
print('[INFO] loaded', time.time() - start_time)
time.sleep(5)

iframe = driver.find_element(By.XPATH, "//iframe[@id='chart_iframe']")
driver.switch_to.frame(iframe)
print('[INFO] switched', time.time() - start_time)

xpath = "//a[text()='All Years']"
driver.find_element(By.XPATH, xpath).click()
xpath = "//button[@id='dataDownload']"
driver.find_element(By.XPATH, xpath).click()
print('[INFO] clicked', time.time() - start_time)
time.sleep(10)

print('[INFO] closing', time.time() - start_time)
driver.quit()  # quit() also stops geckodriver; close() only closes the window
display.stop()

print('[INFO] end', time.time() - start_time)
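Both scripts only reach `display.stop()` if every step succeeds; if an XPath lookup raises, the browser and the virtual display are left running. A minimal sketch of one way around this, assuming you pass in zero-argument factories for the display and the driver, uses `contextlib.ExitStack` so cleanup always runs, driver first and then the display. The names `run_headless_job`, `make_display`, `make_driver` and `job` are hypothetical.

```python
from contextlib import ExitStack


def run_headless_job(make_display, make_driver, job):
    """Start a display and a driver, run job(driver), and always clean both up.

    make_display / make_driver are zero-argument factories; the display object
    needs start()/stop() and the driver needs quit().
    """
    with ExitStack() as stack:
        display = make_display()
        display.start()
        stack.callback(display.stop)   # registered first, so it runs last
        driver = make_driver()
        stack.callback(driver.quit)    # runs first: close the browser before the display goes away
        return job(driver)
```

With the real objects, `make_display` could be `lambda: Display(visible=0, size=(1920, 1080))`, `make_driver` could be `lambda: webdriver.Chrome(options=options)`, and the iframe and click steps would move into `job`. `ExitStack` runs its callbacks in reverse order of registration, which is what gives the driver-then-display shutdown order.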

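【Comments】:

  • Hey, thanks for this approach. I've never used pyvirtualdisplay before; I'll give it a try.
【Answer 2】:

Downloads are disabled by default in headless Chrome. You can allow them by sending the following DevTools Protocol command: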

from selenium.webdriver import Chrome
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument("--headless")  # works across Selenium versions; `options.headless` is deprecated in newer ones
driver = Chrome(options=options)
params = {'behavior': 'allow', 'downloadPath': '/path/for/download'}
driver.execute_cdp_cmd('Page.setDownloadBehavior', params)
# downloads are now enabled for this driver instance
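The same DevTools call can be wrapped in a small helper. This is a sketch: `allow_headless_downloads` is a hypothetical name, and the path is made absolute as a precaution. It only relies on the driver's `execute_cdp_cmd` method, the one the answer uses. The DevTools Protocol documentation also lists a `Browser.setDownloadBehavior` command, with `Page.setDownloadBehavior` marked as deprecated, so on newer Chrome builds trying that command name may be worthwhile.

```python
import os


def allow_headless_downloads(driver, download_dir):
    """Allow downloads in headless Chrome by sending Page.setDownloadBehavior.

    `driver` is expected to be a Selenium Chrome WebDriver; only its
    execute_cdp_cmd(cmd, params) method is used.
    """
    params = {
        "behavior": "allow",
        # Resolve relative paths against this Python process's working directory
        # instead of leaving them to the browser.
        "downloadPath": os.path.abspath(os.path.expanduser(download_dir)),
    }
    driver.execute_cdp_cmd("Page.setDownloadBehavior", params)
    return params
```

Call it right after creating the driver and before clicking the download button, e.g. `allow_headless_downloads(driver, "/path/for/download")`.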
