【Question title】: Python, Selenium, Firefox: Force PDF Download
【Posted】: 2020-09-07 08:18:10
【Question】:

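Example: https://apps1.lavote.net/camp/comm.cfm?&cid=82

Using Selenium, I click the first Form 497 link. In my regular browser, this opens the PDF in a new tab. In Selenium, nothing appears to happen.

Here is my code, with some parts redacted.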

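import glob
import os

from selenium import webdriver
from selenium.webdriver.firefox.options import Options

# base_dir, magic_url, parse_funcs and form_type come from the redacted parts.
def scrape(session_key=None):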

    options = Options()
    options.headless = True

    profile = webdriver.FirefoxProfile()
    profile.set_preference("browser.download.dir", os.path.join(base_dir, 'reports'))
    profile.set_preference("browser.download.folderList", 2)
    profile.set_preference("browser.helperApps.alwaysAsk.force", False)
    profile.set_preference("browser.download.manager.showAlertOnComplete", False)
    profile.set_preference("browser.download.manager.showWhenStarting", False)
    profile.set_preference('browser.helperApps.neverAsk.saveToDisk','application/zip,application/octet-stream,application/x-zip-compressed,multipart/x-zip,application/x-rar-compressed, application/octet-stream,application/msword,application/vnd.ms-word.document.macroEnabled.12,application/vnd.openxmlformats-officedocument.wordprocessingml.document,application/vnd.ms-excel,application/vnd.openxmlformats-officedocument.spreadsheetml.sheet,application/vnd.openxmlformats-officedocument.spreadsheetml.sheet,application/vnd.openxmlformats-officedocument.wordprocessingml.document,application/vnd.openxmlformats-officedocument.spreadsheetml.sheet,application/rtf,application/vnd.openxmlformats-officedocument.spreadsheetml.sheet,application/vnd.ms-excel,application/vnd.ms-word.document.macroEnabled.12,application/vnd.openxmlformats-officedocument.wordprocessingml.document,application/xls,application/msword,text/csv,application/vnd.ms-excel.sheet.binary.macroEnabled.12,text/plain,text/csv/xls/xlsb,application/csv,application/download,application/vnd.openxmlformats-officedocument.presentationml.presentation,application/octet-stream')

    profile.set_preference("pdfjs.disabled", True)
    profile.set_preference("plugin.disable_full_page_plugin_for_types", "application/pdf")

    driver = webdriver.Firefox(firefox_profile=profile, options=options)

    driver.get(magic_url)

    committee_table = driver.find_elements_by_css_selector('table')[2]
    links = [link.get_attribute('href') for link in committee_table.find_elements_by_tag_name('a')]
    
    driver.get('https://apps1.lavote.net/camp/comm.cfm?&cid=82')
    forms_table = driver.find_elements_by_css_selector('table')[1]
    forms_table_trs = forms_table.find_elements_by_css_selector('tr')
    for i, row in enumerate(forms_table_trs):
        if i > 0:
            cells = row.find_elements_by_css_selector('td')
            print(1)
            try:
                link = cells[2].find_elements_by_tag_name('a')[0]
                
                link.click()
                pdfs = glob.glob(os.path.join(base_dir, 'scraper/*.pdf'))
                latest_pdf_file = max(pdfs, key=os.path.getctime)
                
                parse_funcs[form_type](latest_pdf_file)
                
            except Exception as e:
                print(e)

As you may have guessed, there are no PDFs. They are not being downloaded, which is why I'm here. How can I make this work?

【Comments】:

    Tags: python selenium firefox


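    【Solution 1】:

    If you only need the files and are not testing the browser's download dialog itself, fetch the files with Python instead of asking Selenium to do it.

    Get the PDF URL from the page, use requests to download the file into memory, then save it to the filesystem with open().write().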

    import requests
    r = requests.get(url, allow_redirects=True)
    with open(filename, 'wb') as f:
        f.write(r.content)
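
    To tie this back to the code in the question, here is a minimal sketch of two small helpers that may be useful around the requests call. It assumes the href on each form link points directly at the PDF, which has not been checked against the site, and that the server may expect the browser's session cookies. The names cookies_for_requests, save_if_pdf and form.pdf are hypothetical and not part of Selenium or requests.

```python
import os


def cookies_for_requests(selenium_cookies):
    """Turn Selenium's list of cookie dicts into a {name: value} mapping.

    driver.get_cookies() returns dicts with at least 'name' and 'value';
    a plain dict like this can be passed to requests via cookies=...
    """
    return {cookie["name"]: cookie["value"] for cookie in selenium_cookies}


def save_if_pdf(content, path):
    """Write content to path only if it starts with the PDF signature.

    Returns True when the file was written. This guards against saving
    an HTML error page under a .pdf name.
    """
    if not content.startswith(b"%PDF-"):
        return False
    with open(path, "wb") as f:
        f.write(content)
    return True


# Usage sketch (needs a live driver and the requests package; link,
# driver and base_dir are the names used in the question):
#
#   href = link.get_attribute("href")
#   r = requests.get(href,
#                    cookies=cookies_for_requests(driver.get_cookies()),
#                    allow_redirects=True)
#   r.raise_for_status()
#   save_if_pdf(r.content, os.path.join(base_dir, "reports", "form.pdf"))
```

    If the site does not need a session, the cookies argument can simply be left out.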
    

    You can also get the filename from r, but it takes a bit more work. See: https://www.codementor.io/@aviaryan/downloading-files-from-urls-in-python-77q3bs0un
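
    One possible way to derive the filename, sketched with the standard library only: look at the Content-Disposition header first, then fall back to the last segment of the URL path. The function name guess_filename and the default download.pdf are hypothetical, and this is not necessarily the approach the linked article takes.

```python
import os
from email.message import Message
from urllib.parse import unquote, urlparse


def guess_filename(headers, url, default="download.pdf"):
    """Pick a filename from response headers, else from the URL path."""
    disposition = headers.get("Content-Disposition")
    if disposition:
        # Let the email package parse the header's filename parameter.
        msg = Message()
        msg["Content-Disposition"] = disposition
        name = msg.get_filename()
        if name:
            # Drop any directory part the server may have included.
            return os.path.basename(name)
    # Fall back to the last path segment of the URL, percent-decoded.
    name = os.path.basename(unquote(urlparse(url).path))
    return name or default
```

    With a requests response this could be called as guess_filename(r.headers, r.url).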
