How to download a file via BeautifulSoup in aspx? - Answers

[Question title]: How to download file via Beautifulsoup in aspx?
[Posted]: 2021-10-10 18:52:19
[Question]:

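I want to download PDF files from a website using Beautiful Soup and Selenium.

I've written the code below so far, but it is incomplete because I can't find the link that downloads the PDF files.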

#!/usr/bin/env python3

import re

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.common.exceptions import WebDriverException
from selenium.webdriver.chrome.service import Service


def not_relative_uri(href):
    # True if href is an absolute https:// URL (href may be None)
    return href is not None and re.match(r"^https://", href) is not None


options = webdriver.ChromeOptions()
options.add_argument("--headless")
options.add_argument("--no-sandbox")

try:
    # Selenium 4 style: the chromedriver path is passed through Service
    driver = webdriver.Chrome(service=Service("/usr/bin/chromedriver"), options=options)
except WebDriverException as e:
    # Raised if Chrome or chromedriver fails to start
    print(e)
else:
    driver.get("https://xxxxxx")
    # print(driver.page_source)

    my_folder = "/home/python/"
    soup_res = BeautifulSoup(driver.page_source, "html.parser")
    # Rows 1-20 of the results grid, skipping the first row (presumably the header)
    tr = (
        soup_res.find("div", {"id": "pageWrapper"})
        .find("div", {"class": "EGZDefault-List"})
        .find("div", {"class": "EGZDefault-List-Info-List"})
        .find("table", {"class": "gridview"})
        .find("tbody")
        .find_all("tr")[1:21]
    )

I hope someone can help me.
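A likely reason there is no link to find: ASP.NET GridView pages often render download buttons as `<input type="image">` (ImageButton) controls that post the form back to the server, so a row may contain no `href` at all. The answer below targets exactly these `input[type='image']` elements. The sketch that follows checks a page for both kinds of element using only the standard library's `html.parser`, so it runs without extra packages; the sample markup and its `name` values are made up for illustration, not taken from the real site.

```python
from html.parser import HTMLParser


class GridViewButtons(HTMLParser):
    """Collect <a href> links and <input type="image"> buttons inside table.gridview."""

    def __init__(self):
        super().__init__()
        self.depth = 0           # > 0 while inside <table class="gridview">
        self.links = []          # href values found inside the grid
        self.image_buttons = []  # name attributes of image buttons inside the grid

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "table":
            # Track nesting so the grid's own </table> ends the scope
            if self.depth or "gridview" in (a.get("class") or "").split():
                self.depth += 1
            return
        if not self.depth:
            return
        if tag == "a" and a.get("href"):
            self.links.append(a["href"])
        elif tag == "input" and a.get("type") == "image":
            self.image_buttons.append(a.get("name"))

    def handle_endtag(self, tag):
        if tag == "table" and self.depth:
            self.depth -= 1


# Hypothetical markup shaped like an ASP.NET GridView with image buttons
SAMPLE = """
<table class="gridview"><tbody>
  <tr><th>No.</th><th>File</th></tr>
  <tr><td>1</td><td><input type="image" name="ctl00$grid$ctl02$btnPdf" src="pdf.png"></td></tr>
  <tr><td>2</td><td><input type="image" name="ctl00$grid$ctl03$btnPdf" src="pdf.png" /></td></tr>
</tbody></table>
"""

parser = GridViewButtons()
parser.feed(SAMPLE)
print(parser.links)          # []
print(parser.image_buttons)  # ['ctl00$grid$ctl02$btnPdf', 'ctl00$grid$ctl03$btnPdf']
```

With BeautifulSoup, the same check is `soup_res.select("table.gridview input[type='image']")` versus `soup_res.select("table.gridview a[href]")`. If only image buttons turn up, the download is probably triggered by a postback, so clicking through Selenium (as in the answer) is more likely to work than a plain request to a URL.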

[Comments]:

That URL won't open... I tried several times.
@Prophet The URL opens fine for me. Please try again.

Tags: python asp.net selenium-webdriver beautifulsoup

[Solution 1]:

With Selenium, you can do the following:

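import time

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# Same headless options as in the question
options = webdriver.ChromeOptions()
options.add_argument("--headless")
options.add_argument("--no-sandbox")
driver = webdriver.Chrome(service=Service("/usr/bin/chromedriver"), options=options)

wait = WebDriverWait(driver, 20)
actions = ActionChains(driver)

driver.get("https://xxxxxx")

# Wait up to 20 s for the PDF image buttons in the grid to become visible
wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "table.gridview input[type='image']")))
time.sleep(2)
# find_elements(By..., ...) replaces find_elements_by_css_selector, removed in newer Selenium 4 releases
images = driver.find_elements(By.CSS_SELECTOR, "table.gridview input[type='image']")
for image in images:
    actions.move_to_element(image).perform()  # scroll the button into view
    time.sleep(0.5)
    image.click()
    time.sleep(5)  # give the download time to start or finish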
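The fixed `time.sleep(5)` after each click is a guess at how long one download takes. One alternative is to poll the download folder until a new, finished file appears; Chrome writes in-progress downloads with a `.crdownload` suffix. A minimal sketch using only the standard library, assuming `download_dir` is the folder Chrome saves into (the helper name `wait_for_new_download` is hypothetical):

```python
import os
import time


def wait_for_new_download(download_dir, before, timeout=30.0, poll=0.5):
    """Wait until a finished file whose name is not in `before` appears.

    `before` is the set of file names present just before the click.
    Files ending in .crdownload (Chrome's partial downloads) count as
    unfinished. Returns the new file's name or raises TimeoutError.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        names = set(os.listdir(download_dir))
        pending = [n for n in names if n.endswith(".crdownload")]
        new = sorted(n for n in names - before if not n.endswith(".crdownload"))
        if new and not pending:
            return new[0]
        time.sleep(poll)
    raise TimeoutError(f"no finished download in {download_dir} after {timeout}s")
```

In the loop, take `before = set(os.listdir(download_dir))` just before `image.click()` and call `wait_for_new_download(download_dir, before)` in place of `time.sleep(5)`. Because the site gives every file the same name, each file should be moved or renamed before the next click; otherwise a later download may replace it under an existing name and never register as new.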

[Comments]:

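The delay after each click gives the file time to download, or at least to start downloading. Clicking several downloads back to back may cause problems.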
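I've looked at the downloaded files, but they overwrite each other because they all get the same name, new_announcement.pdf. Could you help me save them under different names? (Naming them by sequence number would also be fine.)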
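The downloaded file name is set by the website. I don't know whether that can be changed; I'll try to check later. Maybe rename each file after it has downloaded?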
I'll look for a way too, but I'd still appreciate your help.
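Sorry to bother you again. When I click an image button, could you help me download its PDF into a specified folder, or into a folder with an arbitrary name (for example numbered 1 to 20)?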
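Following the rename-after-download idea from the comments, one way to cover both requests is to point Chrome at a fixed download folder and then move each finished `new_announcement.pdf` into a numbered subfolder (`1`, `2`, ... `20`) right after its click. A minimal sketch using only the standard library; the helper names and the folder layout are hypothetical, while the three preference keys are standard Chrome settings commonly passed through Selenium:

```python
import os
import shutil


def chrome_download_prefs(download_dir):
    """Chrome preferences that save downloads to download_dir without asking.

    Pass the result to options.add_experimental_option("prefs", ...).
    """
    return {
        "download.default_directory": os.path.abspath(download_dir),
        "download.prompt_for_download": False,
        # Save PDFs as files instead of opening them in Chrome's PDF viewer
        "plugins.always_open_pdf_externally": True,
    }


def move_to_numbered_folder(download_dir, filename, base_dir, index):
    """Move download_dir/filename into base_dir/<index>/ and return the new path.

    Creates the numbered folder if needed, so files that share a name
    (e.g. new_announcement.pdf) no longer overwrite each other.
    """
    target_dir = os.path.join(base_dir, str(index))
    os.makedirs(target_dir, exist_ok=True)
    target = os.path.join(target_dir, filename)
    shutil.move(os.path.join(download_dir, filename), target)
    return target
```

Usage, under those assumptions: call `options.add_experimental_option("prefs", chrome_download_prefs(download_dir))` before creating the driver, loop with `for i, image in enumerate(images, start=1):`, and once a click's download has finished, call `move_to_numbered_folder(download_dir, "new_announcement.pdf", my_folder, i)`. Moving after the fact avoids depending on the site's file name; switching Chrome's download folder between clicks is also possible through DevTools commands, but that behavior varies more across Chrome versions.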