【Posted】: 2016-05-30 08:28:39
【Question】:
I am trying to download every PDF file on this web page that can be downloaded without logging in or subscribing, but I get this error.
[Errno 10054] An existing connection was forcibly closed by the remote host
How can I fix this error?
# -*- coding: utf-8 -*-
from selenium import webdriver
from bs4 import BeautifulSoup
import urllib2 as ul
def download_pdf(file_name, download_url):
    response = ul.urlopen(download_url)
    file = open(file_name + ".pdf", 'wb')
    file.write(response.read())
    file.close()
    print("Completed")
chrome_path = r"C:\Users\HarutakaKawamura\Desktop\bs\chromedriver_win32\chromedriver.exe"
driver = webdriver.Chrome(chrome_path)
driver.get('https://www.osapublishing.org/search.cfm?q=comsol&meta=1&cj=1&cc=1')
driver.implicitly_wait(10)
links = driver.find_elements_by_xpath("//a[contains(text(), 'PDF')]")
titles = driver.find_elements_by_xpath("//h3[contains(@class, 'sri-title')]")
for i in range(len(links)):
    href = links[i].get_attribute("href")
    bs = BeautifulSoup(ul.urlopen(href), 'lxml')
    if len(str(bs)) < 1000:
        download_url = bs.findAll("frame")[1]['src']
        file_name = titles[i].find_element_by_tag_name("a").text
        download_pdf(file_name, download_url)
【Comments】:
-
Did you manage to download some of the PDFs, or did you hit the error before downloading any?
-
It's random. Sometimes the error appears after some PDFs have downloaded, and sometimes before any download starts.
-
OK, so try catching the error and sending your request again.
-
That worked! Thanks!
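The thread does not show the code that ended up working, so here is a minimal sketch of the suggestion above (catch the error, then send the request again). The helper name `call_with_retries` and its `retries`, `delay`, and `exceptions` parameters are hypothetical, chosen only for illustration. It assumes that catching `IOError` is enough: in Python 2.7 both `socket.error` and `urllib2.URLError` are subclasses of `IOError`, and in Python 3 `IOError` is an alias of `OSError`.

```python
import time


def call_with_retries(func, retries=3, delay=1.0, exceptions=(IOError,)):
    """Call func() and return its result.

    If func() raises one of `exceptions`, wait and try again.
    After `retries` failed attempts the last exception is re-raised.
    (Hypothetical helper; the name and defaults are assumptions.)
    """
    for attempt in range(1, retries + 1):
        try:
            return func()
        except exceptions:
            if attempt == retries:
                raise  # out of attempts: let the caller see the error
            # Wait a little longer after each failure before retrying.
            time.sleep(delay * attempt)


# Possible use with the question's code (Python 2.7, urllib2 imported as ul):
#     data = call_with_retries(lambda: ul.urlopen(download_url).read())
#     bs = BeautifulSoup(call_with_retries(lambda: ul.urlopen(href)), 'lxml')
```

Since the error showed up at random points, both `ul.urlopen` calls (the one in the loop and the one in `download_pdf`) would probably need to be wrapped. The pause between attempts is a guess; if the host is dropping connections because requests arrive too quickly, a longer `delay` may be needed.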
Tags: python python-2.7 selenium web-scraping beautifulsoup