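【Posted】:2021-09-01 19:53:10
【Question】:
My code collects a batch of URLs from a web page and puts them in a list.
It then visits each URL in the list, one at a time, and scrapes it.
However, some of the pages turn out to be blank when visited, and that stops the code from processing the remaining URLs.
How can I add exception handling so that, when this happens, the code skips that page and moves on to the next URL?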
import time

import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
dataf = []
val = []
baseurl = 'https://careers.abbvie.com/'
endurl = '?lang=en-us&previousLocale=en-US'

# Step 1: collect the job URLs from every listing page.
for x in range(1, 89):
    driver.get(f'https://careers.abbvie.com/abbvie/jobs?page={x}&categories=Administrative%20Services%7CBusiness%20Development%7CGeneral%20Management%7CHEOR%2FMarket%20Access%7CInformation%20Technology%7CMarketing%7CMedical%7CRegulatory%20Affairs%7CSales%7CSales%20Support')
    time.sleep(7)
    page_source = driver.page_source
    soup = BeautifulSoup(page_source, 'html.parser')
    eachRow = soup.find_all('p', class_='job-title')
    for link in eachRow:
        for links in link.find_all('a', href=True):
            val.append(baseurl + links['href'] + endurl)

# Step 2: visit each job URL and scrape its details.
for b in val:
    try:
        driver.get(b)
        time.sleep(3)
        title = driver.find_element(By.XPATH, '//*[@id="jibe-container"]/div[2]/div/div/h1').text
        location = driver.find_element(By.XPATH, '//*[@id="header-locations"]/span').text
        categories = driver.find_element(By.XPATH, '//*[@id="header-categories"]/span').text
        jobID = driver.find_element(By.XPATH, '//*[@id="header-req_id"]/span').text
        # Named "row" rather than "dict" so the built-in dict is not shadowed.
        row = {"Title": title, "location": location, "categories": categories, "jobID": jobID, "URL": b}
        dataf.append(row)
    except Exception as e:
        # A blank page has none of these elements; report it and move on to the next URL.
        print(f"Skipping {b}: {e!r}")

df = pd.DataFrame(dataf)
df.to_csv('restasis.csv')
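【Comments】:
-
Your question now contains try/except. Does that mean you already know how to handle exceptions? If so, what exactly are you asking about exception handling?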
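The skip-and-continue behavior asked about in the question can be sketched without a browser. This is only a minimal illustration, not the asker's code: `scrape_all` and `fake_scrape` are hypothetical names, the example.com URLs are placeholders, and `fake_scrape` stands in for the Selenium lookups by raising an error for a "blank" page. The point is that the try/except sits inside the loop, so one failing URL does not stop the rest.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def scrape_all(
    urls: Iterable[str],
    scrape_one: Callable[[str], Dict[str, str]],
) -> Tuple[List[Dict[str, str]], List[str]]:
    """Run scrape_one on every URL, skipping any URL whose scrape raises."""
    rows: List[Dict[str, str]] = []
    skipped: List[str] = []
    for url in urls:
        try:
            rows.append(scrape_one(url))
        except Exception as exc:
            # Assumption: any failure means "skip this page". With Selenium,
            # catching NoSuchElementException would be the narrower choice.
            print(f"Skipping {url}: {exc!r}")
            skipped.append(url)
    return rows, skipped


def fake_scrape(url: str) -> Dict[str, str]:
    # Hypothetical stand-in for the real element lookups:
    # a URL containing "blank" simulates an empty page.
    if "blank" in url:
        raise LookupError("no job title found on page")
    return {"Title": "Example job", "URL": url}


rows, skipped = scrape_all(
    [
        "https://example.com/job/1",
        "https://example.com/blank",
        "https://example.com/job/2",
    ],
    fake_scrape,
)
```

Recording the skipped URLs in a separate list, rather than only printing them, makes it possible to retry or inspect those pages after the run.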