If condition to ignore xpath when it is missing in one instance of the list (answers)

【Question title】: If condition to ignore xpath when it is missing in one instance of the list
【Posted】: 2021-09-10 02:20:07
【Question description】:

I am currently trying to scrape the LinkedIn jobs page with this code:

# importing packages
import pandas as pd
import re

from bs4 import Tag, NavigableString, BeautifulSoup
from datetime import date, timedelta, datetime
from IPython.core.display import clear_output
from random import randint
from requests import get
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from time import sleep
from time import time
start_time = time()

from warnings import warn

# replace variables here.
url = "https://www.linkedin.com/jobs/search?keywords=&location=Egypt&geoId=&trk=public_jobs_jobs-search-bar_search-submit&position=1&pageNum=0&sortBy=DD"
no_of_jobs = 25

# this will open up new window with the url provided above 
driver = webdriver.Chrome()
driver.get(url)
sleep(3)
action = ActionChains(driver)


# to show more jobs. Depends on number of jobs selected
i = 2
while i <= (no_of_jobs/25): 
    driver.find_element_by_xpath('/html/body/main/div/section/button').click()
    i = i + 1
    sleep(5)

# parsing the visible webpage
pageSource = driver.page_source
lxml_soup = BeautifulSoup(pageSource, 'lxml')

# searching for all job containers
job_container = lxml_soup.find('ul', class_ = 'jobs-search__results-list')

print('You are scraping information about {} jobs.'.format(len(job_container)))


# setting up list for job information
job_id = []
post_title = []
company_name = []
post_date = []
job_location = []
job_desc = []
level = []
emp_type = []
functions = []
industries = []

# for loop for job title, company, id, location and date posted
for job in job_container:

    if not isinstance(job, Tag):
        continue
    # job title
    job_titles = job.find("h3", class_="base-search-card__title").text
    post_title.append(job_titles)
    
    # linkedin job id
    job_ids = job.find('a', href=True)['href']
    job_ids = re.findall(r'(?!-)([0-9]*)(?=\?)',job_ids)[0]
    job_id.append(job_ids)
    
    # company name
    company_names = job.select_one('img')['alt']
    company_name.append(company_names)
    
    # job location
    job_locations = job.find("span", class_="job-search-card__location").text
    job_location.append(job_locations)
    
    # posting date
    post_dates = job.select_one('time')['datetime']
    post_date.append(post_dates)

# for loop for job description and criterias
for x in range(1,no_of_jobs):
    
        
    # clicking on different job containers to view information about the job

    job_xpath = '/html/body/div[3]/div/main/section/ul/li[{}]'.format(x)
    driver.find_element_by_xpath(job_xpath).click()
    sleep(3)
    
    # job description
    jobdesc_xpath = '/html/body/div[3]/div/section/div[2]/section[2]/div'
    job_descs = driver.find_element_by_xpath(jobdesc_xpath).text
    job_desc.append(job_descs)
    
    # job criteria container below the description
    job_criteria_container = lxml_soup.find('ul', class_ = 'description__job-criteria-list')
    all_job_criterias = job_criteria_container.find_all("ul", class_='description__job-criteria-list')
    
    # Seniority level
    seniority_xpath = '/html/body/div[3]/div/section/div[2]/section[2]/ul/li[1]/span'
    seniority = driver.find_element_by_xpath(seniority_xpath).text
    level.append(seniority)
    
    # Employment type
    type_xpath = '/html/body/div[3]/div/section/div[2]/section[2]/ul/li[2]/span'
    employment_type = driver.find_element_by_xpath(type_xpath).text
    emp_type.append(employment_type)
    
    # No Applicants
    function_xpath = 'num-applicants__caption'
    No_Applicants = driver.find_element_by_class_name(function_xpath).text
    functions.append(No_Applicants)
    
    # Industries
    industry_xpath = '/html/body/div[3]/div/section/div[2]/section[2]/ul/li[4]/span'
    industry_type = driver.find_element_by_xpath(industry_xpath).text
    industries.append(industry_type)
    
    x = x+1

# to check if we have all information
print(len(job_id))
print(len(post_date))
print(len(company_name))
print(len(post_title))
print(len(job_location))
print(len(job_desc))
print(len(level))
print(len(emp_type))
print(len(functions))
print(len(industries))

The URL I am trying to scrape is:

https://www.linkedin.com/jobs/search?keywords=&location=Egypt&geoId=&trk=public_jobs_jobs-search-bar_search-submit&position=1&pageNum=0&sortBy=DD

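In my second for loop, where I iterate over the job criteria, some jobs on LinkedIn have no employment type or industry entered. The loop works fine on list items that contain them, but when it reaches a list item that lacks the element, it raises an element-not-found error. How can I write an if condition so that, when the employment type or industry is not found in a list item, it is ignored and the loop continues with the next one?

【Question discussion】:

Tags: python selenium xpath

【Solution 1】:

"How would I write an if condition that says: if the employment type or the industry type isn't found in the list item, ignore them and continue to the next?"

There are several ways to do this, but the error output would help diagnose the problem. Since you specifically want a way to ignore the exception, try Selenium's error handling; judging from your description, I believe you are getting a NoSuchElementException: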

from selenium.common.exceptions import NoSuchElementException

try:
    ...  # the line of code that raises the error, or the entire loop (not recommended)
except NoSuchElementException:
    pass  # or do something more useful, such as logging the failure
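
Applied to the loop in the question, the same try/except can be wrapped in a small helper so that every criteria list still receives exactly one entry per job. The following is only a sketch under stated assumptions: `text_or_default` is a hypothetical name, the fake driver exists solely so the snippet runs without a browser, and `find_element("xpath", ...)` is used in place of the question's `find_element_by_xpath` (the string `"xpath"` is the value of `By.XPATH`).

```python
try:
    from selenium.common.exceptions import NoSuchElementException
except ImportError:
    # Fallback so this sketch can run without Selenium installed.
    class NoSuchElementException(Exception):
        pass


def text_or_default(driver, xpath, default=None):
    """Return the text of the element at `xpath`, or `default` if it is missing."""
    try:
        return driver.find_element("xpath", xpath).text
    except NoSuchElementException:
        return default


# --- Stand-ins for a real WebDriver, for demonstration only ---
class _FakeElement:
    def __init__(self, text):
        self.text = text


class _FakeDriver:
    def __init__(self, elements):
        self._elements = elements  # maps xpath -> element text

    def find_element(self, by, value):
        if value not in self._elements:
            raise NoSuchElementException(value)
        return _FakeElement(self._elements[value])


type_xpath = '/html/body/div[3]/div/section/div[2]/section[2]/ul/li[2]/span'
industry_xpath = '/html/body/div[3]/div/section/div[2]/section[2]/ul/li[4]/span'

# This fake job page has an employment type but no industry.
driver = _FakeDriver({type_xpath: 'Full-time'})

emp_type, industries = [], []
emp_type.append(text_or_default(driver, type_xpath))
industries.append(text_or_default(driver, industry_xpath))

print(emp_type, industries)  # ['Full-time'] [None]
```

Appending a default such as `None` instead of skipping the value keeps all lists the same length, which matters if they are later combined into one table.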

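You can read more about this specific exception, and many others, in the Selenium documentation. Note that if the error turns out not to come from Selenium, you can also catch all errors with a bare 'except:', although 'except Exception:' is usually the safer form because a bare 'except:' also swallows KeyboardInterrupt and SystemExit.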
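
Since the question asks for a literal if condition, another of the several ways is to use `find_elements` (plural), which returns an empty list instead of raising when nothing matches. A minimal sketch, assuming a hypothetical helper name `first_text` and a fake driver that only mimics that one method so the snippet runs without a browser:

```python
def first_text(driver, xpath, default=None):
    """Return the text of the first element matching `xpath`, or `default` if none match."""
    # "xpath" is the value of selenium's By.XPATH constant.
    matches = driver.find_elements("xpath", xpath)
    if matches:  # an empty list means the element is missing
        return matches[0].text
    return default


# --- Stand-ins for a real WebDriver, for demonstration only ---
class _StubElement:
    def __init__(self, text):
        self.text = text


class _StubDriver:
    def __init__(self, elements):
        self._elements = elements  # maps xpath -> element text

    def find_elements(self, by, value):
        if value in self._elements:
            return [_StubElement(self._elements[value])]
        return []


seniority_xpath = '/html/body/div[3]/div/section/div[2]/section[2]/ul/li[1]/span'
missing_xpath = '/html/body/div[3]/div/section/div[2]/section[2]/ul/li[4]/span'

stub = _StubDriver({seniority_xpath: 'Entry level'})

found = first_text(stub, seniority_xpath)
absent = first_text(stub, missing_xpath, default='')

print(repr(found), repr(absent))  # 'Entry level' ''
```

One trade-off to keep in mind: if an implicit wait is configured on a real driver, `find_elements` waits for the full timeout before returning an empty list, so missing fields slow the loop down.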

【Discussion】: