[Question title]: Extract text inside span without class name using BeautifulSoup
[Posted]: 2021-10-26 16:07:40
[Question description]:

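I am trying to extract data from a job search website using BeautifulSoup. I have been able to extract all the data I need except the displayed salary.

The web page is https://mx.indeed.com/jobs?q=operador&l=Ciudad%20de%20M%C3%A9xico

The problem I am having is that the salary sits inside a <span> that has no class name or other identifier.

The sample HTML looks like this: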

<div class="heading6 tapItem-gutter metadataContainer"><div class="metadata salary-snippet-container"><div aria-label="$12,000 al mes" class="salary-snippet"><span>$12,000 al mes</span></div></div></div>

I have tried:

salary = card.find("div", {"class" : "salary-snippet"}).find("span").text

But I get the following error:

AttributeError: 'NoneType' object has no attribute 'find'

Can anyone explain how I can solve this?

[Question discussion]:

    Tags: python html web-scraping beautifulsoup


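    [Solution 1]:

    What happens?

    The sample looks fine, but if you take a closer look, not every card contains a salary element. For those cards, find() returns None, and calling .find("span") on None raises the AttributeError.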
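The failure can be reproduced without touching the live site. The following is a minimal sketch under stated assumptions: the two cards and the `card` class are hypothetical markup made up for illustration, and only the `salary-snippet` div is taken from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the first card has a salary, the second does not.
html = """
<div class="card"><div class="salary-snippet"><span>$12,000 al mes</span></div></div>
<div class="card"><span class="companyName">Example company</span></div>
"""

soup = BeautifulSoup(html, "html.parser")
cards = soup.find_all("div", {"class": "card"})

# find() returns a Tag when the element exists and None when it does not.
with_salary = cards[0].find("div", {"class": "salary-snippet"})
without_salary = cards[1].find("div", {"class": "salary-snippet"})

print(with_salary.find("span").text)  # $12,000 al mes
print(without_salary)                 # None

# Chaining .find("span") onto without_salary would raise the
# AttributeError from the question, because None has no find().
```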

    How to fix?

    Simply check whether the element exists before calling text on it:

    salary = card.select_one('div.salary-snippet').text if card.select_one('div.salary-snippet') else None
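The one-liner above runs the same selector twice. One possible way to avoid that is a small helper, sketched here as an assumption rather than as part of the original answer: `text_or_none` is a hypothetical name, and the `#card` wrapper is made-up markup around the snippet from the question.

```python
from bs4 import BeautifulSoup


def text_or_none(card, selector):
    """Return the stripped text of the first match for selector, or None."""
    element = card.select_one(selector)
    # Compare against None explicitly so the intent stays obvious.
    return element.get_text(strip=True) if element is not None else None


# Hypothetical card built from the HTML snippet in the question.
html = (
    '<div id="card"><div aria-label="$12,000 al mes" class="salary-snippet">'
    "<span>$12,000 al mes</span></div></div>"
)
card = BeautifulSoup(html, "html.parser").select_one("#card")

salary = text_or_none(card, "div.salary-snippet")  # element exists
missing = text_or_none(card, "span.companyName")   # element is absent

print(salary)
print(missing)
```

With such a helper, each field in the loop becomes a single call, for example `text_or_none(card, 'div.salary-snippet')`.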
    

    Example

    import requests
    from bs4 import BeautifulSoup
    
    # Browser-like User-Agent; some sites may block the default one sent by requests
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36'
    }
    
    # Pass the headers explicitly, otherwise the dict above is never used
    r = requests.get('https://mx.indeed.com/trabajo?q=operador&l=Ciudad%20de%20M%C3%A9xico&vjk=970d586d3023d4d0', headers=headers)
    soup = BeautifulSoup(r.content, 'lxml')
    
    data = []
    
    # Each job card is an <a> element inside the #mosaic-provider-jobcards container
    for card in soup.select('#mosaic-provider-jobcards a'):
        # select_one() returns None when nothing matches, so check before reading .text
        companyName = card.select_one('span.companyName').text if card.select_one('span.companyName') else None
        companyLocation = card.select_one('div.companyLocation').text if card.select_one('div.companyLocation') else None
        salary = card.select_one('div.salary-snippet').text if card.select_one('div.salary-snippet') else None
        
        data.append({
            'companyName': companyName,
            'companyLocation': companyLocation,
            'salary': salary
        })
    
    data
    

    Only want to add jobs that list a salary?

    data = []
    
    for card in soup.select('#mosaic-provider-jobcards a'):
        companyName = card.select_one('span.companyName').text if card.select_one('span.companyName') else None
        companyLocation = card.select_one('div.companyLocation').text if card.select_one('div.companyLocation') else None
        salary = card.select_one('div.salary-snippet').text if card.select_one('div.salary-snippet') else None
        
        if salary:
            data.append({
                'companyName':companyName,
                'companyLocation':companyLocation,
                'salary':salary
            })
    
    data
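If the full list has already been built, the same filtering could also be done afterwards instead of re-running the loop. This is only a sketch: the two rows below are made-up sample data in the same shape as the `data` list above, and `with_salary` is a hypothetical name.

```python
# Hypothetical rows in the same shape as the scraped results.
data = [
    {'companyName': 'Example A', 'companyLocation': 'Ciudad de México', 'salary': '$12,000 al mes'},
    {'companyName': 'Example B', 'companyLocation': 'Ciudad de México', 'salary': None},
]

# Keep only the rows whose salary is neither None nor an empty string.
with_salary = [row for row in data if row['salary']]

print(with_salary)
```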
    

    [Discussion]:
