Unable to Pull Data from Ebay Scraper Tutorial - Getting AttributeError (Answers)

【Question title】: Unable to Pull Data from Ebay Scraper Tutorial - Getting AttributeError
【Posted】: 2022-01-17 09:32:33
【Question】:

I am currently following a tutorial on building an Ebay scraper, at the link below:

https://www.youtube.com/watch?v=csj1RoLTMIA&t=290s

While writing the code, I noticed two things about it:

import requests
from bs4 import BeautifulSoup
import pandas as pd

url = 'https://www.ebay.co.uk/sch/i.html?_from=R40&_nkw=gaggia+classic&_sacat=0&rt=nc LH_Complete=1'

def get_data(url):
    r = requests.get(url)
    soup = BeautifulSoup(r.text, 'html.parser')
    return soup

def parse(soup):
    results = soup.find_all('div', {'class': 's-item__info clearfix'})
    print(len(results))
    return

soup = get_data(url)
parse(soup)

The code above gives me 0 results, whereas:

import requests
from bs4 import BeautifulSoup
import pandas as pd

url = 'https://www.ebay.co.uk/sch/i.html?_from=R40&_trksid=p2380057.m570.l1313&_nkw=gaggia+classic&_sacat=0'

def get_data(url):
    r = requests.get(url)
    soup = BeautifulSoup(r.text, 'html.parser')
    return soup

def parse(soup):
    results = soup.find_all('div', {'class': 's-item__info clearfix'})
    print(len(results))
    return

soup = get_data(url)
parse(soup)

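This code gives me 58 results. The two pages are roughly the same, although one of them has a filter applied, and both definitely list more than 50 products. My first question is why the two snippets return different result counts. I expected them to be the same.

Now, assuming I use the code that gives 58 results, I move on to the second part of the tutorial and run into another problem. The current code is: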

import requests
from bs4 import BeautifulSoup
import pandas as pd

url = 'https://www.ebay.co.uk/sch/i.html?_from=R40&_trksid=p2380057.m570.l1313&_nkw=gaggia+classic&_sacat=0'

# Takes the URL from the above and requests the data from the page

def get_data(url):
    r = requests.get(url)
    soup = BeautifulSoup(r.text, 'html.parser')
    return soup

# This will extract the information from the data we are looking to extract

def parse(soup):
    results = soup.find_all('div', {'class': 's-item__info clearfix'})
    for item in results:
        product = {
            #'title': item.find('h3', {'class': 's-item__title s-item__title--has-tags'}).text,
            #'soldprice': float(item.find('span', {'class': 's-item__price'}).text.replace('£','').replace(',','').strip()),  # Removes the pound sign and any commas that would break the float conversion, strips whitespace, then converts the number to a float
            #'solddate': item.find('div', {'class': 's-item__title--tagblock '}).find('span', {'class': 'POSITIVE'}).text.replace('Sold ',''),
            'bids': item.find('span', {'class': 's-item__bids s-item__bidCount'}).text,
            #'link': item.find('a', {'class': 's-item__link'})['href'],
    }
        print(product)
    return

soup = get_data(url)
parse(soup)

The problem is that, for some reason, I get this error:

Traceback (most recent call last):
  File "C:\Users\XXX\PycharmProjects\EbayCameraPriceChecker\main.py", line 30, in <module>
    parse(soup)
  File "C:\Users\XXX\PycharmProjects\EbayCameraPriceChecker\main.py", line 23, in parse
    'bids': item.find('span', {'class': 's-item__bids s-item__bidCount'}).text,
AttributeError: 'NoneType' object has no attribute 'text'

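I have looked up what it means but cannot really work out what is going wrong. I suspect it has something to do with my PyCharm setup or with some module.

【Comments】:

Welcome to SO. Each question should address exactly one problem, to keep it concise and focused; please open a new question for each additional problem. Thanks.

Tags: python web-scraping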

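【Solution 1】:

Note: Working from an assumption, this answer focuses on the "real" problem, the AttributeError.

What happens?

As the AttributeError says, you are accessing the .text attribute on None (a NoneType object).

The None appears because your selection item.find('span', {'class': 's-item__bids s-item__bidCount'}) does not find an element. That is not unusual here, because some listings have no bidding option: no element, no text.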
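
To see the behaviour in isolation, here is a minimal sketch that runs against a small inline HTML string instead of the live Ebay page. The markup and the `listing` class are hypothetical and heavily simplified; only the bid-count class name is taken from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup: one auction listing and one listing without bids.
html = """
<ul>
  <li class="listing"><span class="s-item__bids s-item__bidCount">9 bids</span></li>
  <li class="listing"><span class="s-item__price">£250.00</span></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")
listings = soup.find_all("li", {"class": "listing"})

# find() returns a Tag when the element exists and None when it does not.
found = [li.find("span", {"class": "s-item__bids s-item__bidCount"}) for li in listings]

for tag in found:
    print(type(tag).__name__)
```

The first listing yields a `Tag` and the second yields `None`, so accessing `.text` on the second result raises the same AttributeError as in the traceback.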

How to fix it?

Check whether the element exists. If it does, read its .text; otherwise set the value to None or to whatever you want to assign instead.

'bids':  item.find('span', {'class': 's-item__bids s-item__bidCount'}).text if item.find('span', {'class': 's-item__bids s-item__bidCount'}) else None,

or, using an assignment expression (Python 3.8+), so that find() is only called once:

'bids':  bids.text if (bids := item.find('span', {'class': 's-item__bids s-item__bidCount'})) else None,
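
If several fields need the same guard, the check can be moved into a small helper. This is only a sketch: `text_or_none` is a hypothetical name, not part of BeautifulSoup, and the inline markup is a simplified stand-in for one search result.

```python
from bs4 import BeautifulSoup


def text_or_none(parent, name, class_name):
    """Return the stripped text of the first matching child, or None if it is absent."""
    element = parent.find(name, {"class": class_name})
    return element.get_text(strip=True) if element is not None else None


# Hypothetical markup for a listing that has a price but no bid count.
html = '<div><span class="s-item__price">£250.00</span></div>'
item = BeautifulSoup(html, "html.parser").div

product = {
    "soldprice": text_or_none(item, "span", "s-item__price"),
    "bids": text_or_none(item, "span", "s-item__bids s-item__bidCount"),
}
print(product)
```

Here the missing bid count becomes `None` instead of raising, while the price is returned as a plain string.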

Example:

import requests
from bs4 import BeautifulSoup
import pandas as pd

url = 'https://www.ebay.co.uk/sch/i.html?_from=R40&_trksid=p2380057.m570.l1313&_nkw=gaggia+classic&_sacat=0'

# Takes the URL from the above and requests the data from the page

def get_data(url):
    r = requests.get(url)
    soup = BeautifulSoup(r.text, 'html.parser')
    return soup

# This will extract the information from the data we are looking to extract

def parse(soup):
    results = soup.select('ul.srp-results li')
    for item in results:
        product = {
            #'title': item.find('h3', {'class': 's-item__title s-item__title--has-tags'}).text,
            #'soldprice': float(item.find('span', {'class': 's-item__price'}).text.replace('£','').replace(',','').strip()),  # Removes the pound sign and any commas that would break the float conversion, strips whitespace, then converts the number to a float
            #'solddate': item.find('div', {'class': 's-item__title--tagblock '}).find('span', {'class': 'POSITIVE'}).text.replace('Sold ',''),
            # Fall back to None when the listing has no bid-count element
            'bids': bids.text if (bids := item.find('span', {'class': 's-item__bids s-item__bidCount'})) else None,
            #'link': item.find('a', {'class': 's-item__link'})['href'],
        }
        print(product)
    return

soup = get_data(url)
parse(soup)

Output

{'bids': None}
{'bids': None}
{'bids': None}
{'bids': '9 bids'}
{'bids': None}
{'bids': '0 bids'}
{'bids': None}
{'bids': '0 bids'}
{'bids': None}
...
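
If the bid counts are needed as numbers later on, one possible follow-up step is sketched below. It assumes the strings always look like the ones in the output above ('9 bids', '0 bids'); `bid_count` is a hypothetical helper name.

```python
import re


def bid_count(text):
    """Return the first integer found in a string such as '9 bids', or None."""
    if text is None:
        return None
    match = re.search(r"\d+", text)
    return int(match.group()) if match else None


# Values in the same shape as the scraper output above.
sample = [None, "9 bids", "0 bids"]
counts = [bid_count(value) for value in sample]
print(counts)
```

Listings without a bid count stay `None`, which keeps them distinguishable from auctions with zero bids.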

【Comments】:

Happy to help, and welcome to Stack Overflow. If this answer or any other answer solved your problem, please mark it as accepted. Thanks.