【Question Title】: Unable to Pull Data from Ebay Scraper Tutorial - Getting AttributeError
【Posted】: 2022-01-17 09:32:33
【Question】:

I am currently following a tutorial on building an Ebay scraper, at the link below:

https://www.youtube.com/watch?v=csj1RoLTMIA&t=290s

While writing the code, I noticed two things about it:

import requests
from bs4 import BeautifulSoup
import pandas as pd

url = 'https://www.ebay.co.uk/sch/i.html?_from=R40&_nkw=gaggia+classic&_sacat=0&rt=nc LH_Complete=1'

def get_data(url):
    r = requests.get(url)
    soup = BeautifulSoup(r.text, 'html.parser')
    return soup

def parse(soup):
    results = soup.find_all('div', {'class': 's-item__info clearfix'})
    print(len(results))
    return

soup = get_data(url)
parse(soup)

The code above gives me 0 results, whereas:

import requests
from bs4 import BeautifulSoup
import pandas as pd

url = 'https://www.ebay.co.uk/sch/i.html?_from=R40&_trksid=p2380057.m570.l1313&_nkw=gaggia+classic&_sacat=0'

def get_data(url):
    r = requests.get(url)
    soup = BeautifulSoup(r.text, 'html.parser')
    return soup

def parse(soup):
    results = soup.find_all('div', {'class': 's-item__info clearfix'})
    print(len(results))
    return

soup = get_data(url)
parse(soup)

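This code gives me 58 results. The two pages are roughly the same; although one of them has a filter applied, there are definitely more than 50 products on both. My first question is why the two return different result counts. I expected them to be the same.

Now, assuming I use the version of the code that gives 58 results, I move on to the second part of the tutorial, where I face another problem. The code is currently: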

import requests
from bs4 import BeautifulSoup
import pandas as pd

url = 'https://www.ebay.co.uk/sch/i.html?_from=R40&_trksid=p2380057.m570.l1313&_nkw=gaggia+classic&_sacat=0'

# Takes the URL from the above and requests the data from the page

def get_data(url):
    r = requests.get(url)
    soup = BeautifulSoup(r.text, 'html.parser')
    return soup

# This will extract the information from the data we are looking to extract

def parse(soup):
    results = soup.find_all('div', {'class': 's-item__info clearfix'})
    for item in results:
        product = {
            #'title': item.find('h3', {'class': 's-item__title s-item__title--has-tags'}).text,
            #'soldprice': float(item.find('span', {'class': 's-item__price'}).text.replace('£','').replace(',','').strip()),  # Removes the pound sign and any commas that would break the float conversion, and strips whitespace before converting to float
            #'solddate': item.find('div', {'class': 's-item__title--tagblock '}).find('span', {'class': 'POSITIVE'}).text.replace('Sold ',''),
            'bids': item.find('span', {'class': 's-item__bids s-item__bidCount'}).text,
            #'link': item.find('a', {'class': 's-item__link'})['href'],
        }
        print(product)
    return

soup = get_data(url)
parse(soup)

The problem is that, for some reason, I get this error:

Traceback (most recent call last):
  File "C:\Users\XXX\PycharmProjects\EbayCameraPriceChecker\main.py", line 30, in <module>
    parse(soup)
  File "C:\Users\XXX\PycharmProjects\EbayCameraPriceChecker\main.py", line 23, in parse
    'bids': item.find('span', {'class': 's-item__bids s-item__bidCount'}).text,
AttributeError: 'NoneType' object has no attribute 'text'

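I have looked up what it means but can't really work out what is wrong. I suspect it has something to do with my PyCharm settings or some module.

【Comments】:

  • Welcome to SO - each question should address exactly one problem to keep it concise and focused; for every additional problem, please ask a new question. Thanks

Tags: python web-scraping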


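【Solution 1】:

Note: Working from an assumption, this answer focuses on the "real" problem, the AttributeError.

What happens?

As the AttributeError says, you are accessing the .text attribute on a NoneType object.

These None values appear because your selection item.find('span', {'class': 's-item__bids s-item__bidCount'}) does not find an element. That is not unusual here, because some listings have no option to bid: no element, no text.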
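To see the behaviour in isolation, here is a minimal sketch on a hand-written HTML fragment. The markup is a simplified assumption for illustration, not Ebay's real page; only the class names are taken from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup: the second listing has no bid span at all.
html = """
<ul class="srp-results">
  <li><span class="s-item__bids s-item__bidCount">9 bids</span></li>
  <li><span class="s-item__price">250.00</span></li>
</ul>
"""

soup = BeautifulSoup(html, 'html.parser')
items = soup.select('ul.srp-results li')

# find() returns a Tag when the element exists and None when it does not.
found = [item.find('span', {'class': 's-item__bids s-item__bidCount'}) for item in items]

for element in found:
    print(type(element).__name__)  # Tag, then NoneType
```

Accessing .text on the second result is exactly what raises the AttributeError from the traceback.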

How to fix?

Check whether the element exists. If it does, read its .text; otherwise set the value to None or whatever you would like to assign instead.

'bids':  item.find('span', {'class': 's-item__bids s-item__bidCount'}).text if item.find('span', {'class': 's-item__bids s-item__bidCount'}) else None,

or, on Python 3.8+, with an assignment expression so the lookup runs only once:

'bids':  bids.text if (bids := item.find('span', {'class': 's-item__bids s-item__bidCount'})) else None,
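If the same check is needed for several fields, it could be wrapped in a small helper. This is only a sketch: text_or_default is a hypothetical name, not part of BeautifulSoup, and the HTML fragment is made up for the demonstration.

```python
from bs4 import BeautifulSoup

def text_or_default(item, name, class_value, default=None):
    """Return the stripped text of the first matching child, or `default` if there is none."""
    element = item.find(name, {'class': class_value})
    return element.get_text(strip=True) if element is not None else default

# Hypothetical, simplified markup: one listing with bids, one without.
html = """
<li><span class="s-item__bids s-item__bidCount"> 0 bids </span></li>
<li><span class="s-item__price">250.00</span></li>
"""

soup = BeautifulSoup(html, 'html.parser')
rows = [
    {'bids': text_or_default(item, 'span', 's-item__bids s-item__bidCount')}
    for item in soup.find_all('li')
]
print(rows)  # [{'bids': '0 bids'}, {'bids': None}]
```

The default parameter lets you pick a different placeholder per field, for example an empty string for a title.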

Example:

import requests
from bs4 import BeautifulSoup
import pandas as pd

url = 'https://www.ebay.co.uk/sch/i.html?_from=R40&_trksid=p2380057.m570.l1313&_nkw=gaggia+classic&_sacat=0'

# Takes the URL from the above and requests the data from the page

def get_data(url):
    r = requests.get(url)
    soup = BeautifulSoup(r.text, 'html.parser')
    return soup

# This will extract the information from the data we are looking to extract

def parse(soup):
    results = soup.select('ul.srp-results li')
    for item in results:
        product = {
            #'title': item.find('h3', {'class': 's-item__title s-item__title--has-tags'}).text,
            #'soldprice': float(item.find('span', {'class': 's-item__price'}).text.replace('£','').replace(',','').strip()),  # Removes the pound sign and any commas that would break the float conversion, and strips whitespace before converting to float
            #'solddate': item.find('div', {'class': 's-item__title--tagblock '}).find('span', {'class': 'POSITIVE'}).text.replace('Sold ',''),
            'bids':  bids.text if (bids := item.find('span', {'class': 's-item__bids s-item__bidCount'})) else None,
            #'link': item.find('a', {'class': 's-item__link'})['href'],
        }
        print(product)
    return

soup = get_data(url)
parse(soup)

Output

{'bids': None}
{'bids': None}
{'bids': None}
{'bids': '9 bids'}
{'bids': None}
{'bids': '0 bids'}
{'bids': None}
{'bids': '0 bids'}
{'bids': None}
...
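If you later want numbers rather than strings such as '9 bids', the values could be converted as in the sketch below. parse_bid_count is a hypothetical helper, and the sample list simply mirrors the shape of the output above.

```python
import re

def parse_bid_count(text):
    """Convert a string such as '9 bids' to an int; pass None through unchanged."""
    if text is None:
        return None
    match = re.search(r'\d+', text)
    return int(match.group()) if match else None

# Sample values shaped like the scraper output.
products = [{'bids': None}, {'bids': '9 bids'}, {'bids': '0 bids'}]
counts = [parse_bid_count(product['bids']) for product in products]
print(counts)  # [None, 9, 0]
```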

【Comments】:

  • Glad to help, and welcome to Stack Overflow. If this or any other answer solved your problem, please mark it as accepted. Thanks