Posted: 2022-01-17 09:32:33
Question:
I'm currently following a tutorial on building an eBay scraper, from this link:
https://www.youtube.com/watch?v=csj1RoLTMIA&t=290s
While writing the code, I noticed two things about it:
import requests
from bs4 import BeautifulSoup
import pandas as pd

url = 'https://www.ebay.co.uk/sch/i.html?_from=R40&_nkw=gaggia+classic&_sacat=0&rt=nc LH_Complete=1'

def get_data(url):
    r = requests.get(url)
    soup = BeautifulSoup(r.text, 'html.parser')
    return soup

def parse(soup):
    results = soup.find_all('div', {'class': 's-item__info clearfix'})
    print(len(results))
    return

soup = get_data(url)
parse(soup)
The code above gives me 0 results, whereas this code:
import requests
from bs4 import BeautifulSoup
import pandas as pd

url = 'https://www.ebay.co.uk/sch/i.html?_from=R40&_trksid=p2380057.m570.l1313&_nkw=gaggia+classic&_sacat=0'

def get_data(url):
    r = requests.get(url)
    soup = BeautifulSoup(r.text, 'html.parser')
    return soup

def parse(soup):
    results = soup.find_all('div', {'class': 's-item__info clearfix'})
    print(len(results))
    return

soup = get_data(url)
parse(soup)
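gives me 58 results. The two pages are roughly the same: although the second page has a filter applied, both definitely contain more than 50 products. My first question is why the two give different numbers of results; I expected them to be the same.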
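One thing worth ruling out for the first question (a hedged observation, not a confirmed cause of the 0 vs 58 difference): the first URL has a space rather than `&` between `rt=nc` and `LH_Complete=1`, so under standard query-string parsing `LH_Complete` is not a parameter of its own but part of the value of `rt`. The standard-library sketch below parses the query strings and rebuilds the first one with `urlencode`, assuming `LH_Complete=1` was meant to be a separate parameter; `query_params` and `fixed_url` are illustrative names, not part of the tutorial.

```python
from urllib.parse import urlsplit, parse_qs, urlencode

# The two search URLs from the question, copied verbatim.
url_first = 'https://www.ebay.co.uk/sch/i.html?_from=R40&_nkw=gaggia+classic&_sacat=0&rt=nc LH_Complete=1'
url_second = 'https://www.ebay.co.uk/sch/i.html?_from=R40&_trksid=p2380057.m570.l1313&_nkw=gaggia+classic&_sacat=0'

def query_params(url):
    """Return the query string of `url` as a dict: parameter -> list of values."""
    return parse_qs(urlsplit(url).query)

first = query_params(url_first)
print(first)
# 'LH_Complete' is not a key of its own; it ends up inside the value of 'rt'.
print('LH_Complete' in first)  # False
print(first['rt'])             # ['nc LH_Complete=1']

# Building the query with urlencode keeps every parameter separate
# (assumption: LH_Complete=1 was intended as its own parameter).
params = {'_from': 'R40', '_nkw': 'gaggia classic', '_sacat': '0',
          'rt': 'nc', 'LH_Complete': '1'}
fixed_url = 'https://www.ebay.co.uk/sch/i.html?' + urlencode(params)
print(fixed_url)
```

Separately, saving `r.text` from each request and searching it for `s-item__info` would show whether the first page simply contains different markup.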
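Now, assuming I use the version that gives 58 results, I move on to the second part of the tutorial, where I've run into another problem. The current code is: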
import requests
from bs4 import BeautifulSoup
import pandas as pd

url = 'https://www.ebay.co.uk/sch/i.html?_from=R40&_trksid=p2380057.m570.l1313&_nkw=gaggia+classic&_sacat=0'

# Takes the URL from above and requests the data from the page
def get_data(url):
    r = requests.get(url)
    soup = BeautifulSoup(r.text, 'html.parser')
    return soup

# Extracts the information we are looking for from the page data
def parse(soup):
    results = soup.find_all('div', {'class': 's-item__info clearfix'})
    for item in results:
        product = {
            #'title': item.find('h3', {'class': 's-item__title s-item__title--has-tags'}).text,
            # soldprice: removes the pound sign and any commas that would break float(), strips spaces, then converts the number to a float
            #'soldprice': float(item.find('span', {'class': 's-item__price'}).text.replace('£','').replace(',','').strip()),
            #'solddate': item.find('div', {'class': 's-item__title--tagblock '}).find('span', {'class': 'POSITIVE'}).text.replace('Sold ',''),
            'bids': item.find('span', {'class': 's-item__bids s-item__bidCount'}).text,
            #'link': item.find('a', {'class': 's-item__link'})['href'],
        }
        print(product)
    return

soup = get_data(url)
parse(soup)
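The commented-out `soldprice` line above removes the pound sign and commas before calling `float`. As a sketch, assuming prices arrive as text like `£1,234.56`, that step can be pulled out into a small hypothetical helper, `parse_gbp_price`, which returns `None` instead of raising when the cleaned text is not a single plain number (for example a price range).

```python
import re

def parse_gbp_price(text):
    """Convert a price string such as '£1,234.56' to a float.

    Mirrors the commented-out 'soldprice' line: drop the pound sign and the
    thousands separators, strip whitespace, then convert. Returns None when
    what remains is not a single plain number (e.g. '£10.00 to £20.00').
    """
    cleaned = text.replace('£', '').replace(',', '').strip()
    # Accept only digits with an optional decimal part; this rejects ranges
    # and also values like 'nan' or '1e3' that float() would otherwise accept.
    if not re.fullmatch(r'\d+(?:\.\d+)?', cleaned):
        return None
    return float(cleaned)

print(parse_gbp_price('£1,234.56'))         # 1234.56
print(parse_gbp_price(' £89.99 '))          # 89.99
print(parse_gbp_price('£10.00 to £20.00'))  # None
```

`float` is kept to match the original line; `decimal.Decimal` would avoid binary rounding if the prices are summed later.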
The problem is that, for some reason, I get this error:
Traceback (most recent call last):
  File "C:\Users\XXX\PycharmProjects\EbayCameraPriceChecker\main.py", line 30, in <module>
    parse(soup)
  File "C:\Users\XXX\PycharmProjects\EbayCameraPriceChecker\main.py", line 23, in parse
    'bids': item.find('span', {'class': 's-item__bids s-item__bidCount'}).text,
AttributeError: 'NoneType' object has no attribute 'text'
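I've looked up what this means but can't really work out what is going wrong. I suspect it has something to do with my PyCharm setup or one of the modules.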
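On the second question: `find` returns `None` when no matching tag exists, and reading `.text` on `None` is exactly what raises `AttributeError: 'NoneType' object has no attribute 'text'`, so this is unlikely to be a PyCharm or module problem. A plausible (unverified) reason is that some listings, such as fixed-price ones, have no bids `span` at all. The sketch below guards the lookup with a hypothetical helper, `find_text`, and returns the products as a list instead of printing them; it assumes `soup` and each `item` are BeautifulSoup objects (`find_all`, `find(..., class_=...)`, `get_text(strip=True)`).

```python
def find_text(parent, name, class_, default=None):
    """Return the stripped text of the first matching child tag, or `default`.

    BeautifulSoup's find() returns None when nothing matches, so check for
    that before touching the tag's text.
    """
    tag = parent.find(name, class_=class_)
    if tag is None:
        return default
    return tag.get_text(strip=True)

def parse(soup):
    """Variant of parse() from the question that tolerates missing bid counts."""
    products = []
    for item in soup.find_all('div', {'class': 's-item__info clearfix'}):
        product = {
            # None when the listing has no bids span (e.g. no auction).
            'bids': find_text(item, 'span', 's-item__bids s-item__bidCount'),
        }
        products.append(product)
    return products
```

Passing `default=...` to `find_text` lets each field choose its own placeholder instead of `None`.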
Comments:
- Welcome to SO. Each post should ask exactly one question to keep it concise and focused; please ask each additional question as a new post. Thanks.
Tags: python web-scraping