[Posted]: 2022-01-22 20:26:49
[Question]:
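I am trying to send requests with scrapy based on the brand number in the URL, extract the IDs from the response (which also provides the next-page information), and then loop through the following pages to collect the product IDs.
I can send the request and parse the product data, but I am not sure how to define the function that lets me grab the cursor for the next page.
Here is my code: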
import scrapy
from scrapy import Field
from scrapy.loader import ItemLoader
from itemloaders.processors import TakeFirst


class DepopItem(scrapy.Item):
    brands = Field(output_processor=TakeFirst())
    ID = Field(output_processor=TakeFirst())
    brand = Field(output_processor=TakeFirst())


class DepopSpider(scrapy.Spider):
    name = 'depop'
    start_urls = ['https://webapi.depop.com/api/v2/search/filters/aggregates/?brands=1596&itemsPerPage=24&country=gb&currency=GBP&sort=relevance']
    brands = [1596]
    custom_settings = {
        'USER_AGENT': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36'
    }

    def start_requests(self, cursor=''):
        for brand in self.brands:
            for item in self.create_product_request(brand):
                yield item
            yield scrapy.FormRequest(
                url='https://webapi.depop.com/api/v2/search/products/',
                method='GET',
                formdata={
                    'brands': str(brand),
                    'cursor': cursor,
                    'itemsPerPage': '24',
                    'country': 'gb',
                    'currency': 'GBP',
                    'sort': 'relevance'
                },
                cb_kwargs={'brand': brand}
            )

    def parse(self, response, brand):
        # load stuff
        for item in response.json().get('products'):
            loader = ItemLoader(DepopItem())
            loader.add_value('brand', brand)
            loader.add_value('ID', item.get('id'))
            yield loader.load_item()
        cursor = response.json()['meta'].get('cursor')
        if cursor:
            for item in self.create_product_request(brand, cursor):
                yield item

    def create_product_request(self, response):
        test = response.json()['meta'].get('cursor')
        yield test
I get the following error:
AttributeError: 'int' object has no attribute 'json'
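The traceback most likely comes from start_requests, which calls create_product_request(brand) with the brand number (an int) even though that helper treats its argument as a response. A minimal sketch without Scrapy that reproduces the same message is shown here; the standalone function is a simplified stand-in for the method above, not the spider itself:

```python
def create_product_request(response):
    # Mirrors the helper in the spider: it expects an object with a .json() method.
    return response.json()['meta'].get('cursor')


try:
    # 1596 is the brand id from the spider; an int has no .json() method.
    create_product_request(1596)
except AttributeError as exc:
    message = str(exc)
    print(message)
```

Note that parse also calls the helper with two arguments (brand, cursor) while it is defined to take one, so that call would fail as well once the first error is resolved.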
Expected output:
{"brand": 1596, "ID": 273027529}
{"brand": 1596, "ID": 274115361}
{"brand": 1596, "ID": 270641301}
{"brand": 1596, "ID": 274505678}
{"brand": 1596, "ID": 262857014}
{"brand": 1596, "ID": 270088589}
{"brand": 1596, "ID": 208498028}
{"brand": 1596, "ID": 270426792}
{"brand": 1596, "ID": 274483351}
{"brand": 1596, "ID": 274109923}
{"brand": 1596, "ID": 273424157}
..
..
..
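To make the intended cursor loop concrete, here is a rough sketch of the pagination logic on its own, outside Scrapy. The names build_search_url, iter_product_ids and fake_fetch are hypothetical helpers introduced for illustration, and the payload shape ('products', 'meta', 'cursor') is assumed from the spider code above rather than confirmed against the API:

```python
from urllib.parse import urlencode

SEARCH_URL = 'https://webapi.depop.com/api/v2/search/products/'


def build_search_url(brand, cursor=''):
    """Build the products URL for one brand at one cursor position."""
    params = {
        'brands': str(brand),
        'cursor': cursor,
        'itemsPerPage': '24',
        'country': 'gb',
        'currency': 'GBP',
        'sort': 'relevance',
    }
    return SEARCH_URL + '?' + urlencode(params)


def iter_product_ids(brand, fetch_json):
    """Yield brand/ID dicts, following meta.cursor until none is returned."""
    cursor = ''
    while True:
        payload = fetch_json(build_search_url(brand, cursor))
        for product in payload.get('products', []):
            yield {'brand': brand, 'ID': product.get('id')}
        cursor = payload.get('meta', {}).get('cursor')
        if not cursor:
            break


def fake_fetch(url):
    # Hypothetical two-page API used only to exercise the loop.
    if 'cursor=abc' in url:
        return {'products': [{'id': 3}], 'meta': {}}
    return {'products': [{'id': 1}, {'id': 2}], 'meta': {'cursor': 'abc'}}


results = list(iter_product_ids(1596, fake_fetch))
print(results)
```

In a spider, the same idea would probably mean that parse yields a new scrapy.Request(build_search_url(brand, cursor), callback=self.parse, cb_kwargs={'brand': brand}) whenever a cursor is present, instead of looping inside a single function.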
[Discussion]:
Tags: python web-scraping scrapy