[Posted]: 2018-08-01 00:30:27
[Question]:
I am trying to scrape all product names from https://www.walmart.com/search/?query=ps3&cat_id=0 using the Scrapy Python library.
This is my parse function:
def parseWalmart(self, response):
    print("INSIDE PARSE WALMART")
    # Each product tile is a div whose data-tl-id starts with "ProductTileListView-"
    for product in response.xpath('//div[@id="searchProductResult"]/div[@class="search-result-listview-items"]//div[starts-with(@data-tl-id,"ProductTileListView-")]'):
        print(product)
        # The title is split across several <span> elements, so collect all text nodes
        product_name = product.xpath('.//div[contains(@class,"search-result-product-title listview")]//a//span//text()').extract()
        product_page = product.xpath('.//div[contains(@class,"search-result-product-title listview")]//a/@href').extract()
        product_name = " ".join(product_name)
        print(product_name)
        print("-------------------------------------")
This is my Scrapy request:
yield scrapy.Request(url=i, callback=self.parseWalmart, headers={"User-Agent": "Mozilla/5.0"})
However, I only get 4 products, even though the page lists over a dozen. I don't understand why. These are the 4 products I scraped:
<Selector xpath='//div[@id="searchProductResult"]/div[@class="search-result-listview-items"]//div[starts-with(@data-tl-id,"ProductTileListView-")]' data='<div data-tl-id="ProductTileListView-0">'>
ABLEGRID Wireless Bluetooth Game Controller for Sony PS3 Black
-------------------------------------
<Selector xpath='//div[@id="searchProductResult"]/div[@class="search-result-listview-items"]//div[starts-with(@data-tl-id,"ProductTileListView-")]' data='<div data-tl-id="ProductTileListView-1">'>
Arsenal Gaming PS3 Wired Controller, Black
-------------------------------------
<Selector xpath='//div[@id="searchProductResult"]/div[@class="search-result-listview-items"]//div[starts-with(@data-tl-id,"ProductTileListView-")]' data='<div data-tl-id="ProductTileListView-2">'>
Refurbished Sony PlayStation 3 Slim 320 GB Charcoal Black Console
-------------------------------------
<Selector xpath='//div[@id="searchProductResult"]/div[@class="search-result-listview-items"]//div[starts-with(@data-tl-id,"ProductTileListView-")]' data='<div data-tl-id="ProductTileListView-3">'>
Sonic's Ultimate Genesis Collection ( PS3 )
-------------------------------------
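One way to narrow this down is to check how many product tiles are actually present in the HTML that Scrapy received, independent of the XPath above. The following is a minimal sketch using only the standard library; `TileCounter`, `count_tiles`, and `sample_html` are hypothetical names introduced for illustration, and the sample markup is a made-up stand-in for `response.text`, not the real Walmart page. Unlike the XPath, it ignores the `searchProductResult` ancestor and counts matching tiles anywhere in the document.

```python
from html.parser import HTMLParser


class TileCounter(HTMLParser):
    """Collects data-tl-id values of <div> tags that start with a given prefix."""

    def __init__(self, prefix="ProductTileListView-"):
        super().__init__()
        self.prefix = prefix
        self.tile_ids = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        # attrs is a list of (name, value) pairs; value can be None
        for name, value in attrs:
            if name == "data-tl-id" and value and value.startswith(self.prefix):
                self.tile_ids.append(value)


def count_tiles(html_text):
    """Return the data-tl-id of every product tile found in html_text."""
    counter = TileCounter()
    counter.feed(html_text)
    return counter.tile_ids


# Hypothetical sample standing in for response.text (assumption, not real page data)
sample_html = """
<div id="searchProductResult">
  <div class="search-result-listview-items">
    <div data-tl-id="ProductTileListView-0"><a href="/ip/1"><span>Item A</span></a></div>
    <div data-tl-id="ProductTileListView-1"><a href="/ip/2"><span>Item B</span></a></div>
  </div>
</div>
"""

tiles = count_tiles(sample_html)
print(len(tiles), tiles)
```

Inside the spider, the same helper could be called as `count_tiles(response.text)`. If that also reports 4 tiles, the remaining products are simply not in the downloaded HTML (for example, they might be added later by JavaScript, which is an assumption to verify). If it reports more than 4, the XPath is the more likely culprit.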
[Discussion]:
Tags: html xpath web-scraping scrapy