Steam Scrapy Problems

【Question title】: Steam Scrapy Problems
【Posted】: 2020-11-04 02:40:13
【Question description】:

Here is my code:

# -*- coding: utf-8 -*-
import scrapy

class GameSpider(scrapy.Spider):
    name = 'game'
    allowed_domains = ['store.steampowered.com']
    start_urls = ['https://store.steampowered.com/search/results/?query&start=0&count=50&dynamic_data=&sort_by=_ASC&snr=1_7_7_230_7&category1=998&infinite=1']

    def parse(self, response):

        print(response.body)
        game_href = str(response.xpath(".//@href").extract())
        
        print(game_href)

My problem is that when I run Scrapy, I get only 17 links (there should be 50 in total). I checked response.body and it is correct.

【Question comments】:

Tags: python scrapy steam

【Solution 1】:

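The page returns JSON data, but you are parsing it as HTML.

If you parse only the actual HTML part (the results_html field of the JSON), you get all the links: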

>>> fetch('https://store.steampowered.com/search/results/?query&start=0&count=50&dynamic_data=&sort_by=_ASC&snr=1_7_7_230_7&category1=998&infinite=1')
2020-11-04 07:13:12 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://store.steampowered.com/search/results/?query&start=0&count=50&dynamic_data=&sort_by=_ASC&snr=1_7_7_230_7&category1=998&infinite=1> (referer: None)
>>> data = response.json()
>>> sel = scrapy.Selector(text=data['results_html'])
>>> game_href = sel.xpath('//@href').getall()
>>> len(game_href)
50

【Comments】: