【Question title】: Steam Scrapy Problems
【Posted】: 2020-11-04 02:40:13
【Question】:

Here is my code:

# -*- coding: utf-8 -*-
import scrapy

class GameSpider(scrapy.Spider):
    name = 'game'
    allowed_domains = ['store.steampowered.com']
    start_urls = ['https://store.steampowered.com/search/results/?query&start=0&count=50&dynamic_data=&sort_by=_ASC&snr=1_7_7_230_7&category1=998&infinite=1']

    def parse(self, response):

        print(response.body)
        game_href = str(response.xpath(".//@href").extract())
        
        print(game_href)

My problem is that when I run Scrapy, I only get 17 links, although the page should contain 50. I checked response.body, and it looks correct.

【Comments】:

    Tags: python scrapy steam


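    【Answer 1】:

    The page returns JSON data, but you are parsing it as HTML.

    If you parse only the actual HTML part (the results_html field of the JSON), you get all the links: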

    >>> fetch('https://store.steampowered.com/search/results/?query&start=0&count=50&dynamic_data=&sort_by=_ASC&snr=1_7_7_230_7&category1=998&infinite=1')
    2020-11-04 07:13:12 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://store.steampowered.com/search/results/?query&start=0&count=50&dynamic_data=&sort_by=_ASC&snr=1_7_7_230_7&category1=998&infinite=1> (referer: None)
    >>> data = response.json()
    >>> sel = scrapy.Selector(text=data['results_html'])
    >>> game_href = sel.xpath('//@href').getall()
    >>> len(game_href)
    50
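
The two steps from the shell session above (decode the JSON first, then parse only the embedded HTML) can also be sketched outside of Scrapy. This is a minimal illustration using only the standard library: `html.parser` stands in for `scrapy.Selector`, the names `HrefCollector` and `extract_hrefs` are hypothetical, and the sample payload is made up. Only the `results_html` key is taken from the answer.

```python
import json
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collects every href attribute on any tag, similar to the XPath //@href."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        for name, value in attrs:
            if name == "href" and value is not None:
                self.hrefs.append(value)


def extract_hrefs(body_text):
    """Decode the JSON body, then parse only the HTML fragment inside it."""
    data = json.loads(body_text)           # step 1: the response is JSON, not HTML
    collector = HrefCollector()
    collector.feed(data["results_html"])   # step 2: parse the embedded HTML
    collector.close()
    return collector.hrefs


# Made-up payload with the same shape as the part of the response used above.
sample_body = json.dumps({
    "results_html": '<a href="https://store.steampowered.com/app/10/">A</a>'
                    '<a href="https://store.steampowered.com/app/20/">B</a>',
})
print(extract_hrefs(sample_body))
```

Inside the spider's `parse` method, the Scrapy equivalent would presumably be `scrapy.Selector(text=json.loads(response.text)['results_html']).xpath('//@href').getall()`, which mirrors the shell session.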
    
