Using Scrapy to scrape nested JSON data? Answers

【Question Title】: Using Scrapy to scrape nested JSON data?
【Posted】: 2016-04-01 22:17:34
【Question Description】:

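I'm trying to write a web application that scrapes information from Sony's PlayStation Store. I found the JSON file that contains the data I want, but how can I use Scrapy to store only certain elements of that JSON file?

Here is part of the JSON data: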

{
  "age_limit": 0,
  "attributes": {
    "facets": {
      "platform": [
        {"name": "PS4™", "count": 96, "key": "ps4"},
        {"name": "PS3™", "count": 5, "key": "ps3"},
        {"name": "PS Vita", "count": 7, "key": "vita"}
      ]
    }
  }
}
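To experiment outside of Scrapy, the excerpt can be parsed with the standard `json` module and walked like nested dictionaries. The sketch below assumes the real response has the same `attributes` → `facets` → `platform` layout as the excerpt; the helper name `get_platforms` is hypothetical. Chaining `dict.get()` with empty defaults returns an empty list instead of raising `KeyError` when a level is missing.

```python
import json

# The excerpt from the question as a valid JSON string
# ("\u2122" is the trademark sign in "PS4™").
RAW = """
{
  "age_limit": 0,
  "attributes": {
    "facets": {
      "platform": [
        {"name": "PS4\u2122", "count": 96, "key": "ps4"},
        {"name": "PS3\u2122", "count": 5, "key": "ps3"},
        {"name": "PS Vita", "count": 7, "key": "vita"}
      ]
    }
  }
}
"""


def get_platforms(data):
    """Return the list of platform facets, or [] if any level is missing.

    Hypothetical helper: assumes the attributes -> facets -> platform
    layout shown in the excerpt.
    """
    return data.get("attributes", {}).get("facets", {}).get("platform", [])


data = json.loads(RAW)
for platform in get_platforms(data):
    # Print the ASCII "key" field next to its count
    print(platform["key"], platform["count"])
```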

I only want the "count" value of the entry whose "name" is PS4. How do I get this in Scrapy? Here is my Scrapy code so far:

import json

import scrapy

from crossbuy.items import PS4Vita  # item class from my project (not used yet)


class PS4VitaSpider(scrapy.Spider):
    name = "ps4vita"  # Name of the spider, used when crawling
    allowed_domains = ["store.playstation.com"]  # Domains the spider is allowed to visit
    # Scrapy reads the start URLs from `start_urls`, which must be a list
    start_urls = ["https://store.playstation.com/chihiro-api/viewfinder/US/en/999/STORE-MSF77008-9_PS4PSVCBBUNDLE?size=30&gkb=1&geoCountry=US"]

    def parse(self, response):
        # Parse the decoded response text, not the Response object itself
        jsonresponse = json.loads(response.text)

        pass  # To be changed later

Thanks!

【Question Discussion】:

Can't you access the {"name": "PS4™"} entry the normal way? For example: [p["count"] for p in jsonresponse["attributes"]["facets"]["platform"] if p["name"] == "PS4™"]
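The comprehension in the comment returns a list, so the single value still has to be taken out of it. A minimal sketch, assuming the same structure as the excerpt: `next()` over a generator expression stops at the first match and falls back to `None` when nothing matches. Matching on the `key` field (`"ps4"`) rather than the display name avoids having to reproduce the `™` symbol exactly; `find_count` is a hypothetical name.

```python
def find_count(platforms, key):
    """Return the "count" of the first facet whose "key" equals `key`, else None."""
    return next((p["count"] for p in platforms if p.get("key") == key), None)


# Sample facets shaped like the excerpt from the question
platforms = [
    {"name": "PS4\u2122", "count": 96, "key": "ps4"},
    {"name": "PS3\u2122", "count": 5, "key": "ps3"},
    {"name": "PS Vita", "count": 7, "key": "vita"},
]

print(find_count(platforms, "ps4"))   # 96
print(find_count(platforms, "xbox"))  # None
```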

Tags: python json scrapy

【Solution 1】:

...
def parse(self, response):
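    # Parse the decoded body text as JSON (needs `import json` at module level)
    jsonresponse = json.loads(response.text)
    my_count = None
    # Walk the platform facets and keep the count of the PS4 entry
    for platform in jsonresponse['attributes']['facets']['platform']:
        if 'PS4' in platform['name']:
            my_count = platform['count']
            break  # stop at the first match

    yield dict(count=my_count)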
...

【Discussion】:

【Solution 2】:

Access the JSON data the same way you would access a Python dictionary:

# To get a list of the counts:
counts = [x['count'] for x in jsonresponse['attributes']['facets']['platform']]
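If more than one count is needed, building a dictionary once avoids rescanning the list for every lookup. A sketch assuming each facet has a unique `key` and a `count`, as in the excerpt; the name `counts_by_key` is illustrative.

```python
import json

# Parsed JSON shaped like the excerpt in the question
jsonresponse = json.loads(
    '{"attributes": {"facets": {"platform": ['
    '{"name": "PS4\u2122", "count": 96, "key": "ps4"}, '
    '{"name": "PS3\u2122", "count": 5, "key": "ps3"}, '
    '{"name": "PS Vita", "count": 7, "key": "vita"}'
    ']}}}'
)

# Map each platform key to its count in a single pass
counts_by_key = {
    p["key"]: p["count"]
    for p in jsonresponse["attributes"]["facets"]["platform"]
}

print(counts_by_key)             # {'ps4': 96, 'ps3': 5, 'vita': 7}
print(counts_by_key["ps4"])      # 96
print(counts_by_key.get("psp"))  # None (missing key, no KeyError)
```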

【Discussion】: