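[Posted]: 2016-04-01 22:17:34
[Question]:
I'm trying to write a web application that scrapes information from Sony's PlayStation Store. I've found the JSON file that contains the data I want, but how do I use Scrapy to store only certain elements of that JSON file?
Here is part of the JSON data: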
{
  "age_limit": 0,
  "attributes": {
    "facets": {
      "platform": [
        {"name": "PS4™", "count": 96, "key": "ps4"},
        {"name": "PS3™", "count": 5, "key": "ps3"},
        {"name": "PS Vita", "count": 7, "key": "vita"}
      ]
    }
  }
}
I only want the "count" value of the entry whose "name" is PS4. How do I get this in Scrapy? Here is my Scrapy code so far:
import json
import scrapy
from crossbuy.items import PS4Vita

class PS4VitaSpider(scrapy.Spider):
    name = "ps4vita"  # Name of the spider, to be used when crawling
    allowed_domains = ["store.playstation.com"]  # Where the spider is allowed to go
    # Scrapy reads the plural attribute start_urls, which must be a list
    start_urls = ["https://store.playstation.com/chihiro-api/viewfinder/US/en/999/STORE-MSF77008-9_PS4PSVCBBUNDLE?size=30&gkb=1&geoCountry=US"]

    def parse(self, response):
        # json.loads needs the response body as text, not the Response object
        jsonresponse = json.loads(response.text)
        pass  # To be changed later
Thanks!
[Comments]:
-
Can't you access the {"name": "PS4™"} entry the normal way? For example,
[ p["count"] for p in jsonresponse["attributes"]["facets"]["platform"] if p["name"] == "PS4™" ]?
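The lookup suggested in the comment can be tried outside Scrapy first. Below is a minimal sketch in plain Python, assuming the payload has the shape shown in the question; `SAMPLE` is the question's excerpt with the trailing comma removed so that it parses, and `platform_count` is a hypothetical helper name, not part of Scrapy. It matches on the "key" field ("ps4") rather than on "name", which avoids depending on the "™" character.

```python
import json

# Excerpt from the question, with the trailing comma removed so it is valid JSON.
SAMPLE = """
{
  "age_limit": 0,
  "attributes": {
    "facets": {
      "platform": [
        {"name": "PS4\u2122", "count": 96, "key": "ps4"},
        {"name": "PS3\u2122", "count": 5, "key": "ps3"},
        {"name": "PS Vita", "count": 7, "key": "vita"}
      ]
    }
  }
}
"""


def platform_count(data, key):
    """Return the count for the platform with the given key, or None if absent.

    Hypothetical helper for illustration; it tolerates missing levels
    in the nested structure instead of raising KeyError.
    """
    platforms = data.get("attributes", {}).get("facets", {}).get("platform", [])
    return next((p["count"] for p in platforms if p.get("key") == key), None)


data = json.loads(SAMPLE)
ps4_count = platform_count(data, "ps4")
print(ps4_count)  # 96
```

Inside the spider's parse method, the same helper could presumably be applied to the result of json.loads(response.text), and the value then stored on an item or yielded as a dict.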