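[Posted]: 2021-11-01 15:24:33
[Question]:
I am scraping some information from a website, and some of the fields are missing, so they come back as null. Is there a way to output a default value for each such field? A sample script is below.
script.py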
import scrapy


class UfcscraperSpider(scrapy.Spider):
    name = 'ufcscraper'
    start_urls = ['http://ufcstats.com/statistics/fighters?char=a']

    def parse(self, response):
        # Skip the first two table rows, which hold no fighter data.
        for user_info in response.css(".b-statistics__table-row")[2:]:
            result = {
                "fname": user_info.css("td:nth-child(1) a::text").get(),
                "lname": user_info.css("td:nth-child(2) a::text").get(),
                "nname": user_info.css("td:nth-child(3) a::text").get(),
                "height": user_info.css("td:nth-child(4)::text").get().strip(),
                "weight": user_info.css("td:nth-child(5)::text").get().strip(),
                "reach": user_info.css("td:nth-child(6)::text").get().strip(),
                "stance": user_info.css("td:nth-child(7)::text").get().strip(),
                "win": user_info.css("td:nth-child(8)::text").get().strip(),
                "lose": user_info.css("td:nth-child(9)::text").get().strip(),
                "draw": user_info.css("td:nth-child(10)::text").get().strip()
            }
            yield result
For example, in the first row the nname field is null and stance is "", an empty string. How can I set a default value for cases like these?
Sample result
[
{"fname": "Tom", "lname": "Aaron", "nname": null, "height": "--", "weight": "155 lbs.", "reach": "--", "stance": "", "win": "5", "lose": "3", "draw": "0"},
{"fname": "Danny", "lname": "Abbadi", "nname": "The Assassin", "height": "5' 11\"", "weight": "155 lbs.", "reach": "--", "stance": "Orthodox", "win": "4", "lose": "6", "draw": "0"},
]
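One possible direction, as a rough sketch rather than a confirmed fix: Scrapy's `.get()` accepts a `default` argument (for example `.get(default="")`), which covers the null case but not the empty-string case. A small helper could handle both. The helper name `clean` and the placeholder `"N/A"` below are my own assumptions, not part of Scrapy, and the `row` dict just mimics the first row of the sample result above.

```python
def clean(value, default="N/A"):
    """Return the stripped text, or `default` when it is None or blank.

    Hypothetical helper: the name and the "N/A" placeholder are
    illustrative choices, not a Scrapy API.
    """
    if value is None:
        return default
    value = value.strip()
    return value if value else default


# Values shaped like the first row of the sample result:
# a missing nickname (None) and an empty stance ("").
row = {"fname": "Tom", "nname": None, "stance": "", "weight": " 155 lbs. "}
cleaned = {key: clean(value) for key, value in row.items()}
print(cleaned)
```

In the spider, each field would then be wrapped, e.g. `"nname": clean(user_info.css("td:nth-child(3) a::text").get())`, which also replaces the separate `.strip()` calls.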
[Comments]:
Tags: python web-scraping scrapy