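【Question title】:How can I set default values for scraped results when they return null/None?
【Posted】:2021-11-01 15:24:33
【Question】:

I am scraping some information from a website. Some of the fields do not exist on the page, so they come back as null. Is there a way to output a default value for each such field? An example script is below.

script.py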

import scrapy

class UfcscraperSpider(scrapy.Spider):
    name = 'ufcscraper'

    start_urls = ['http://ufcstats.com/statistics/fighters?char=a']

    def parse(self, response):
        # Skip the first two matched rows.
        for user_info in response.css(".b-statistics__table-row")[2:]:
            result = {
                "fname": user_info.css("td:nth-child(1) a::text").get(),
                "lname": user_info.css("td:nth-child(2) a::text").get(),
                "nname": user_info.css("td:nth-child(3) a::text").get(),
                "height": user_info.css("td:nth-child(4)::text").get().strip(),
                "weight": user_info.css("td:nth-child(5)::text").get().strip(),
                "reach": user_info.css("td:nth-child(6)::text").get().strip(),
                "stance": user_info.css("td:nth-child(7)::text").get().strip(),
                "win": user_info.css("td:nth-child(8)::text").get().strip(),
                "lose": user_info.css("td:nth-child(9)::text").get().strip(),
                "draw": user_info.css("td:nth-child(10)::text").get().strip()
            }

            # Yield inside the loop so every row is emitted, not only the last one.
            yield result

For example, in the first row the nname field is null, while stance is "", an empty string. How can I set default values for cases like these?

Sample results

[
{"fname": "Tom", "lname": "Aaron", "nname": null, "height": "--", "weight": "155 lbs.", "reach": "--", "stance": "", "win": "5", "lose": "3", "draw": "0"},
{"fname": "Danny", "lname": "Abbadi", "nname": "The Assassin", "height": "5' 11\"", "weight": "155 lbs.", "reach": "--", "stance": "Orthodox", "win": "4", "lose": "6", "draw": "0"}
]

【Comments】:

    Tags: python web-scraping scrapy


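    【Answer 1】:

    You can add logic inside the function to replace any empty value (such as ""), or you can loop over the results afterwards and replace each empty value you encounter with whatever default you want.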

    data = [
    {"fname": "Tom", "lname": "Aaron", "nname": "", "height": "--", "weight": "155 lbs.", "reach": "--", "stance": "", "win": "5", "lose": "3", "draw": "0"},
    {"fname": "Danny", "lname": "Abbadi", "nname": "The Assassin", "height": "5' 11\"", "weight": "155 lbs.", "reach": "--", "stance": "Orthodox", "win": "4", "lose": "6", "draw": "0"},
    ]
    
    
    for idx, each in enumerate(data):
        for k, v in each.items():
            # Treat both None (null) and the empty string as missing.
            if v is None or v == '':
                data[idx][k] = 'DEFAULT'
    

    Output:

    print(data)
    [
    {'fname': 'Tom', 'lname': 'Aaron', 'nname': 'DEFAULT', 'height': '--', 'weight': '155 lbs.', 'reach': '--', 'stance': 'DEFAULT', 'win': '5', 'lose': '3', 'draw': '0'}, 
    {'fname': 'Danny', 'lname': 'Abbadi', 'nname': 'The Assassin', 'height': '5\' 11"', 'weight': '155 lbs.', 'reach': '--', 'stance': 'Orthodox', 'win': '4', 'lose': '6', 'draw': '0'}
    ]
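
As a rough sketch of the first option (putting the logic in the function), a small helper could normalize each value before it goes into the result dict. The names `with_default`, `FIELD_DEFAULTS`, and `apply_defaults`, and the default strings themselves, are made up for illustration and are not part of Scrapy.

```python
def with_default(value, default="N/A"):
    """Return the stripped value, or `default` when it is None or blank."""
    if value is None:
        return default
    value = value.strip()
    return value if value else default


# Hypothetical per-field defaults; any field not listed uses the fallback.
FIELD_DEFAULTS = {"nname": "No nickname", "stance": "Unknown"}


def apply_defaults(row, defaults=FIELD_DEFAULTS, fallback="N/A"):
    """Build a new dict in which every missing value is replaced by its default."""
    return {
        key: with_default(value, defaults.get(key, fallback))
        for key, value in row.items()
    }


row = {"fname": "Tom", "nname": None, "height": "--", "stance": ""}
print(apply_defaults(row))
# {'fname': 'Tom', 'nname': 'No nickname', 'height': '--', 'stance': 'Unknown'}
```

In the spider, the same helper could wrap each selector call, for example `with_default(user_info.css("td:nth-child(3) a::text").get(), "No nickname")`. Scrapy's `.get(default=...)` is another option, but it only covers the None case, not an empty string.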
    

    【Comments】:
