【Question title】: How to return loaded items from a loop in Scrapy
【Posted】: 2017-01-12 12:49:05
【Question】:

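With the code below, only the first iteration of the loop is returned each time; the remaining 9 iterations are lost. What should I do to get the results of every iteration?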

I tried adding "m = []" and m.append(l), but got the error "ERROR: Spider must return Request, BaseItem, dict or None, got 'ItemLoader'"

The link is http://ajax.lianjia.com/ajax/housesell/area/district?ids=23008619&limit_offset=0&limit_count=100&sort=&&city_id=110000

def parse(self, response):
    jsonresponse = json.loads(response.body_as_unicode())
    for i in range(0,len(jsonresponse['data']['list'])):
        l = ItemLoader(item = ItjuziItem(),response=response)
        house_code = jsonresponse['data']['list'][i]['house_code']
        price_total = jsonresponse['data']['list'][i]['price_total']
        ctime = jsonresponse['data']['list'][i]['ctime']
        title = jsonresponse['data']['list'][i]['title']
        frame_hall_num = jsonresponse['data']['list'][i]['frame_hall_num']
        tags = jsonresponse['data']['list'][i]['tags']
        house_area = jsonresponse['data']['list'][i]['house_area']
        community_id = jsonresponse['data']['list'][i]['community_id']
        community_name = jsonresponse['data']['list'][i]['community_name']
        is_two_five = jsonresponse['data']['list'][i]['is_two_five']
        frame_bedroom_num = jsonresponse['data']['list'][i]['frame_bedroom_num']
        l.add_value('house_code',house_code)
        l.add_value('price_total',price_total)
        l.add_value('ctime',ctime)
        l.add_value('title',title)
        l.add_value('frame_hall_num',frame_hall_num)
        l.add_value('tags',tags)
        l.add_value('house_area',house_area)
        l.add_value('community_id',community_id)
        l.add_value('community_name',community_name)
        l.add_value('is_two_five',is_two_five)
        l.add_value('frame_bedroom_num',frame_bedroom_num)
        print l
        return l.load_item()

    Tags: python json ajax scrapy web-crawler


    【Solution 1】:

    The error:

    ERROR: Spider must return Request, BaseItem, dict or None, got 'ItemLoader'

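    is a bit misleading, because you can also return a generator! What happens here is that return breaks out of the loop and ends the whole function on the first iteration. You can turn the function into a generator to avoid this.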
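The difference can be seen without Scrapy at all. The following is a minimal plain-Python sketch; the function names `first_only` and `every_row` are made up for illustration and are not part of the original code.

```python
def first_only(rows):
    for row in rows:
        # return ends the function immediately, so the loop never
        # reaches the second element.
        return row


def every_row(rows):
    for row in rows:
        # yield hands one value to the caller and then resumes the loop,
        # which makes this function a generator.
        yield row


rows = ["a", "b", "c"]
print(first_only(rows))       # a
print(list(every_row(rows)))  # ['a', 'b', 'c']
```

Scrapy iterates over whatever the callback returns, so a generator callback delivers one item per loop iteration.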

    Just replace return with yield in the last line, changing

    return l.load_item()
    

    to:

    yield l.load_item()
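Applied to the whole callback, the loop might look roughly like the sketch below. To keep it runnable without Scrapy installed, the JSON handling lives in a standalone generator; `iter_listings` and `FIELDS` are hypothetical names, and the use of `.get` (which yields `None` for a missing key instead of raising `KeyError`) is an assumption rather than something the original code does. The commented-out `parse` shows one way it could be wired into the spider with the question's `ItemLoader` and `ItjuziItem`.

```python
import json

# Field names copied from the question's parse() method.
FIELDS = (
    "house_code", "price_total", "ctime", "title", "frame_hall_num",
    "tags", "house_area", "community_id", "community_name",
    "is_two_five", "frame_bedroom_num",
)


def iter_listings(body):
    """Yield one dict per entry in data.list of the JSON body."""
    payload = json.loads(body)
    for entry in payload["data"]["list"]:
        # Assumption: missing keys become None rather than raising KeyError.
        yield {field: entry.get(field) for field in FIELDS}


# Possible wiring inside the spider (not executed here):
#
# def parse(self, response):
#     for listing in iter_listings(response.text):
#         l = ItemLoader(item=ItjuziItem(), response=response)
#         for field, value in listing.items():
#             l.add_value(field, value)
#         yield l.load_item()  # one item per iteration

# Made-up sample data shaped like the response described in the question.
sample = json.dumps({"data": {"list": [
    {"house_code": "1", "title": "A"},
    {"house_code": "2", "title": "B"},
]}})
listings = list(iter_listings(sample))
print(len(listings))  # 2
```

As for the list attempt in the question: the error names 'ItemLoader' because the list held the loader objects themselves. Appending `l.load_item()` instead of `l` and returning the list after the loop should also be accepted, since Scrapy takes any iterable of items from a callback.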
    

    【Comments】:

    • Awesome! That works!