【Posted】: 2018-03-08 11:03:05
【Question】:
# -*- coding: utf-8 -*-
import scrapy
from scrapy.http import Request


class InfoSpider(scrapy.Spider):
    name = 'info'
    allowed_domains = ['womenonlyconnected.com']
    start_urls = ['http://www.womenonlyconnected.com/socialengine/pageitems/index']

    def parse(self, response):
        urls = response.xpath('//h3/a/@href').extract()
        for url in urls:
            absolute_url = response.urljoin(url)
            yield Request(absolute_url, callback=self.parse_page)

    def parse_page(self, response):
        pass
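This is the code I am using. With it I can only scrape the first 24 links. I need help scraping all of the links that appear after clicking "View More" on the page. The page URL is: http://www.womenonlyconnected.com/socialengine/pageitems/index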
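
For reference, "View More" buttons of this kind often load the next batch of links through a separate request that carries a page number. The sketch below assumes that is the case here and that the parameter is called `page`; both are guesses that would need to be confirmed in the browser's network tab, and `next_page_url` is a hypothetical helper name, not part of Scrapy. It uses only the standard library to build the URL of the following page.

```python
from urllib.parse import urlparse, parse_qs, urlencode, urlunparse


def next_page_url(url, param="page"):
    """Return `url` with its `param` query value incremented by one.

    A URL without the parameter is treated as page 1, so the result
    points at page 2. The parameter name "page" is an assumption and
    may differ on the real site.
    """
    parts = urlparse(url)
    query = parse_qs(parts.query)
    # Read the current page number, defaulting to 1 when it is absent.
    current = int(query.get(param, ["1"])[0])
    query[param] = [str(current + 1)]
    # Rebuild the URL, leaving every other query parameter untouched.
    return urlunparse(parts._replace(query=urlencode(query, doseq=True)))


if __name__ == "__main__":
    start = "http://www.womenonlyconnected.com/socialengine/pageitems/index"
    print(next_page_url(start))
    print(next_page_url(next_page_url(start)))
```

Inside `parse`, after the loop over `urls`, the spider could then yield `Request(next_page_url(response.url), callback=self.parse)` whenever `urls` is non-empty, so that crawling stops at the first page that returns no links.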
【Discussion】:
Tags: python web web-scraping scrapy scrapy-spider