【Posted】: 2016-02-17 19:56:24
【Question】:
I'm new to Scrapy, and I get an error when running a spider that crawls Behance. Here is the spider:
import scrapy
from scrapy.selector import Selector
from behance.items import BehanceItem
from selenium import webdriver
from scrapy.http import TextResponse
from scrapy.crawler import CrawlerProcess

class DmozSpider(scrapy.Spider):
    name = "behance"
    #allowed_domains = ["behance.com"]
    start_urls = [
        "https://www.behance.net/gallery/29535305/Mind-Your-Monsters",
    ]

    def __init__(self):
        self.driver = webdriver.Firefox()

    def parse(self, response):
        self.driver.get(response.url)
        response = TextResponse(url=response.url, body=self.driver.page_source, encoding='utf-8')
        item = BehanceItem()
        hxs = Selector(response)
        item['link'] = response.xpath("//div[@class='js-project-module-image-hd project-module module image project-module-image']/@data-hd-src").extract()
        yield item

process = CrawlerProcess({
    'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
})
process.crawl(DmozSpider)
process.start()
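When I run the spider, I get the following error on the command line:
Traceback (most recent call last):
  File "/home/davy/behance/behance/spiders/behance_spider.py", line 3, in <module>
    from behance.items import BehanceItem
ImportError: No module named behance.items
My directory structure: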
behance/
├── behance
│   ├── __init__.py
│   ├── items.py
│   ├── pipelines.py
│   ├── settings.py
│   └── spiders
│       ├── __init__.py
│       └── behance_spider.py
└── scrapy.cfg
【Comments】:
-
What are the contents of your items.py file?
-
@narko
import scrapy

class BehanceItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    link = scrapy.Field()
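-
One possible cause, assuming the file is started directly (for example with `python behance_spider.py`) rather than through `scrapy crawl behance` from the directory that holds scrapy.cfg: Python then puts the script's own directory, spiders/, first on `sys.path`, so the top-level `behance` package cannot be found. The sketch below is only an illustration under that assumption. The helper names `project_root` and `ensure_on_sys_path` are hypothetical, and the path is the one shown in the traceback.

```python
import os
import sys


def project_root(script_path, levels_up=2):
    """Return the directory `levels_up` levels above the script's own directory."""
    root = os.path.dirname(os.path.abspath(script_path))
    for _ in range(levels_up):
        root = os.path.dirname(root)
    return root


def ensure_on_sys_path(directory):
    """Prepend `directory` to sys.path unless it is already listed."""
    if directory not in sys.path:
        sys.path.insert(0, directory)
    return directory


# Path copied from the traceback; inside the spider file itself one would pass __file__.
SCRIPT = "/home/davy/behance/behance/spiders/behance_spider.py"

# spiders/ -> inner behance/ -> outer behance/ (the directory containing scrapy.cfg)
ROOT = project_root(SCRIPT)
ensure_on_sys_path(ROOT)
```

With the outer behance/ directory on `sys.path`, `from behance.items import BehanceItem` can resolve the package. Running `scrapy crawl behance` from that directory is usually the simpler route, since Scrapy locates scrapy.cfg and sets up the project path itself.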
Tags: python web-crawler scrapy