Scrapy pipeline to MySQL - Can't find answer

【Question title】: Scrapy pipeline to MySQL - Can't find answer
【Posted】: 2013-01-25 17:21:19
【Question】:

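I have looked everywhere for an answer and could not find one. As I mentioned yesterday, I am new to Scrapy and Python, so the answer may well be out there and I just did not catch it.

I wrote my spider and it works fine. Here is my pipeline: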

import sys
import MySQLdb
import hashlib
from scrapy.exceptions import DropItem
from scrapy.http import Request

class somepipeline(object):
    def __init__(self):
        self.conn = MySQLdb.connect(user='user', 'passwd', 'dbname', 'host', charset="utf8", use_unicode=True)
        self.cursor = self.conn.cursor()

    def process_item(self, item, spider):    
        try:
            self.cursor.execute("""INSERT INTO sometable (title, link, desc)  
                            VALUES (%s, %s)""", 
                           (item['title'].encode('utf-8'), 
                            item['link'].encode('utf-8'),
                            item['desc'].encode('utf-8'))

            self.conn.commit()
        except MySQLdb.Error, e:
            print "Error %d: %s" % (e.args[0], e.args[1])
        return item

Here are my settings:

BOT_NAME = 'somebot'

SPIDER_MODULES = ['somespider.spiders']
NEWSPIDER_MODULE = 'somespider.spiders'
ITEM_PIPELINES = ['myproject.pipeline.somepipeline']

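However, when I run it I get a "No module named pipeline" error.

I found a similar answer, but it was for the image class, and I only want the HTML data.

What am I doing wrong? Do I need to download another module? Any help is appreciated. If I am close, please give me a nudge.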

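【Comments】:

Where is this file located? Is it at /path/to/somewhere/myproject/pipeline? And are myproject/ and pipeline packages, i.e. do they contain __init__.py files (if pipeline is a directory)?
The path is projectdirectory/project (with scrapy.cfg)/, and pipeline.py is in there along with all the expected files and the .pyc files. Following another post, I deleted the .pyc files and ran it again. Same problem.
If the script you are running is in projectdirectory/project, then the correct name for ITEM_PIPELINES should be pipeline.somepipeline, and the pipeline directory should have an __init__.py file. It looks like you need to give the Python package path; read up on it.
My __init__.py file is empty. Should there be something in it? Is that the file I need to look into? Or is it the Python package path?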

Tags: python scrapy

【Solution 1】:

The Scrapy tutorial has a typo: it must be 'pipelineS'

ITEM_PIPELINES = ['myproject.pipelines.somepipeline']

【Discussion】:

【Solution 2】:

There is no "pipeline" file. It should be "pipelines". So you need to change

ITEM_PIPELINES = ['myproject.pipeline.somepipeline']

to

ITEM_PIPELINES = ['myproject.pipelines.somepipeline']
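Each ITEM_PIPELINES entry is a dotted path: everything before the last dot is imported as a module, and the last component is looked up in it as a class. The sketch below is a simplified stand-in for that lookup, not Scrapy's own loader; the function name load_object and the standard-library paths used to exercise it are illustrative assumptions. It shows why a misspelled module component produces a "No module named ..." error before the class name is ever looked at.

```python
import importlib


def load_object(dotted_path):
    """Import the module part of dotted_path and return the named attribute."""
    module_path, _, name = dotted_path.rpartition(".")
    module = importlib.import_module(module_path)  # fails here if the module is misspelled
    return getattr(module, name)


# A path that exists in the standard library resolves to the class itself.
resolved = load_object("collections.OrderedDict")

# A path with a wrong module component fails at import time,
# in the same way 'myproject.pipeline.somepipeline' does when the file is pipelines.py.
try:
    load_object("collections.pipeline_demo_missing.SomePipeline")
except ImportError as exc:
    error_message = str(exc)

print(resolved.__name__)
print(error_message)
```

Note that the list form of ITEM_PIPELINES shown here matches the Scrapy version used in the question; more recent Scrapy releases document the setting as a dict that maps each dotted path to an integer order.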

【Discussion】:

OK. That was a mistake I introduced while trying to do some debugging. I put it back to pipelines, and now I get an error at self.conn.commit(). I am currently looking into this syntax error.
Now I am getting self.conn=MySQLdb.connect(user= 'test', 'test', 'test', 'localhost') SyntaxError: non-keyword arg after keyword arg. This is my code: class mypipeline(object): def __init__(self): self.conn=MySQLdb.connect(user= 'test', 'test', 'test', 'localhost') self.cursor = self.conn.cursor() I am completely lost. Any help is appreciated!
The syntax of your MySQLdb.connect call is probably incorrect. Try this instead: import MySQLdb import MySQLdb.cursors self.conn = MySQLdb.connect( host=HOST, user=USERNAME, passwd=PASSWORD, db=DB, cursorclass=MySQLdb.cursors.DictCursor, charset='utf8', use_unicode=True )
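Putting the suggestions from this thread together, a corrected pipeline might look roughly like the sketch below. It is an illustration under several assumptions rather than the asker's final code: it targets Python 3 with the MySQLdb module installed, the class name MySQLStorePipeline and the connect argument are hypothetical (the latter exists only so the class can be exercised without a database), and the connection values are the placeholders from the question. Compared with the question's code, it passes only keyword arguments to MySQLdb.connect, uses three %s placeholders for three columns, closes the execute(...) call, and backquotes the desc column because DESC is a reserved word in MySQL.

```python
import logging

logger = logging.getLogger(__name__)


class MySQLStorePipeline:
    """Store scraped items in MySQL (hypothetical name, not from the question)."""

    # Three placeholders for three columns; `desc` is backquoted because
    # DESC is a reserved word in MySQL.
    INSERT_SQL = (
        "INSERT INTO sometable (title, link, `desc`) "
        "VALUES (%s, %s, %s)"
    )

    def __init__(self, connect=None):
        # `connect` is an assumption added for testability: any zero-argument
        # callable that returns a DB-API connection.
        self._connect = connect or self._default_connect
        self.conn = None
        self.cursor = None

    @staticmethod
    def _default_connect():
        import MySQLdb  # imported lazily so the class loads without the driver

        # Keyword arguments only; the values are placeholders.
        return MySQLdb.connect(
            host="host", user="user", passwd="passwd", db="dbname",
            charset="utf8", use_unicode=True,
        )

    def open_spider(self, spider):
        # Called by Scrapy when the spider starts.
        self.conn = self._connect()
        self.cursor = self.conn.cursor()

    def close_spider(self, spider):
        # Called by Scrapy when the spider finishes.
        self.conn.close()

    def process_item(self, item, spider):
        try:
            self.cursor.execute(
                self.INSERT_SQL, (item["title"], item["link"], item["desc"])
            )
            self.conn.commit()
        except Exception:
            # Log and roll back, then let the item continue down the pipeline.
            logger.exception("Failed to store item")
            self.conn.rollback()
        return item
```

Catching Exception keeps the sketch independent of the driver; with MySQLdb imported at module level, the narrower MySQLdb.Error used in the question would be the more precise choice.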

【Solution 3】:

The correct directory layout should look like this:

myproject/
     scrapy.cfg  
     myproject/
         __init__.py
         items.py
         pipeline.py
         settings.py
         spiders/
            spider.py

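On a different note, can you confirm that your spider works correctly? For example, if you comment out the ITEM_PIPELINES setting, does your spider run and produce the expected output?

【Discussion】:

Hey @talvalin. Thanks again for your help. The file structure is correct. I can run the spider just fine. If I take out the pipeline, I can see that the spider runs well and that all the data is structured the way I need it. However, if I run scrapy crawl myproject --> abc.txt, I end up with a blank file, whereas if I run it with the command from the Scrapy tutorial I end up with a cached web page. Trying to find the middle ground!
OK, quick update. One mistake I found was that I had not run pip install mysql-python or easy_install mysql-python, because I did not have libmysqlclient-dev installed. SOOOO I did all of that, including the appropriate -U commands, and now I am back to square one with a SyntaxError: invalid syntax. More to come.
If you are getting an invalid syntax error in your pipeline code, that means the import is working. Is that the case?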
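The distinction drawn in the last comment can be reproduced with the standard library alone. In the sketch below, which assumes Python 3, the module names no_such_pipeline_demo and broken_pipeline_demo are made up for the demonstration, and the broken module contains only the connect() call quoted in the earlier comments. A module that cannot be found fails with an import error, while a module that is found but cannot be parsed fails with a SyntaxError, so seeing a SyntaxError implies the path lookup already succeeded.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def import_outcome(module_name):
    """Return the exception class name raised by importing module_name, or 'ok'."""
    try:
        importlib.import_module(module_name)
    except ImportError as exc:  # ModuleNotFoundError is a subclass of ImportError
        return type(exc).__name__
    except SyntaxError as exc:
        return type(exc).__name__
    return "ok"


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical module holding the connect() call quoted in the comments:
    # positional arguments after a keyword argument do not parse.
    Path(tmp, "broken_pipeline_demo.py").write_text(
        "conn = connect(user='test', 'test', 'test', 'localhost')\n"
    )
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        missing = import_outcome("no_such_pipeline_demo")  # file does not exist
        broken = import_outcome("broken_pipeline_demo")    # file exists, cannot be parsed
    finally:
        sys.path.remove(tmp)

print(missing, broken)
```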