【Question title】: MySQL database error using scrapy
【Posted】: 2017-01-29 06:10:57
【Question description】:

I am trying to save scraped data in a MySQL database. My script.py is:

# -*- coding: utf-8 -*-
import scrapy
import unidecode
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor
from lxml import html


class ElementSpider(scrapy.Spider):
    name = 'books'
    download_delay = 3
    allowed_domains = ["goodreads.com"]
    start_urls = ["https://www.goodreads.com/list/show/19793.I_Marked_My_Calendar_For_This_Book_s_Release",]

    rules = (Rule(LinkExtractor(allow=(), restrict_xpaths=('//a[@class="next_page"]',)), callback="parse", follow= True),)

    def parse(self, response):
        for href in response.xpath('//div[@id="all_votes"]/table[@class="tableList js-dataTooltip"]/tr/td[2]/div[@class="js-tooltipTrigger tooltipTrigger"]/a/@href'):       
            full_url = response.urljoin(href.extract())
            print full_url
            yield scrapy.Request(full_url, callback = self.parse_books)
            break;


        next_page = response.xpath('.//a[@class="next_page"]/@href').extract()
        if next_page:
            next_href = next_page[0]
            next_page_url = 'https://www.goodreads.com' + next_href
            print next_page_url
            request = scrapy.Request(next_page_url, self.parse)
            yield request

    def parse_books(self, response):
        yield{
            'url': response.url,
            'title':response.xpath('//div[@id="metacol"]/h1[@class="bookTitle"]/text()').extract(),
            'link':response.xpath('//div[@id="metacol"]/h1[@class="bookTitle"]/a/@href').extract(),
        } 

And my pipeline.py is:

# -*- coding: utf-8 -*-

# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: http://doc.scrapy.org/en/latest/topics/item-pipeline.html


import MySQLdb
import hashlib
from scrapy.exceptions import DropItem

from scrapy.http import Request
import sys

class SQLStore(object):
    def __init__(self):
        self.conn = MySQLdb.connect("localhost","root","","books" )
        self.cursor = self.conn.cursor()
        print "connected to DB"

    def process_item(self, item, spider):
        print "hi"

        try:
            self.cursor.execute("""INSERT INTO books_data(next_page_url) VALUES (%s)""", (item['url']))
            self.conn.commit()

        except Exception, e:
            print e

When I run the script there are no errors. The spider runs fine, but I think execution never reaches process_item: it doesn't even print "hi".

【Question discussion】:

    Tags: python mysql scrapy


    【Solution 1】:

    Your method signature is wrong; it should take the item and spider parameters:

    process_item(self, item, spider)
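For reference, a minimal sketch of a pipeline with the signature Scrapy expects, written for Python 3. It swaps the standard-library sqlite3 module (in-memory database) in for MySQLdb so it runs without a MySQL server; the class name SQLStoreSketch, the table layout, and the URL are illustrative assumptions. Scrapy calls the optional open_spider(spider) and close_spider(spider) hooks once each, and calls process_item(item, spider) for every yielded item; process_item should return the item (or raise DropItem) so it continues down the pipeline chain. With MySQLdb the placeholder would stay %s instead of ?.

```python
import sqlite3


class SQLStoreSketch:
    """Hypothetical stand-in for SQLStore; sqlite3 replaces MySQLdb here."""

    def open_spider(self, spider):
        # Called once when the spider opens: set up the connection here.
        self.conn = sqlite3.connect(":memory:")
        self.cursor = self.conn.cursor()
        self.cursor.execute("CREATE TABLE books_data (next_page_url TEXT)")

    def close_spider(self, spider):
        # Called once when the spider closes.
        self.conn.close()

    def process_item(self, item, spider):
        # Called for every item the spider yields, with (item, spider).
        self.cursor.execute(
            "INSERT INTO books_data (next_page_url) VALUES (?)",
            (item["url"],),  # one-element tuple: note the trailing comma
        )
        self.conn.commit()
        return item  # hand the item on to the next pipeline, if any


if __name__ == "__main__":
    # Drive the pipeline by hand in the order Scrapy would call it.
    pipeline = SQLStoreSketch()
    pipeline.open_spider(spider=None)
    pipeline.process_item({"url": "https://www.goodreads.com/book/show/1"}, spider=None)  # placeholder URL
    print(pipeline.cursor.execute("SELECT next_page_url FROM books_data").fetchall())
    # [('https://www.goodreads.com/book/show/1',)]
    pipeline.close_spider(spider=None)
```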
    

    You also need to enable the pipeline in your settings.py file:

     ITEM_PIPELINES = {"project_name.pipelines.SQLStore": 300}
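Two details behind that setting, sketched in Python 3. ITEM_PIPELINES has to be a dict mapping each pipeline's import path to an integer; Scrapy runs the pipelines from the lowest value to the highest, and the values are customarily in the 0-1000 range. Writing only the path inside braces creates a set, not a dict. The helper pipeline_order is hypothetical and for illustration only (it is not part of Scrapy); "project_name" and the "a.*" paths are placeholders.

```python
# settings.py fragment: keys are import paths, values are integers (customarily 0-1000).
# "project_name" is a placeholder for your project's package name.
ITEM_PIPELINES = {
    "project_name.pipelines.SQLStore": 300,
}


def pipeline_order(item_pipelines):
    """Hypothetical helper: return pipeline paths lowest value first,
    the order in which Scrapy runs them. Not part of Scrapy's API."""
    if not isinstance(item_pipelines, dict):
        # {"a.b.C"} with no value is a set literal, not a dict.
        raise TypeError("ITEM_PIPELINES must map class paths to integers")
    return [path for path, order in sorted(item_pipelines.items(), key=lambda kv: kv[1])]


print(type({"project_name.path.SQLStore"}).__name__)  # set
print(pipeline_order({"a.CleanPipeline": 100, "a.SQLStore": 300}))  # ['a.CleanPipeline', 'a.SQLStore']
```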
    

    Your syntax is also incorrect; you need to pass a tuple:

      self.cursor.execute("""INSERT INTO books_data(next_page_url) VALUES (%s)""",
                          (item['url'],))  # <- note the trailing comma
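A short Python 3 demonstration of why the trailing comma matters. It uses the standard-library sqlite3 module with an in-memory table instead of MySQLdb (an assumption so that it runs anywhere); sqlite3 uses ? placeholders where MySQLdb uses %s, and the URL is a placeholder.

```python
import sqlite3

url = "https://www.goodreads.com/book/show/1"  # placeholder value

# Parentheses alone do not create a tuple; the trailing comma does.
print(type((url)).__name__)   # str
print(type((url,)).__name__)  # tuple

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books_data (next_page_url TEXT)")

# Passing the bare string: sqlite3 treats it as a sequence of characters,
# so the number of bindings does not match the single placeholder.
try:
    conn.execute("INSERT INTO books_data (next_page_url) VALUES (?)", (url))
except sqlite3.ProgrammingError as exc:
    print("error:", exc)

# Passing a one-element tuple binds the whole URL to the placeholder.
conn.execute("INSERT INTO books_data (next_page_url) VALUES (?)", (url,))
conn.commit()
print(conn.execute("SELECT next_page_url FROM books_data").fetchall())
# [('https://www.goodreads.com/book/show/1',)]
```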
    

    【Comments】:

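    • I already tried that, but it didn't work. I added the pipeline in settings.py like this: ITEM_PIPELINES = { 'test1.pipelines.SQLStore': 300, }
    • What is in the __init__.py file in the pipelines directory? And do you also have process_item(self, item, spider)?
    • Then how does scrapy find your SQLStore pipeline?
    • I mean, is your file actually called pipeline or pipelines? Your question says pipeline, but above you wrote pipelines. Also, where do you yield any items?
    • Look at my pipeline.py: when I print something in def __init__(self):, it prints, but when I print in def process_item(self):, nothing is printed. So def process_item(self): is never called.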
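The pipeline-versus-pipelines question matters because each ITEM_PIPELINES key is an ordinary Python import path: 'test1.pipelines.SQLStore' points at the class SQLStore in test1/pipelines.py, so a file saved as pipeline.py would need 'test1.pipeline.SQLStore' instead. Below is a minimal Python 3 sketch for checking such a path by hand with importlib, assuming it is run from the directory that contains the test1 package; resolve_pipeline is a hypothetical helper, not Scrapy's own loader.

```python
import importlib


def resolve_pipeline(dotted_path):
    """Hypothetical check: import "package.module.ClassName" by hand.
    Not Scrapy's own loader; for illustration only."""
    module_path, _, class_name = dotted_path.rpartition(".")
    # Raises ImportError (ModuleNotFoundError) if the module part is wrong,
    # e.g. "pipelines" when the file is actually named pipeline.py.
    module = importlib.import_module(module_path)
    # Raises AttributeError if the class name is wrong.
    return getattr(module, class_name)


# Stdlib example so the check runs anywhere:
print(resolve_pipeline("collections.OrderedDict"))  # <class 'collections.OrderedDict'>
```

In the asker's project, calling resolve_pipeline('test1.pipelines.SQLStore') would either return the SQLStore class or raise an error that points at the wrong part of the path.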