【Question title】: Trying to scrape an email address [duplicate]
【Posted】: 2019-07-30 08:53:30
【Question】:

I am trying to scrape this website:

www.united-church.ca/search/locator/all?keyw=&mission_units_ucc_ministry_type_advanced=10&locll=

I managed to scrape most of it, but I cannot get the email addresses. Can you help me scrape them? I am using Scrapy.

# -*- coding: utf-8 -*-
import scrapy
from ..items import ChurchItem


class ChurchSpiderSpider(scrapy.Spider):
    name = 'church_spider'
    page_number = 1
    start_urls = ['https://www.united-church.ca/search/locator/all?keyw=&mission_units_ucc_ministry_type_advanced=10&locll=']

    def parse(self, response):
        # Each ".icon-ministry" block is treated as one church listing.
        container = response.css(".icon-ministry")
        for t in container:
            # Create a fresh item per listing so yielded items do not share state.
            items = ChurchItem()

            church_name = t.css(".field-name-locator-ministry-title a::text").extract()
            church_phone = t.css(".field-name-field-phone::text").extract()
            church_address = t.css(".thoroughfare::text").extract()
            # The email comes back as several text fragments, with " [at] " in place of "@".
            church_email = t.css(".field-name-field-mu-email span::text").extract()

            items["church_name"] = church_name
            items["church_phone"] = church_phone
            items["church_address"] = church_address
            items["church_email"] = church_email

            yield items

        # next_page = 'https://www.united-church.ca/search/locator/all?keyw=&mission_units_ucc_ministry_type_advanced=10&locll=&page=' + str(ChurchSpiderSpider.page_number)
        # if ChurchSpiderSpider.page_number <= 110:
        #     ChurchSpiderSpider.page_number += 1
        #     yield response.follow(next_page, callback=self.parse)

I found a partial solution, but it is not finished yet. The output now looks like this:

{'church_address': ['7763 Highway 21'],
 'church_email': ['herbklaehn', ' [at] ', 'gmail.com'],
 'church_name': ['Allenford United Church'],
 'church_phone': ['519-35-6232']}

Can you help me replace [at] with @ and then combine the pieces into a single string?

【Question comments】:

  • Do not post duplicate questions. Edit your original question to include all the details.

Tags: python-3.x web-scraping scrapy


【Solution 1】:

Join the list elements into one string, then replace the " [at] " placeholder with "@":

email = ''.join(church_email).replace(" [at] ","@")
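If the placeholder is not always written exactly as " [at] " (different spacing or capitalisation, say), the same idea can be wrapped in a small helper. This is only a sketch: the names `join_email_parts` and `_AT_PATTERN` are made up for illustration and are not part of Scrapy, and it assumes the fragments arrive in the same order as in the output shown in the question.

```python
import re

# Hypothetical helper pattern: matches the "[at]" placeholder together with
# any whitespace around it, ignoring case.
_AT_PATTERN = re.compile(r"\s*\[at\]\s*", re.IGNORECASE)


def join_email_parts(parts):
    """Join scraped text fragments and turn the "[at]" placeholder into "@".

    Returns None when there are no fragments, so listings without an
    email address are easy to detect.
    """
    if not parts:
        return None
    joined = "".join(parts)
    return _AT_PATTERN.sub("@", joined).strip()


# Fragments as they appear in the question's output.
church_email = ['herbklaehn', ' [at] ', 'gmail.com']
print(join_email_parts(church_email))  # herbklaehn@gmail.com
```

Inside the spider, the result of `join_email_parts(church_email)` could be stored in `items["church_email"]` in place of the raw list.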

【Comments】:
