【Posted】: 2020-04-06 15:27:51
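【Problem description】:
I am trying to scrape a website for its data, but the email addresses do not come through. JavaScript running in the browser appears to be what prevents them from being fetched.
Can anyone tell me how to get the email addresses?
Website: https://directory.easternuc.com/publicDirectory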
import scrapy

from tutorial.items import TutorialItem


class DemoSpider(scrapy.Spider):
    name = "DemoSpider"

    def start_requests(self):
        # Request pages 1 and 2 of the public directory.
        for page in range(1, 3):
            url = "https://directory.easternuc.com/publicDirectory?page=%s" % page
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        # Walk the table row by row so every field comes from the same <tr>.
        for row in response.xpath("//tr[td/h4]"):
            # Create a fresh item per row; reusing a single item object
            # would let later rows overwrite earlier ones.
            item = TutorialItem()
            item["name"] = row.xpath("./td/h4/text()").get()
            item["phone"] = row.xpath("./td[2]/text()").get()
            item["mobile"] = row.xpath("./td[3]/text()").get()
            # The email column is the field that does not come through.
            item["email"] = row.xpath("./td[4]/text()").get()
            yield item
【Discussion】:
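One possible cause, offered as an assumption and not confirmed for this site, is Cloudflare-style email obfuscation: the raw HTML carries a hex string in a `data-cfemail` attribute and a script decodes it in the browser, so `td[4]/text()` has nothing useful to return. If that is what the page source shows, the value can be decoded without running JavaScript. The sketch below uses a hypothetical helper name, `decode_cfemail`, and a made-up sample string; neither comes from the site.

```python
def decode_cfemail(encoded: str) -> str:
    """Decode a Cloudflare-style obfuscated email (hypothetical helper).

    Assumes the scheme where the first hex byte is an XOR key and every
    following hex byte is one character of the address XORed with it.
    """
    key = int(encoded[:2], 16)
    chars = [
        chr(int(encoded[i:i + 2], 16) ^ key)
        for i in range(2, len(encoded), 2)
    ]
    return "".join(chars)


if __name__ == "__main__":
    # Made-up sample value, not taken from the site in the question.
    print(decode_cfemail("2a4b6a48044945"))
```

Inside `parse`, the attribute could then be read with something like `row.xpath("./td[4]//@data-cfemail").get()` and passed to the helper, assuming the attribute is present in the response. If the raw HTML contains no such attribute, the addresses are probably loaded some other way, and checking the browser's network tab for the request that returns them would be the next step.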
Tags: python web-scraping scrapy