Python + Selenium: Getting All Email Addresses on a Page

I. Breaking Down the Approach

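Open the web page (this example uses Baidu's "Contact Us" page, http://home.baidu.com/contact.html).
Get the full content of the page (driver.page_source).
Import the re module and use a regular expression to find the email addresses.
Loop over the addresses and print them, with duplicates removed.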
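The last two steps can be tried without a browser. The sketch below is only an illustration: the sample text, the addresses, and the name `extract_emails` are made up for this example. It also assumes Python 3 and passes `re.ASCII` so that `\w` does not absorb Chinese characters that sit directly in front of an address.

```python
import re

# Word characters, "@", then word characters, dots, or hyphens.
# re.ASCII (an assumption for Python 3) limits \w to [a-zA-Z0-9_].
EMAIL_PATTERN = re.compile(r'[\w]+@[\w\.-]+', re.ASCII)


def extract_emails(text):
    """Return the unique email-like strings found in text, sorted."""
    return sorted(set(EMAIL_PATTERN.findall(text)))


# Hypothetical page content, not taken from the real Baidu page.
sample_text = (
    '<p>商务合作请联系bd@example.com</p>'
    '<p>HR: <a href="mailto:hr@example.com">hr@example.com</a></p>'
)

for email in extract_emails(sample_text):
    print(email)
```

The address inside the `mailto:` link and the visible link text are the same string, so the set keeps only one copy of it.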

II. Test Script

1. The source code is as follows:

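# coding: utf-8
import re  # regular expressions, used to match the email addresses

from selenium import webdriver

driver = webdriver.Firefox()
driver.maximize_window()
driver.implicitly_wait(8)
driver.get("http://home.baidu.com/contact.html")
doc = driver.page_source  # full HTML source of the page
# Email pattern: word characters, "@", then word characters, dots, or hyphens.
# re.ASCII keeps \w from matching Chinese characters next to an address (Python 3).
emails = re.findall(r'[\w]+@[\w\.-]+', doc, re.ASCII)
for email in set(emails):  # a set drops duplicate addresses
    print(email)
driver.quit()  # close the browser when done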
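The script above can also be wrapped in functions so that the matching logic is reusable and the browser is closed even if loading the page fails. This is a rough sketch rather than the original author's code: `collect_emails` and `run_with_firefox` are hypothetical names, and Python 3 is assumed.

```python
import re

EMAIL_PATTERN = re.compile(r'[\w]+@[\w\.-]+', re.ASCII)


def collect_emails(driver, url):
    """Load url with the given driver and return the unique emails, sorted.

    driver can be any object with a get(url) method and a page_source
    attribute, such as a Selenium WebDriver instance.
    """
    driver.get(url)
    return sorted(set(EMAIL_PATTERN.findall(driver.page_source)))


def run_with_firefox(url):
    """Open Firefox, collect the emails, and always close the browser."""
    # Imported here so collect_emails can be used without Selenium installed.
    from selenium import webdriver

    driver = webdriver.Firefox()
    try:
        driver.implicitly_wait(8)
        return collect_emails(driver, url)
    finally:
        driver.quit()  # runs even if get() raises an exception
```

Calling `run_with_firefox("http://home.baidu.com/contact.html")` would then return the list of addresses instead of printing them.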

2. The test result is shown in Figure 1 below.
