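Logging module:
Why implement a logging module
It makes the program easier to test
It makes it easy to record the program's running state
It makes it easy to record error information
Implementing the logger
Code: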
# utils/log.py
import sys
import logging

from settings import LOG_FMT, LOG_LEVEL, LOG_FILENAME, LOG_DATEFMT


class Logger(object):

    def __init__(self):
        # 1. Get a logger object (no name given, so this is the root logger)
        self._logger = logging.getLogger()
        # 2. Create the formatter shared by all handlers
        self.formatter = logging.Formatter(fmt=LOG_FMT, datefmt=LOG_DATEFMT)
        # 3. Set up the log outputs
        # 3.1 Write log records to a file
        self._logger.addHandler(self._get_file_handler(LOG_FILENAME))
        # 3.2 Write log records to the terminal
        self._logger.addHandler(self._get_console_handler())
        # 4. Set the log level
        self._logger.setLevel(LOG_LEVEL)

    def _get_file_handler(self, filename):
        '''Return a handler that writes log records to a file.'''
        # 1. Create the file handler
        filehandler = logging.FileHandler(filename=filename, encoding="utf-8")
        # 2. Set the log format
        filehandler.setFormatter(self.formatter)
        # 3. Return the handler
        return filehandler

    def _get_console_handler(self):
        '''Return a handler that writes log records to the terminal.'''
        # 1. Create the console handler
        console_handler = logging.StreamHandler(sys.stdout)
        # 2. Set the log format
        console_handler.setFormatter(self.formatter)
        # 3. Return the handler
        return console_handler

    @property
    def logger(self):
        return self._logger


# Create and configure a single logger object at import time, which gives
# singleton-like behavior. To use it, just import logger from this module.
logger = Logger().logger

if __name__ == '__main__':
    logger.debug("debug message")
    logger.info("status message")
    logger.warning("warning message")
    logger.error("error message")
    logger.critical("critical error message")
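One thing to watch for in the class above: logging.getLogger() called without a name returns the root logger, so constructing Logger a second time would attach a second pair of handlers and every message would be written twice. The following is a minimal sketch of one way to guard against that. The helper build_logger and the logger name "demo_guard" are hypothetical names chosen for illustration and do not appear in utils/log.py.

```python
import logging
import sys


def build_logger(name="demo_guard"):
    """Return a named logger, attaching a console handler only once.

    Hypothetical helper for illustration; not part of utils/log.py.
    """
    logger = logging.getLogger(name)
    # logging.getLogger returns the same object for the same name,
    # so handlers added by an earlier call are still attached here.
    if not logger.handlers:
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger


first = build_logger()
second = build_logger()
first.info("written once, even though build_logger ran twice")
```

Checking logger.handlers before adding is only one option; keeping the single module-level logger instance, as the original code does, avoids the problem as long as Logger() is never called anywhere else.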
Configuration file settings.py
# Logging configuration
import logging

# Default settings
LOG_LEVEL = logging.INFO  # default log level
LOG_FMT = '%(asctime)s %(filename)s [line:%(lineno)d] %(levelname)s: %(message)s'
LOG_DATEFMT = '%Y-%m-%d %H:%M:%S'  # default timestamp format
LOG_FILENAME = 'log.log'  # default log file name
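To see what these settings do in practice, here is a small sketch that applies the same LOG_LEVEL, LOG_FMT and LOG_DATEFMT values to an in-memory stream instead of a file or the terminal. The logger name "settings_demo" and the StringIO buffer are assumptions made for this sketch only.

```python
import io
import logging

# Values copied from settings.py.
LOG_LEVEL = logging.INFO
LOG_FMT = '%(asctime)s %(filename)s [line:%(lineno)d] %(levelname)s: %(message)s'
LOG_DATEFMT = '%Y-%m-%d %H:%M:%S'

# Capture the output in memory so it can be inspected.
buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setFormatter(logging.Formatter(fmt=LOG_FMT, datefmt=LOG_DATEFMT))

# "settings_demo" is an arbitrary name chosen for this sketch.
demo = logging.getLogger("settings_demo")
demo.propagate = False  # keep these records away from the root logger
demo.addHandler(handler)
demo.setLevel(LOG_LEVEL)

demo.debug("debug message")  # below LOG_LEVEL, so it is dropped
demo.info("status message")  # at LOG_LEVEL, so it is written

captured = buffer.getvalue()
print(captured, end="")
```

With LOG_LEVEL set to logging.INFO, only the second call produces a line; it starts with a timestamp in the LOG_DATEFMT layout, followed by the source file name, the line number, and "INFO: status message".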
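Test result:
http module:
Goal: get request headers with a random User-Agent
Steps:
1. Prepare a list of User-Agent strings
2. Write a function that returns request headers with a random User-Agent
Code: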
"""
5.2 http module

When scraping proxy IPs from proxy websites and when validating them,
it is best to send a random User-Agent request header so that the server
is less likely to identify us as a crawler.

Goal: get request headers with a random User-Agent
Steps:
    1. Prepare a list of User-Agent strings
    2. Implement a function that returns request headers with a random User-Agent
"""
import random

# 1. Prepare the list of User-Agent strings
USER_AGENTS = [
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; AcooBrowser; .NET CLR 1.1.4322; .NET CLR 2.0.50727)",
    "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; Acoo Browser; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.0.04506)",
    "Mozilla/4.0 (compatible; MSIE 7.0; AOL 9.5; AOLBuild 4337.35; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)",
    "Mozilla/5.0 (Windows; U; MSIE 9.0; Windows NT 9.0; en-US)",
    "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; Trident/5.0; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET CLR 2.0.50727; Media Center PC 6.0)",
    "Mozilla/5.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET CLR 1.0.3705; .NET CLR 1.1.4322)",
    "Mozilla/4.0 (compatible; MSIE 7.0b; Windows NT 5.2; .NET CLR 1.1.4322; .NET CLR 2.0.50727; InfoPath.2; .NET CLR 3.0.04506.30)",
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN) AppleWebKit/523.15 (KHTML, like Gecko, Safari/419.3) Arora/0.3 (Change: 287 c9dfb30)",
    "Mozilla/5.0 (X11; U; Linux; en-US) AppleWebKit/527+ (KHTML, like Gecko, Safari/419.3) Arora/0.6",
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.2pre) Gecko/20070215 K-Ninja/2.1.1",
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.9) Gecko/20080705 Firefox/3.0 Kapiko/3.0",
    "Mozilla/5.0 (X11; Linux i686; U;) Gecko/20070322 Kazehakase/0.4.5",
    "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.8) Gecko Fedora/1.9.0.8-1.fc10 Kazehakase/0.5.6",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_3) AppleWebKit/535.20 (KHTML, like Gecko) Chrome/19.0.1036.7 Safari/535.20",
    "Opera/9.80 (Macintosh; Intel Mac OS X 10.6.8; U; fr) Presto/2.9.168 Version/11.52",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.11 (KHTML, like Gecko) Chrome/20.0.1132.11 TaoBrowser/2.0 Safari/536.11",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.71 Safari/537.1 LBBROWSER",
    "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; LBBROWSER)",
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; QQDownload 732; .NET4.0C; .NET4.0E; LBBROWSER)",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.84 Safari/535.11 LBBROWSER",
    "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E)",
    "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; QQBrowser/7.0.3698.400)",
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; QQDownload 732; .NET4.0C; .NET4.0E)",
    "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0; SV1; QQDownload 732; .NET4.0C; .NET4.0E; 360SE)",
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; QQDownload 732; .NET4.0C; .NET4.0E)",
    "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E)",
    "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1",
    "Mozilla/5.0 (iPad; U; CPU OS 4_2_1 like Mac OS X; zh-cn) AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8C148 Safari/6533.18.5",
    "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:2.0b13pre) Gecko/20110307 Firefox/4.0b13pre",
    "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:16.0) Gecko/20100101 Firefox/16.0",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11",
    "Mozilla/5.0 (X11; U; Linux x86_64; zh-CN; rv:1.9.2.10) Gecko/20100922 Ubuntu/10.10 (maverick) Firefox/3.6.10"
]


# 2. Implement a function that returns request headers with a random User-Agent
def get_request_headers():
    headers = {
        'User-Agent': random.choice(USER_AGENTS),  # picked at random with random.choice
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',  # accepted content types
        'Accept-Language': 'en-US,en;q=0.5',  # accepted languages
        'Connection': 'keep-alive',  # connection handling
        'Accept-Encoding': 'gzip, deflate',  # accepted compression methods
    }
    return headers


if __name__ == '__main__':
    print(get_request_headers())
    print(get_request_headers())
    print(get_request_headers())
Test result:
Each call returns headers with a randomly chosen User-Agent.
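As a rough illustration of how these headers might be attached to an outgoing request, the sketch below builds a urllib.request.Request from the standard library. The URL is a placeholder, the User-Agent list is shortened to two entries taken from the list above, and the choice of urllib is an assumption made for this sketch; the original text does not say which HTTP client is used.

```python
import random
import urllib.request

# Shortened list for this sketch; both entries come from USER_AGENTS above.
USER_AGENTS = [
    "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:16.0) Gecko/20100101 Firefox/16.0",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11",
]


def get_request_headers():
    """Same idea as the function above, reduced to two headers."""
    return {
        'User-Agent': random.choice(USER_AGENTS),
        'Accept-Language': 'en-US,en;q=0.5',
    }


# Placeholder URL (an assumption). Nothing is sent over the network here;
# that would only happen on urllib.request.urlopen(request).
request = urllib.request.Request("http://example.com/", headers=get_request_headers())

# urllib stores header names in capitalized form, hence "User-agent".
print(request.get_header("User-agent"))
```

Calling get_request_headers() once per request, rather than once per program run, is what makes the User-Agent vary between requests.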