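If a spider runs continuously on a server and all of its logs are written to a single file, the logs become hard to manage. One fix is to write a separate log file for each run and periodically delete the old ones.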

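```python
import datetime
import os

# Make sure the logs/ directory exists before Scrapy opens LOG_FILE
os.makedirs('logs', exist_ok=True)

# Class attribute inside the spider class (rest of the spider omitted)
custom_settings = {
    'DEFAULT_REQUEST_HEADERS': {
        'User-Agent':
            'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.186 Safari/537.36',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
        'Host': 'v.qq.com',
        'Proxy-Connection': 'keep-alive',
    },
    # One log file per run, e.g. "logs/PlayInfoDemoSpider_2021-06-06 10:15:30.123456.log".
    # The timestamp contains ':' characters, which Windows does not allow in file names.
    'LOG_FILE': 'logs/PlayInfoDemoSpider_' + str(datetime.datetime.now()) + '.log',
    'REDIRECT_ENABLED': False,
    'DOWNLOAD_DELAY': 0,
    'DOWNLOAD_TIMEOUT': 3,
    'RETRY_TIMES': 30,
    'CONCURRENT_REQUESTS': 30,
    'CONCURRENT_REQUESTS_PER_DOMAIN': 200,
    'CONCURRENT_REQUESTS_PER_IP': 0,
    # 'DOWNLOADER_MIDDLEWARES': {'bo_lib.scrapy_tools.BOProxyMiddlewareVPS': 740},
}
```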

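The LOG_FILE entry in custom_settings makes each run write to its own log file, named after the spider plus the time the spider module was loaded (in practice, when the run starts).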
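To see what file names this produces, and how the date can be read back out of them (which the cleanup step relies on), here is a small sketch. The helpers log_file_name and log_date are hypothetical: they only mirror the LOG_FILE expression and are not part of Scrapy.

```python
import datetime


def log_file_name(spider_name, now):
    # Hypothetical helper: same expression as the LOG_FILE setting
    return 'logs/' + spider_name + '_' + str(now) + '.log'


def log_date(file_name):
    # Date = text after the last '_' up to the first space
    date_str = file_name.rsplit('_', 1)[1].split(' ')[0]
    return datetime.datetime.strptime(date_str, '%Y-%m-%d').date()


now = datetime.datetime(2021, 6, 6, 10, 15, 30, 123456)
file_name = log_file_name('PlayInfoDemoSpider', now)
print(file_name)            # logs/PlayInfoDemoSpider_2021-06-06 10:15:30.123456.log
print(log_date(file_name))  # 2021-06-06
```

If the spider runs on Windows, where ':' is not allowed in file names, a format such as now.strftime('%Y-%m-%d %H-%M-%S') keeps the same "YYYY-MM-DD " prefix, so the date can still be parsed the same way.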

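The following code deletes old logs: every file in logs/ whose name contains the given spider name and whose date is more than the given number of days ago.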

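```python
import datetime
import os


def delete_old_logs(name, days, log_dir='logs'):
    """Delete logs in log_dir whose file name contains `name` and whose
    date (parsed from the file name) is more than `days` days before today."""
    target_day = datetime.date.today() - datetime.timedelta(days=days)
    if not os.path.isdir(log_dir):
        return
    for file_name in os.listdir(log_dir):
        file_path = os.path.join(log_dir, file_name)
        if name not in file_name or not os.path.isfile(file_path):
            continue
        try:
            # File names look like "<name>_YYYY-MM-DD HH:MM:SS.ffffff.log";
            # take the text after the last '_' up to the first space
            log_create_day_str = file_name.rsplit('_', 1)[1].split(' ')[0]
            log_create_day = datetime.datetime.strptime(log_create_day_str, '%Y-%m-%d').date()
        except (IndexError, ValueError):
            continue  # not a log file in the expected format
        if log_create_day < target_day:
            os.remove(file_path)
```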
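A minimal sketch for trying the cleanup without touching real logs, assuming a variant that takes the log directory and "today" as parameters (log_dir, today and the returned list of removed files are hypothetical additions, not part of the code above). It creates three empty log files in a temporary directory and keeps the last 3 days. The demo uses '-' instead of ':' in the time part so it also runs on Windows; the date prefix is parsed the same way.

```python
import datetime
import os
import tempfile


def delete_old_logs(name, days, log_dir='logs', today=None):
    # Hypothetical testable variant: log_dir and today are extra parameters
    if today is None:
        today = datetime.date.today()
    target_day = today - datetime.timedelta(days=days)
    removed = []
    for file_name in sorted(os.listdir(log_dir)):
        path = os.path.join(log_dir, file_name)
        if name not in file_name or not os.path.isfile(path):
            continue
        try:
            # Date = text after the last '_' up to the first space
            date_str = file_name.rsplit('_', 1)[1].split(' ')[0]
            created = datetime.datetime.strptime(date_str, '%Y-%m-%d').date()
        except (IndexError, ValueError):
            continue  # not in the "<name>_<date> <time>.log" format
        if created < target_day:
            os.remove(path)
            removed.append(file_name)
    return removed


with tempfile.TemporaryDirectory() as log_dir:
    # Three fake, empty log files for different days
    for day in ('2021-06-01', '2021-06-05', '2021-06-06'):
        path = os.path.join(log_dir, 'PlayInfoDemoSpider_' + day + ' 08-00-00.log')
        open(path, 'w').close()
    removed = delete_old_logs('PlayInfoDemoSpider', 3, log_dir,
                              today=datetime.date(2021, 6, 6))
    remaining = sorted(os.listdir(log_dir))

print(removed)    # ['PlayInfoDemoSpider_2021-06-01 08-00-00.log']
print(remaining)  # ['PlayInfoDemoSpider_2021-06-05 08-00-00.log', 'PlayInfoDemoSpider_2021-06-06 08-00-00.log']
```

Passing today explicitly makes the result reproducible; in normal use it can be left as None so the current date is used.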
