Simulating a search-engine crawler to bypass some firewalls
```python
# Simulate a search-engine crawler, or a real user, while requesting paths from a wordlist
import requests
import time

headers = {
    'Connection': 'keep-alive',
    'Cache-Control': 'max-age=0',
    'Upgrade-Insecure-Requests': '1',
    # Real-user UA: Kit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36
    # Crawler UA: Mozilla/5.0 (compatible; Baiduspider-render/2.0; +http://www.baidu.com/search/spider.html)
    # More crawler UAs: https://www.cnblogs.com/iack/p/3557371.html
    'User-Agent': 'Mozilla/5.0 (compatible; Baiduspider-render/2.0; +http://www.baidu.com/search/spider.html)',
    'Sec-Fetch-Dest': 'document',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'Sec-Fetch-Site': 'none',
    'Sec-Fetch-Mode': 'navigate',
    'Sec-Fetch-User': '?1',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'zh-CN,zh;q=0.9,en-US;q=0.8,en;q=0.7',
    'Cookie': 'xxx',  # use the cookie from the current session
}

base_url = 'http://192.168.0.103:8081/'

# To test through a proxy or a proxy pool, pass proxies=proxy to requests.get()
proxy = {
    'http': '127.0.0.1:7777'
}

with open('php_b.txt', encoding='utf-8') as wordlist:
    for line in wordlist:
        path = line.strip()
        if not path:
            continue  # skip blank lines in the wordlist
        url = base_url + path
        try:
            code = requests.get(url, headers=headers, verify=False, timeout=10).status_code
            # Report only paths that exist (200) or exist but are forbidden (403)
            if code == 200 or code == 403:
                print(url + '|' + str(code))
        except Exception as err:
            print('connecting error')
        # time.sleep(3)  # a delay is required when simulating a user; optional for a crawler UA, depending on request rate
```
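Test results so far:
SafeDog (安全狗): spoofing a crawler User-Agent works against SafeDog.
Alibaba Cloud (阿里云): spoofing a crawler User-Agent does not work; use a delay or a proxy pool instead. A 3-second delay works, while 2 seconds does not.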
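BT Panel (宝塔): blacklists common scanning tools (awvs, nmap, and so on) as well as crawlers; a delay or a proxy pool gets past it. A delay of about 2 seconds works.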
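6 malicious requests within 60 seconds get the IP banned for 600 seconds. Bypass: limit the wordlist to 5 requests per 60 seconds, or add a decoy such as .bak. to the path; the principle is much the same as in file-upload bypasses.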
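The "5 requests per 60 seconds" figure follows from simple arithmetic on the ban rule. A minimal sketch of that calculation, assuming the rule counts requests in a window that excludes its end point; the function name `min_interval` and its parameters are hypothetical and not part of the original script:

```python
def min_interval(window_seconds: float, ban_threshold: int) -> float:
    """Return the even spacing (in seconds) that keeps the number of
    requests in any half-open window below ban_threshold."""
    allowed = ban_threshold - 1  # most requests tolerated per window
    if allowed < 1:
        raise ValueError("ban_threshold must be at least 2")
    return window_seconds / allowed


# Rule from the notes above: 6 requests in 60 seconds triggers the ban,
# so at most 5 requests fit in a window, one every 12 seconds.
print(min_interval(60, 6))  # 12.0
```

In practice it is probably safer to sleep slightly longer than the computed value, since how the firewall measures its window is not documented here.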