trevain

 Simulating search-engine crawlers to bypass some firewalls (WAFs)

 

# Simulate a search-engine crawler, or a real user, while scanning paths
import time

import requests

headers = {
    'Connection': 'keep-alive',
    'Cache-Control': 'max-age=0',
    'Upgrade-Insecure-Requests': '1',
    # Real-user UA: Kit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36
    # Crawler UA: Mozilla/5.0 (compatible; Baiduspider-render/2.0; +http://www.baidu.com/search/spider.html)
    # More crawler User-Agents: https://www.cnblogs.com/iack/p/3557371.html
    'User-Agent': 'Mozilla/5.0 (compatible; Baiduspider-render/2.0; +http://www.baidu.com/search/spider.html)',
    'Sec-Fetch-Dest': 'document',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'Sec-Fetch-Site': 'none',
    'Sec-Fetch-Mode': 'navigate',
    'Sec-Fetch-User': '?1',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'zh-CN,zh;q=0.9,en-US;q=0.8,en;q=0.7',
    'Cookie': 'xxx',  # use the cookie from your current session
}

url = 'http://192.168.0.103:8081/'

# To test through a proxy (or a proxy pool), set this,
# e.g. {'http': 'http://127.0.0.1:7777'}; None sends requests directly.
proxy = None

with open('php_b.txt', encoding='utf-8') as wordlist:
    for path in wordlist:
        path = path.strip()
        if not path:
            continue  # skip blank lines in the wordlist
        target = url + path
        try:
            code = requests.get(target, headers=headers, proxies=proxy,
                                verify=False, timeout=10).status_code
            # Report only paths that answer 200 or 403
            if code == 200 or code == 403:
                print(target + '|' + str(code))
        except requests.RequestException:
            print('connecting error')
        # time.sleep(3)  # needed when simulating a user; optional with a crawler UA (depends on request speed)
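
The script mentions a proxy pool but only shows a single proxy. A minimal sketch of rotating through a pool follows; `PROXY_POOL` and `proxy_cycle` are hypothetical names introduced here, and the addresses are placeholders, not part of the original script.

```python
import itertools

# Hypothetical pool (assumption): replace with proxies you actually control.
PROXY_POOL = [
    'http://127.0.0.1:7777',
    'http://127.0.0.1:7778',
]


def proxy_cycle(pool):
    """Yield requests-style proxies dicts, rotating through pool forever."""
    for address in itertools.cycle(pool):
        yield {'http': address, 'https': address}


rotation = proxy_cycle(PROXY_POOL)
# Each request then takes the next entry, for example:
#   requests.get(target, headers=headers, proxies=next(rotation), timeout=10)
```

Round-robin is the simplest choice; a real pool would probably also need to drop proxies that stop responding.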

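Test results so far:

Safedog (安全狗): the crawler User-Agent works against Safedog.

Alibaba Cloud (阿里云): the crawler User-Agent has no effect; use a delay or a proxy pool. A 3-second delay works, but even 2 seconds gets blocked.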

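BT Panel (宝塔): blacklists common scanners (AWVS, nmap, and so on); effect of the crawler User-Agent unknown; a delay or a proxy pool gets through, with the delay set to about 2 seconds.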

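Rule: 6 malicious requests within 60 seconds gets the IP banned for 600 seconds. Bypass: keep the wordlist scan to 5 requests per 60 seconds, or add interference such as .bak. to the requests; the principle is much the same as for file-upload bypasses.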
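
The "5 requests per 60 seconds" limit can be enforced in the scanner instead of guessing a fixed delay. The sketch below is one possible way to do it, assuming the firewall counts requests in a sliding window (how it actually counts is not known here); `SlidingWindowThrottle` is a hypothetical helper, not part of the original script.

```python
import time
from collections import deque


class SlidingWindowThrottle:
    """Allow at most max_requests calls in any window_seconds span."""

    def __init__(self, max_requests, window_seconds,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock
        self.sleep = sleep
        self.sent = deque()  # timestamps of recent requests

    def wait(self):
        """Block until one more request fits in the window, then record it."""
        now = self.clock()
        # Forget requests that have already left the window.
        while self.sent and now - self.sent[0] >= self.window_seconds:
            self.sent.popleft()
        if len(self.sent) >= self.max_requests:
            # Sleep until the oldest recorded request leaves the window.
            self.sleep(self.window_seconds - (now - self.sent[0]))
            now = self.clock()
            self.sent.popleft()
        self.sent.append(now)


# Usage inside the scan loop (5 requests per 60 seconds):
#   throttle = SlidingWindowThrottle(5, 60)
#   throttle.wait()
#   requests.get(target, headers=headers, timeout=10)
```

With these settings the first five requests go out immediately and the sixth waits until the first is 60 seconds old. If the firewall counts window boundaries inclusively, a slightly longer window (for example 61 seconds) leaves some margin.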

 
