brianzhu

We can block specific crawlers from crawling our site by matching on the client's User-Agent request header:

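The virtual host configuration is as follows. (The original post marked added or modified lines in red; that highlighting is lost here. The addition relevant to this topic is the if ($http_user_agent ...) block.)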

[root@Nginx www_date]# cat brian.conf 
    server {
        listen       80;
        server_name  www.brian.com;

        # Return 403 to any client whose User-Agent matches one of these
        # crawler names (~* makes the regex match case-insensitive).
        if ($http_user_agent ~* "qihoobot|Baiduspider|Googlebot|Googlebot-Mobile|Googlebot-Image|Mediapartners-Google|Adsbot-Google|Yahoo! Slurp China|YoudaoBot|Sosospider|Sogou spider|Sogou web spider|MSNBot") {
            return 403;
        }

        location / {
            root   html/brian;
            index  index.html index.htm;
            #limit_conn addr 1;
            limit_conn perserver 2;
            auth_basic    "brian training";
            auth_basic_user_file  /opt/nginx/conf/htpasswd;
        }

        # Do not log requests for static assets.
        location ~ .*\.(js|jpg|JPG|jpeg|JPEG|css|bmp|gif|GIF)$ {
            access_log off;
        }

        access_log logs/brian.log main gzip buffer=128k flush=5s;
        error_page   500 502 503 504  /50x.html;
        location = /50x.html {
            root   html;
        }
    }
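
The same User-Agent check can also be written with a map block, which keeps the crawler list out of the server block and lets several virtual hosts share it. The following is only a sketch under assumptions, not the configuration from the original post: the variable name $block_crawler is hypothetical, and Googlebot-Mobile and Googlebot-Image are left out of the pattern because the Googlebot alternative already matches them.

```nginx
# Place this in the http { } context; map is not allowed inside server { }.
# $block_crawler is a hypothetical variable name chosen for this sketch.
map $http_user_agent $block_crawler {
    default 0;
    # A leading "~*" marks a case-insensitive regular expression.
    "~*qihoobot|Baiduspider|Googlebot|Mediapartners-Google|Adsbot-Google|Yahoo! Slurp China|YoudaoBot|Sosospider|Sogou spider|Sogou web spider|MSNBot" 1;
}

server {
    listen       80;
    server_name  www.brian.com;

    # In an nginx "if", the empty string and "0" are false; "1" is true.
    if ($block_crawler) {
        return 403;
    }

    # location blocks, access_log and error_page stay as in the
    # virtual host configuration shown earlier.
}
```

With either variant, sending a request with a matching User-Agent (for example curl -I -A "Baiduspider" http://www.brian.com/) is a quick way to check that the server answers 403.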
