How can I use robots.txt to stop crawlers from indexing certain pages of my website? [duplicate]

【Question title】: How can I exclude crawlers from indexing certain pages of my website using robots.txt? [duplicate]
【Posted】: 2017-08-25 09:55:02
【Question】:

I tried this in my root robots.txt:

User-agent:  *
Allow: /
Disallow: /*&action=surprise

Sitemap: https://example.com/sitemap.php

I want to exclude URLs like this one from crawling:

https://example.com/track&id=13&action=surprise&autoplay

In my access.log file, I still see some bots requesting these URLs.

Am I doing something wrong, or are some bots simply not following my robots.txt settings?
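
To see which clients are still requesting the excluded URLs, the access log can be summarized by IP address and user agent. The following is a minimal sketch that assumes the server writes the common "combined" log format; the sample lines, the `ExampleBot` name, and the `agents_hitting` helper are made up for illustration and are not taken from the real log.

```python
import re
from collections import Counter

# Assumption: "combined" log format (IP, identity, user, [time], "request",
# status, size, "referer", "user agent"). Adjust if your server differs.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def agents_hitting(lines, needle="&action=surprise"):
    """Count requests per (ip, user agent) whose path contains `needle`."""
    hits = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m and needle in m.group("path"):
            hits[(m.group("ip"), m.group("agent"))] += 1
    return hits


# Hypothetical sample lines, for illustration only.
SAMPLE = [
    '203.0.113.7 - - [25/Aug/2017:09:40:01 +0000] "GET /track&id=13&action=surprise&autoplay HTTP/1.1" 200 512 "-" "ExampleBot/1.0"',
    '203.0.113.7 - - [25/Aug/2017:09:40:05 +0000] "GET /track&id=14&action=surprise HTTP/1.1" 200 498 "-" "ExampleBot/1.0"',
    '198.51.100.2 - - [25/Aug/2017:09:41:00 +0000] "GET /index.php HTTP/1.1" 200 1024 "-" "Mozilla/5.0"',
]

if __name__ == "__main__":
    for (ip, agent), count in agents_hitting(SAMPLE).most_common():
        print(count, ip, agent)
```

In practice you would pass an open file object for access.log instead of `SAMPLE`. A user agent that keeps appearing in this summary is a candidate for a bot that ignores robots.txt.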

【Comments】:

I forgot the main tool! Google has a robots.txt tester in the Webmaster console. My robots.txt looks correct, but bad bots like Ahrefs ignore it.

Tags: robots.txt
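
The same kind of check can be approximated locally. This is a rough sketch, not Google's tester: it assumes the wildcard syntax (`*` for any run of characters, a trailing `$` for end of URL) and the "longest matching rule wins, Allow wins ties" precedence described in Google's documentation and RFC 9309. The helper names `rule_to_regex` and `is_allowed` are made up for this example. Note that Python's built-in `urllib.robotparser` does plain prefix matching, so it is not used here.

```python
import re
from urllib.parse import urlsplit


def rule_to_regex(rule: str) -> re.Pattern:
    """Turn a robots.txt path rule into a regex anchored at the path start."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # '*' matches any run of characters; everything else is literal.
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(url: str, rules: list[tuple[str, str]]) -> bool:
    """Apply (kind, pattern) rules; the longest matching pattern decides."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    best_kind, best_len = "allow", -1  # no matching rule means allowed
    for kind, pattern in rules:
        if rule_to_regex(pattern).match(path):
            longer = len(pattern) > best_len
            tie_for_allow = len(pattern) == best_len and kind == "allow"
            if longer or tie_for_allow:
                best_kind, best_len = kind, len(pattern)
    return best_kind == "allow"


# The rules from the question's "User-agent: *" group.
RULES = [("allow", "/"), ("disallow", "/*&action=surprise")]

if __name__ == "__main__":
    url = "https://example.com/track&id=13&action=surprise&autoplay"
    print(is_allowed(url, RULES))  # False: the Disallow rule is the longer match
```

Under these assumptions the URL from the question is disallowed, which agrees with the tester's verdict that the file itself is fine.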

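【Solution 1】:

I have to say that not all bots play by the rules and follow your robots.txt. You need to add some anti-crawler techniques to block access, such as:

Check the user agent
Count the requests coming from the bot's IP address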
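
The two checks above could be combined roughly as follows. This is only a sketch under stated assumptions: the blocklist entry, the limit of 30 requests per 60 seconds, and the `should_block` function are illustrative choices, not values from the answer, and the counters live in process memory, so a real deployment would more likely use the web server's or firewall's own rate limiting.

```python
import time
from collections import defaultdict, deque

# Assumption: illustrative values, tune them for your own traffic.
BLOCKED_AGENT_SUBSTRINGS = ("ahrefsbot",)  # matched case-insensitively
MAX_REQUESTS = 30        # allowed requests per IP within the window
WINDOW_SECONDS = 60.0    # sliding window length

# Timestamps of recent requests, kept per IP address.
_recent = defaultdict(deque)


def should_block(ip, user_agent, now=None):
    """Return True if the request should be refused."""
    now = time.monotonic() if now is None else now

    # 1. Check the user agent against the blocklist.
    agent = (user_agent or "").lower()
    if any(s in agent for s in BLOCKED_AGENT_SUBSTRINGS):
        return True

    # 2. Count requests from this IP inside the sliding window.
    hits = _recent[ip]
    while hits and now - hits[0] > WINDOW_SECONDS:
        hits.popleft()
    hits.append(now)
    return len(hits) > MAX_REQUESTS


if __name__ == "__main__":
    print(should_block("203.0.113.7", "Mozilla/5.0 (compatible; AhrefsBot/5.2)"))
    print(should_block("198.51.100.2", "Mozilla/5.0"))
```

A user agent string is trivial to fake, so the per-IP count is the check that catches bots pretending to be a browser; the user agent check mainly handles bots that identify themselves honestly but ignore robots.txt.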

【Comments】: