Robots.txt file issue handling: answers

【Question title】: Robots.txt file issue handling
【Posted】: 2023-01-09 18:19:16
【Question description】:

What effect does robots.txt have when it contains these directives?

Disallow: /
User-agent: Robozilla
Disallow: /
User-agent: *
Disallow:
Disallow: /cgi-bin/
Sitemap: https://koyal.pk/sitemap/sitemap.xml

How will the Googlebot crawler access the site as a result?

【Question discussion】:

【Solution 1】:

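If you want to know how Google will react to a robots.txt file, you should get the official answer by testing it in Google's robots.txt testing tool. Here is the result of running that test with the robots.txt you provided: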

Googlebot will be able to crawl the site, but Google reports problems with the robots.txt syntax you are using. I see several problems:

A syntactically correct robots.txt that I think does what you want is:

User-agent: Robozilla
Disallow: /

User-agent: *
Disallow: /cgi-bin/
Sitemap: https://koyal.pk/sitemap/sitemap.xml

This will prevent the Robozilla bot from crawling, while allowing all other bots (including Googlebot) to crawl everything except the /cgi-bin/ directory.
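If you want a quick local sanity check of the corrected file, Python's standard-library `urllib.robotparser` can evaluate it. This is only a rough sketch: the `build_parser` helper and the sample URLs are made up for illustration, and Python's parser is not Google's, so the two can disagree on edge cases (for example, how conflicting rules are resolved). Google's own tester remains the authority on how Googlebot behaves.

```python
from urllib.robotparser import RobotFileParser

# The corrected robots.txt from the answer above.
ROBOTS_TXT = """\
User-agent: Robozilla
Disallow: /

User-agent: *
Disallow: /cgi-bin/
Sitemap: https://koyal.pk/sitemap/sitemap.xml
"""


def build_parser(text):
    """Hypothetical helper: parse robots.txt text without fetching anything."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Sample checks; the paths below are illustrative, not real pages.
checks = [
    ("Robozilla", "https://koyal.pk/"),
    ("Googlebot", "https://koyal.pk/"),
    ("Googlebot", "https://koyal.pk/cgi-bin/test"),
]
for agent, url in checks:
    verdict = "allowed" if parser.can_fetch(agent, url) else "blocked"
    print(f"{agent} -> {url}: {verdict}")

# site_maps() requires Python 3.8 or newer.
print(parser.site_maps())
```

With this file, the parser should report Robozilla as blocked everywhere and Googlebot as blocked only under /cgi-bin/, which matches the description above.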

【Discussion】: