Questions about Robots.txt regarding asterisk and forward slash: Answer

【Question title】: Questions about Robots.txt regarding asterisk and forward slash
【Posted】: 2014-05-08 14:46:05
【Question】:

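I have a few questions about robots.txt.

1. If I have the following line in robots.txt

   Disallow: /catalog/category/view/id/6

   will it also block the URL http://example.com/catalog/category/view/id/61?

2. If I have

   Disallow: /*education

   will it block both http://example.com/some/uri/education and http://example.com/some/uri/education/another/uri?

3. Does it make any difference whether or not a rule ends with /?

4. If I want to disallow all URLs that begin with http://example.com/disallowme, do I need the * in Disallow: /disallowme*?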

【Question discussion】:

Tags: robots.txt web-crawler

【Answer 1】:

(Q1)

Disallow: /catalog/category/view/id/6

will block any URL whose path begins with /catalog/category/view/id/6. So yes, it also blocks http://example.com/catalog/category/view/id/61.
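To check the prefix behaviour yourself, Python's standard-library `urllib.robotparser`, which (like the original specification) treats each Disallow value as a plain path prefix, can evaluate the rule. This is a minimal sketch; the `User-agent: *` line and the crawler name `ExampleBot` are assumptions added only so the parser has a record to apply the rule to.

```python
from urllib.robotparser import RobotFileParser

# Minimal robots.txt containing only the rule from Q1.
# "User-agent: *" (assumed) gives the parser a record to attach the rule to.
ROBOTS_TXT = """\
User-agent: *
Disallow: /catalog/category/view/id/6
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for url in (
    "http://example.com/catalog/category/view/id/6",
    "http://example.com/catalog/category/view/id/61",
    "http://example.com/catalog/category/view/id/7",
):
    # can_fetch() returns False when a Disallow prefix matches the URL path.
    # "ExampleBot" is a hypothetical crawler name.
    print(url, parser.can_fetch("ExampleBot", url))
```

It reports False for the `.../id/6` and `.../id/61` URLs and True for `.../id/7`, because only the first two paths start with the rule's prefix.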

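(Q3) The slash is just another character with no special meaning. A trailing / is simply part of the prefix: for example, Disallow: /folder/ blocks /folder/page but not /folder, whereas Disallow: /folder blocks both (and also /folder.html).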
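One way to see that a trailing slash is just one more character of the prefix is to compare `Disallow: /folder` with `Disallow: /folder/` using the same standard-library parser. The `/folder` paths, the helper name `blocked`, and the bot name `ExampleBot` are hypothetical.

```python
from urllib.robotparser import RobotFileParser

def blocked(rule: str, path: str) -> bool:
    """Hypothetical helper: True if a single Disallow rule blocks the path."""
    parser = RobotFileParser()
    parser.parse(["User-agent: *", f"Disallow: {rule}"])
    return not parser.can_fetch("ExampleBot", "http://example.com" + path)

# Compare the rule with and without a trailing slash.
for rule in ("/folder", "/folder/"):
    for path in ("/folder", "/folder/", "/folder/page.html", "/folder.html"):
        print(f"Disallow: {rule:<9} {path:<18} blocked={blocked(rule, path)}")
```

With `/folder` every listed path is blocked; with `/folder/`, `/folder` and `/folder.html` stay fetchable because they do not start with `/folder/`.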

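(Q2, Q4) The * character has no special meaning in the original robots.txt specification; it is just another character, like / or a. Some parsers (for example, Google's) use * for pattern matching. You have to check their documentation, because each parser may implement it differently (at the time of writing there was no specification for it).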
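For parsers that do support wildcards, Google documents `*` as matching any sequence of characters and a trailing `$` as anchoring the end of the URL. The following is a hypothetical sketch of that style of matching, not Google's actual implementation (the function name `google_style_match` is made up). It translates a rule into a regular expression anchored at the start of the path, so the rule remains a prefix unless it ends in `$`.

```python
import re

def google_style_match(rule: str, path: str) -> bool:
    """Hypothetical sketch of Google-style wildcard matching.

    '*' matches any run of characters (including none); a trailing '$'
    anchors the rule to the end of the path. Everything else is literal.
    """
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape regex metacharacters in each literal piece, join pieces with '.*'.
    pattern = ".*".join(re.escape(part) for part in rule.split("*"))
    if anchored:
        pattern += "$"
    # re.match anchors at the start only, so unanchored rules act as prefixes.
    return re.match(pattern, path) is not None

for path in ("/some/uri/education", "/some/uri/education/another/uri", "/some/uri/edu"):
    print(path, google_style_match("/*education", path))
```

Under this interpretation `/*education` matches both URLs from Q2, and `/disallowme*` matches exactly the same paths as `/disallowme`, because a trailing `*` adds nothing to a prefix match.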

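So a parser that follows the original specification would not block http://example.com/disallowme when following Disallow: /disallowme*. It would block, for example, http://example.com/disallowme*foo. As mentioned above, whatever you specify in Disallow is always a URL path prefix. For Q4 this means you do not need the *: Disallow: /disallowme already blocks every URL whose path starts with /disallowme.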
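Python's `urllib.robotparser` does no wildcard expansion, so it behaves like the original-spec parsers described above: the `*` in `Disallow: /disallowme*` is compared as a literal character. A minimal sketch; the helper `make_parser` and the bot name `ExampleBot` are hypothetical.

```python
from urllib.robotparser import RobotFileParser

def make_parser(disallow: str) -> RobotFileParser:
    """Hypothetical helper: robots.txt with one Disallow rule for all agents."""
    parser = RobotFileParser()
    parser.parse(["User-agent: *", "Disallow: " + disallow])
    return parser

with_star = make_parser("/disallowme*")    # '*' treated as a literal character
without_star = make_parser("/disallowme")  # plain prefix

for url in (
    "http://example.com/disallowme",
    "http://example.com/disallowme/page",
    "http://example.com/disallowme*foo",
):
    print(url,
          "with *:", with_star.can_fetch("ExampleBot", url),
          "without *:", without_star.can_fetch("ExampleBot", url))
```

Here the rule with `*` only blocks the literal `/disallowme*foo` path, while the rule without `*` blocks all three URLs.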

【Discussion】: