Is the BLEXBot crawler used by Google? [closed]

【Question title】: Is BLEXBot crawler used by Google? [closed]
【Posted】: 2013-12-24 09:57:37
【Question】:

I have set up my .htaccess this way:

SetEnvIfNoCase User-Agent .*google.* search_robot
SetEnvIfNoCase User-Agent .*yahoo.* search_robot
SetEnvIfNoCase User-Agent .*bot.* search_robot
SetEnvIfNoCase User-Agent .*ask.* search_robot

Order Deny,Allow
Deny from All
Allow from env=search_robot

and this bot showed up:

IPv4 address: 198.143.187.122
Reverse DNS: blexn3.webmeup.com
RIR: ARIN
Country: United States
RBL Status: Clear
Threat: No threats detected

Is this bot used by Google, or am I missing something?

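【Comments】:

This question appears to be off-topic because it is about SEO.
This question appears to be off-topic because it is not about programming.

Tags: .htaccess seo bots web-crawler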

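【Answer 1】:

No, BLEXBot is not Google. It belongs to a company called WebMeUp. You can find information about them here.

If you look up the IP address from your logs, you will see that it is not Google.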

IP Address      198.143.187.122
Host            blexn3.webmeup.com
Location        US   US, United States
City            Chicago, IL 60661
Organization    SingleHop
ISP             SingleHop

A Google IP address would list Google as the organization.
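
The same idea, identifying Google by who owns the address rather than by what the User-Agent header claims, can be sketched in .htaccess. This is only a sketch under stated assumptions: it assumes Google's crawlers reverse-resolve to hostnames under googlebot.com or google.com, and it covers Google only, not the other search engines in the question. When Allow from is given a domain name, Apache performs a double DNS lookup on the client IP (reverse, then forward), so a client cannot get in just by putting "google" in its User-Agent string.

```apache
# Sketch only: allow Google by verified hostname instead of User-Agent.
# Assumption: Google's crawlers reverse-resolve to these domains.
Order Deny,Allow
Deny from All
# A domain name here makes Apache do a reverse lookup on the client IP,
# then a forward lookup on the result to confirm it maps back to that IP.
Allow from googlebot.com
Allow from google.com
```

The trade-off is a DNS lookup on requests that reach this rule, which may be noticeable on a busy site.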

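Google uses its own custom-built bots. You can read up about them here, including a definitive list of their user-agent strings, which may be useful to you.

To block it, follow the instructions here.

【Comments】: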

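Thanks, Scott. How did this bot get past my .htaccess restrictions?
@user278963 I added a link to instructions for blocking it. The best way to block bots is a file named robots.txt in the root of your site. All bots are supposed to follow those rules. It is simpler and more efficient than managing rules through .htaccess.
@user278963 It got past your restrictions because its user-agent string contains "bot" (as in BLEXBot), so your rule SetEnvIfNoCase User-Agent .*bot.* search_robot matched, and Allow from env=search_robot then let it through.
I have to use the .htaccess file because I block every country except one. If I create a robots.txt and remove the SetEnvIfNoCase lines for bots, no bot will see anything, because the .htaccess file denies them.
@user278963 Fair enough, I see why you need .htaccess.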
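
Given that the .htaccess setup has to stay, one possible way to keep the existing rules and still shut out BLEXBot is to clear the search_robot flag again for that one user agent. This is a minimal sketch, not something proposed in the thread: it assumes the crawler keeps sending "BLEXBot" in its User-Agent header, and it relies on SetEnvIf directives being evaluated in the order they appear, with a leading "!" removing a variable that an earlier rule set.

```apache
SetEnvIfNoCase User-Agent .*google.* search_robot
SetEnvIfNoCase User-Agent .*yahoo.* search_robot
SetEnvIfNoCase User-Agent .*bot.* search_robot
SetEnvIfNoCase User-Agent .*ask.* search_robot
# Assumption: the crawler identifies itself as "BLEXBot".
# "!search_robot" unsets the flag; this line must come after the
# rules above, because SetEnvIf directives are applied in order.
SetEnvIfNoCase User-Agent BLEXBot !search_robot

Order Deny,Allow
Deny from All
Allow from env=search_robot
```

Any other crawler whose User-Agent contains "bot", "ask", "google" or "yahoo" still gets through, so further exclusions of the same form may be needed for other unwanted bots.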